
This is related to the exploration–exploitation tradeoff studied in reinforcement learning [1], for which no universally optimal solution is known. This article suggests exploring more, but it might be wise to exploit your knowledge and be somewhat risk-averse every now and then as well.
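For anyone who wants to see the tradeoff concretely, here is a minimal ε-greedy bandit sketch in Python (the arm probabilities and the 10% exploration rate are arbitrary choices for illustration, not something from the article):

  import random

  probs = [0.3, 0.5, 0.7]      # true (hidden) reward probability of each arm
  counts = [0] * len(probs)    # number of pulls per arm
  values = [0.0] * len(probs)  # running mean reward per arm
  epsilon = 0.1                # fraction of the time we explore

  for t in range(10_000):
      if random.random() < epsilon:
          arm = random.randrange(len(probs))  # explore: try a random arm
      else:
          arm = max(range(len(probs)), key=lambda a: values[a])  # exploit
      reward = 1.0 if random.random() < probs[arm] else 0.0
      counts[arm] += 1
      values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

  print(values)  # estimates should end up close to [0.3, 0.5, 0.7]

With ε = 0 (pure exploitation) the agent can lock onto a bad arm forever; with ε too large it keeps wasting pulls on arms it already knows are bad, which is the risk-aversion point above.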

Also, there is a difference between making decisions at the individual level and looking at the combined actions of all humans. A strategy that is sub-optimal for most individuals can still yield positive outcomes at the societal level. For example, it is fortunate that many people go into research, even though any individual researcher is highly unlikely to make a massive breakthrough.

[1] https://en.wikipedia.org/wiki/Exploration%E2%80%93exploitati...



I learned about "explore vs exploit" in a fantastic book titled "Algorithms to Live By - What computer science can teach us about human decision-making" [subtitle from memory]

Highly recommended!


  >  the exploration–exploitation tradeoff ... for which no universally optimal solution is known
I want to nitpick a bit. It may seem like a small nuance, but I think it makes a significant difference.

Exploration-exploitation describes a class of algorithms[0], and I would not say that means "there is no optimal solution." Rather, these algorithms are often applied to problems where no optimal solution is known. You can apply them to problems with known optimal solutions, and they should still get you to the optimum.

Which problems don't have global optima? Most. We usually simplify problems in ways that guarantee an optimum exists, but it is best to remember which assumptions were made and whether they actually apply. This is part of why always maintaining at least some exploration can be a highly successful strategy. It is extremely useful in the real world too, since the environment is always changing: you cannot generate globally optimal solutions for dynamic problems whose future states are unknown.

[0] Examples include Q-learning, bandit algorithms, and sampling algorithms. Multi-armed bandits and Thompson sampling are mentioned in the Wikipedia article, but note that there are more bandit algorithms and more sampling algorithms than those.
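Since Thompson sampling was named, here is a minimal Beta-Bernoulli version of it (the arm success rates are made up for illustration): each arm keeps a Beta posterior over its success rate, we draw one plausible rate per arm and pull the argmax, so exploration shrinks automatically as uncertainty shrinks but never disappears entirely.

  import random

  probs = [0.3, 0.5, 0.7]   # true (hidden) success rate of each arm
  alpha = [1] * len(probs)  # Beta posterior: 1 + observed successes
  beta  = [1] * len(probs)  # Beta posterior: 1 + observed failures

  for t in range(10_000):
      # Draw one sample per arm from its posterior, pull the best-looking arm.
      samples = [random.betavariate(alpha[a], beta[a]) for a in range(len(probs))]
      arm = max(range(len(probs)), key=lambda a: samples[a])
      if random.random() < probs[arm]:
          alpha[arm] += 1   # success observed
      else:
          beta[arm] += 1    # failure observed

  print([a / (a + b) for a, b in zip(alpha, beta)])  # posterior mean per arm

Regarding the dynamic-environment point above: in a non-stationary setting you would additionally discount old observations (e.g. decay alpha and beta back toward the prior) so the posteriors never become overconfident and exploration never fully stops.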


Agreed. I think it's safe to say that we haven't found an optimal solution for the problem discussed in the article.

It's not even clear what the reward function should look like, so perhaps the comparison with reinforcement learning is not a productive one. Life combines both evolution and learning, and the whole is too complicated to say anything very sensible about.



