
This is related to the exploration–exploitation tradeoff studied in reinforcement learning [1], for which no universally optimal solution is known. This article suggests exploring more, but it might be wise to exploit your knowledge and be somewhat risk-averse every now and then as well.
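For anyone who wants to see the tradeoff concretely, here is a minimal ε-greedy bandit sketch in Python (the arm probabilities and the 10% exploration rate are arbitrary choices for illustration, not something from the article):

  import random

  probs = [0.3, 0.5, 0.7]      # true (hidden) reward probability of each arm
  counts = [0] * len(probs)    # number of pulls per arm
  values = [0.0] * len(probs)  # running mean reward per arm
  epsilon = 0.1                # fraction of the time we explore

  for t in range(10_000):
      if random.random() < epsilon:
          arm = random.randrange(len(probs))  # explore: try a random arm
      else:
          arm = max(range(len(probs)), key=lambda a: values[a])  # exploit
      reward = 1.0 if random.random() < probs[arm] else 0.0
      counts[arm] += 1
      values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

  print(values)  # estimates should end up close to [0.3, 0.5, 0.7]

With ε = 0 (pure exploitation) the agent can lock onto a bad arm forever; with ε too large it keeps wasting pulls on arms it already knows are bad, which is the risk-aversion point above.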

Also, there is a difference between making decisions at the individual level and looking at the combined actions of all humans. A strategy that is sub-optimal for most individuals can still yield positive outcomes at the societal level. For example, it is fortunate that many people go into research, even though any individual researcher is highly unlikely to make a massive breakthrough.

[1] https://en.wikipedia.org/wiki/Exploration%E2%80%93exploitati...



I learned about "explore vs exploit" in a fantastic book titled "Algorithms to Live By - What computer science can teach us about human decision-making" [subtitle from memory]

Highly recommended!


  >  the exploration–exploitation tradeoff ... for which no universally optimal solution is known
I want to nitpick a bit. It may seem like a small nuance, but I think it makes a significant difference.

Exploration-exploitation describes a class of algorithms[0], and I would not say that means "there is no optimal solution." Rather, these algorithms are often applied to problems where no optimal solution is known. You can apply them to problems with known optimal solutions, and they should still get you to the optimum.

Which problems don't have global optima? Most. We usually simplify problems in ways that guarantee an optimum exists, but it is best to remember which assumptions were made and whether they actually apply. This is part of why always maintaining at least some exploration can be a highly successful strategy. It is extremely useful in the real world too, since the environment is always changing: you cannot generate globally optimal solutions for dynamic problems whose future states are unknown.

[0] Examples include Q-learning, bandit algorithms, and sampling algorithms. Multi-armed bandits and Thompson sampling are mentioned in the Wikipedia article, but note that there are more bandit algorithms and more sampling algorithms than those.
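Since Thompson sampling was named, here is a minimal Beta-Bernoulli version of it (the arm success rates are made up for illustration): each arm keeps a Beta posterior over its success rate, we draw one plausible rate per arm and pull the argmax, so exploration shrinks automatically as uncertainty shrinks but never disappears entirely.

  import random

  probs = [0.3, 0.5, 0.7]   # true (hidden) success rate of each arm
  alpha = [1] * len(probs)  # Beta posterior: 1 + observed successes
  beta  = [1] * len(probs)  # Beta posterior: 1 + observed failures

  for t in range(10_000):
      # Draw one sample per arm from its posterior, pull the best-looking arm.
      samples = [random.betavariate(alpha[a], beta[a]) for a in range(len(probs))]
      arm = max(range(len(probs)), key=lambda a: samples[a])
      if random.random() < probs[arm]:
          alpha[arm] += 1   # success observed
      else:
          beta[arm] += 1    # failure observed

  print([a / (a + b) for a, b in zip(alpha, beta)])  # posterior mean per arm

Regarding the dynamic-environment point above: in a non-stationary setting you would additionally discount old observations (e.g. decay alpha and beta back toward the prior) so the posteriors never become overconfident and exploration never fully stops.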


Agreed. I think it's safe to say that we haven't found an optimal solution for the problem discussed in the article.

It's not even clear what the reward function should look like, so perhaps the comparison with reinforcement learning is not a productive one. Life combines both evolution and learning, and the whole is too complicated to say anything very sensible about.



