The Loop Problem
Every RL agent that has played a text adventure has tried to take the lantern fifty times in a row. The fix is not better exploration heuristics. The fix is representing state properly.
julia, bayesian, machine-learning, ai, reinforcement-learning