The Loop Problem

Tue, 28 Apr 2026 00:00:00 +0000

This is Part 3 of a series. For the axiomatic foundation, see Part 1: Three Types and a Funeral. For the VOI-gated text adventure agent, see Part 2: Teaching Zork to a Bayesian.

Every reinforcement learning agent that has ever played a text adventure has, at some point, tried to take the lantern fifty times in a row.

Not because it’s stupid. Because its state representation makes “Shack with book” and “Shack with lantern” look like different states, so the learned futility of “take lantern” in one state doesn’t transfer to the other. The agent is doing exactly what its architecture tells it to do: each state-action pair gets its own independent value estimate, and it hasn’t yet learned that this particular pair is useless. It will learn eventually, but only after wasting 39 steps per episode on actions it has already tried.
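To make the failure concrete, here is a minimal sketch. It assumes a plain tabular Q-learner whose state key includes the inventory. The names `state_key`, `update`, the action list, and the values of `ALPHA`, `GAMMA`, and the optimistic initial value are all hypothetical, chosen for illustration rather than taken from any particular agent. The agent repeatedly tries “take lantern” in one state and learns that it is futile, yet the value of the same action in the other state never moves.

```python
from collections import defaultdict

# Hypothetical hyperparameters, chosen for illustration only.
ALPHA = 0.5            # learning rate
GAMMA = 0.9            # discount factor
OPTIMISTIC_INIT = 1.0  # untried actions start out looking attractive

# Tabular Q-values: every (state, action) pair is its own independent entry.
q = defaultdict(lambda: OPTIMISTIC_INIT)

def state_key(location, inventory):
    # Assumed representation: the inventory is part of the state, so
    # holding a book versus a lantern yields two unrelated keys.
    return (location, frozenset(inventory))

def update(state, action, reward, next_state, actions):
    # Standard one-step Q-learning update for a single (state, action) entry.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])

ACTIONS = ["take lantern", "read book", "go north"]
shack_with_book = state_key("Shack", {"book"})
shack_with_lantern = state_key("Shack", {"lantern"})

# Try the useless action ten times in the first state: each attempt
# costs -1 and leaves the agent exactly where it was.
for _ in range(10):
    update(shack_with_book, "take lantern", -1.0, shack_with_book, ACTIONS)

print(round(q[(shack_with_book, "take lantern")], 3))  # -0.099: learned to be futile
print(q[(shack_with_lantern, "take lantern")])         # 1.0: nothing transferred
```

Nothing in the update touches any key other than the one just visited, so every new inventory combination has to relearn the same lesson from scratch.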

Reinforcement-Learning on Guy Freeman
