<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reinforcement-Learning on Guy Freeman</title><link>https://gfrm.in/categories/reinforcement-learning/</link><description>Recent content in Reinforcement-Learning on Guy Freeman</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 24 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://gfrm.in/categories/reinforcement-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>The Loop Problem</title><link>https://gfrm.in/posts/loop-problem/</link><pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate><guid>https://gfrm.in/posts/loop-problem/</guid><description>&lt;div class="callout callout-note"&gt;
 This is Part 3 of a series. For the axiomatic foundation, see &lt;a href="https://gfrm.in/posts/three-types/"&gt;Part 1: Three Types and a Funeral&lt;/a&gt;. For the VOI-gated text adventure agent, see &lt;a href="https://gfrm.in/posts/teaching-zork/"&gt;Part 2: Teaching Zork to a Bayesian&lt;/a&gt;.
&lt;/div&gt;

&lt;p&gt;Every reinforcement learning agent that has ever played a text adventure has, at some point, tried to take the lantern fifty times in a row.&lt;/p&gt;
&lt;p&gt;Not because it&amp;rsquo;s stupid. Because its state representation makes &amp;ldquo;Shack with book&amp;rdquo; and &amp;ldquo;Shack with lantern&amp;rdquo; look like different states, so the learned futility of &amp;ldquo;take lantern&amp;rdquo; in one state doesn&amp;rsquo;t transfer to the other. The agent is doing exactly what its architecture tells it to do: each state-action pair is learned independently, and it hasn&amp;rsquo;t yet learned that &lt;em&gt;this particular&lt;/em&gt; pair is useless. It will learn, eventually, after wasting 39 steps per episode on actions it has already tried.&lt;/p&gt;</description></item></channel></rss>