<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Decision-Theory on Guy Freeman</title><link>https://gfrm.in/categories/decision-theory/</link><description>Recent content in Decision-Theory on Guy Freeman</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 09 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://gfrm.in/categories/decision-theory/index.xml" rel="self" type="application/rss+xml"/><item><title>Make Your OpenClaw Cheaper and Harder to Fool</title><link>https://gfrm.in/posts/openclaw-cheaper-and-harder-to-fool/</link><pubDate>Tue, 09 Jun 2026 00:00:00 +0000</pubDate><guid>https://gfrm.in/posts/openclaw-cheaper-and-harder-to-fool/</guid><description>&lt;p&gt;&lt;a href="https://github.com/openclaw/openclaw"&gt;OpenClaw&lt;/a&gt; makes tool calls all day, and two kinds of them deserve more scrutiny than they get. The first merely costs money: the agent runs a call it has already run — same tool, same arguments, same session — because nothing in its loop is keeping score. 
The second is worse: a prompt injection in something the agent read persuades it to carry untrusted data into a consequential action; for example, an address that arrived inside a document it was summarising turns up as the recipient of a forward.&lt;/p&gt;</description></item><item><title>What a Regex Can't Do</title><link>https://gfrm.in/posts/credence-pi-pass-2/</link><pubDate>Mon, 08 Jun 2026 00:00:00 +0000</pubDate><guid>https://gfrm.in/posts/credence-pi-pass-2/</guid><description>&lt;p&gt;In &lt;a href="https://gfrm.in/posts/credence-pi-pass-1/"&gt;the last post&lt;/a&gt; I built a governance layer for a coding agent&amp;rsquo;s tool calls: a &lt;em&gt;body&lt;/em&gt; that hooks the agent&amp;rsquo;s &lt;code&gt;tool_call&lt;/code&gt; event, extracts a few features, and dispatches ask, proceed, or block; and a &lt;em&gt;brain&lt;/em&gt;, a Julia daemon that holds a belief and maximises expected utility. The commitment that held it together was that the brain is opaque to the body. The wire carries observations and named actions and nothing else, so the brain can change how it reasons without the body ever knowing.&lt;/p&gt;</description></item><item><title>The Brain is Opaque to the Body</title><link>https://gfrm.in/posts/credence-pi-pass-1/</link><pubDate>Tue, 05 May 2026 00:00:00 +0000</pubDate><guid>https://gfrm.in/posts/credence-pi-pass-1/</guid><description>&lt;p&gt;There is a coding agent — &lt;em&gt;pi&lt;/em&gt;, the AI tool I use to write code — and it makes tool calls all day. It runs &lt;code&gt;bash&lt;/code&gt;, edits files, opens HTTP requests, queries databases. Most of what it wants to do is fine. Some of it isn&amp;rsquo;t. The question I have been circling for months is: who decides which is which, and how, and on what basis?&lt;/p&gt;
&lt;p&gt;The current answers are unsatisfying. The agent&amp;rsquo;s own RLHF disposition decides — but I have no visibility into the function it&amp;rsquo;s optimising. A static rules file decides — but a static file cannot improve from observation. A human (me) decides — but a human cannot stay in the loop on every tool call without becoming the bottleneck. None of these are &lt;em&gt;learning&lt;/em&gt; about the agent&amp;rsquo;s behaviour from the agent&amp;rsquo;s behaviour. None of them treat the question as what it actually is: a sequential decision problem under uncertainty about the agent&amp;rsquo;s intent and its capacity to do harm.&lt;/p&gt;</description></item></channel></rss>