Decision-Theory on Guy Freeman

Make Your OpenClaw Cheaper and Harder to Fool

Tue, 09 Jun 2026 00:00:00 +0000

OpenClaw makes tool calls all day, and two kinds of them deserve more scrutiny than they get. The first merely costs money: the agent runs a call it has already run — same tool, same arguments, same session — because nothing in its loop is keeping score. The second is worse: a prompt injection in something the agent read persuades it to carry untrusted data into a consequential action, and an address that arrived inside a document it was summarising turns up as the recipient of a forward.

What a Regex Can't Do

Mon, 08 Jun 2026 00:00:00 +0000

In the last post I built a governance layer for a coding agent’s tool calls: a body that hooks the agent’s tool_call event, extracts a few features, and dispatches ask, proceed, or block; and a brain, a Julia daemon that holds a belief and maximises expected utility. The commitment that held it together was that the brain is opaque to the body. The wire carries observations and named actions and nothing else, so the brain can change how it reasons without the body ever knowing.

The Brain is Opaque to the Body

Tue, 05 May 2026 00:00:00 +0000

There is a coding agent — pi, the AI tool I use to write code — and it makes tool calls all day. It runs bash, edits files, opens HTTP requests, queries databases. Most of what it wants to do is fine. Some of it isn’t. The question I have been circling for months is: who decides which is which, and how, and on what basis?

The current answers are unsatisfying. The agent’s own RLHF disposition decides — but I have no visibility into the function it’s optimising. A static rules file decides — but a static file cannot improve from observation. A human (me) decides — but a human cannot stay in the loop on every tool call without becoming the bottleneck. None of these are learning about the agent’s behaviour from the agent’s behaviour. None of them treat the question as what it actually is: a sequential decision problem under uncertainty about the agent’s intent and its capacity to do harm.