Make Your OpenClaw Cheaper and Harder to Fool
A governor that learns your agent, blocks the tool calls it wastes, and asks you first about the ones that smell of injection. Two commands to install.
OpenClaw makes tool calls all day, and two kinds of them deserve more scrutiny than they get. The first merely costs money: the agent runs a call it has already run — same tool, same arguments, same session — because nothing in its loop is keeping score. The second is worse: a prompt injection in something the agent read persuades it to carry untrusted data into a consequential action, and an address that arrived inside a document it was summarising turns up as the recipient of a forward.
I built a governor for OpenClaw that addresses both. credence-pi is a plugin that watches the tool_call hook, plus a local daemon that holds a Bayesian belief about your agent’s behaviour — learned from your approvals and refusals, updated continuously — and decides, call by call, between ask, proceed, and block by maximising expected utility.
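To make the choice among ask, proceed, and block concrete, here is a minimal sketch of an expected-utility decision. It is an illustration rather than the daemon's code: the `Utilities` values, the single probability `p_bad`, and the assumption that a confirmation always gets a correct answer from you are all hypothetical and chosen only to show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utilities:
    """Hypothetical payoffs; the real values are not given in this post."""
    benefit: float = 1.0      # a legitimate call goes through
    harm: float = -20.0       # a harmful call goes through
    ask_cost: float = -0.2    # interrupting the user with a confirmation
    block_cost: float = -1.0  # a legitimate call is blocked


def expected_utilities(p_bad: float, u: Utilities) -> dict[str, float]:
    """Expected utility of each action, given P(call is bad) = p_bad."""
    p_good = 1.0 - p_bad
    return {
        # Proceeding collects the benefit if the call is good, the harm if bad.
        "proceed": p_good * u.benefit + p_bad * u.harm,
        # Asking always costs an interruption; assuming the user answers
        # correctly, good calls then proceed and bad ones are stopped.
        "ask": u.ask_cost + p_good * u.benefit,
        # Blocking avoids the harm but loses the value of any good call.
        "block": p_good * u.block_cost,
    }


def decide(p_bad: float, u: Utilities = Utilities()) -> str:
    """Return the action with the highest expected utility."""
    eu = expected_utilities(p_bad, u)
    return max(eu, key=eu.get)
```

With these made-up numbers, the governor proceeds while the probability of a bad call is below 1%, asks between 1% and 90%, and blocks above that. The thresholds fall out of the utilities rather than being set by hand, which is the point of deciding this way.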
The numbers, measured on real OpenClaw sessions rather than demos built to be caught:
- Waste: exact-repeat tool calls blocked at precision 1.0 and recall 1.0 on held-out sessions. Those repeats were 0.7% of all calls, and nothing else was touched.
- Injection: an injected exfiltration surfaced to you as a confirmation at 0.94 precision, while interrupting 1.2% of safe sessions.
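The exact-repeat criterion behind the waste figure (same tool, same arguments, same session) can be sketched in a few lines. This is an assumed implementation for illustration only: `call_key` and `RepeatTracker` are hypothetical names, and the real plugin may canonicalise arguments differently or take more context into account.

```python
import hashlib
import json


def call_key(tool: str, args: dict) -> str:
    """Stable fingerprint of a tool call, independent of argument key order."""
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{tool}\n{canonical}".encode("utf-8")).hexdigest()


class RepeatTracker:
    """Remembers which calls each session has already made."""

    def __init__(self) -> None:
        # session id -> set of call fingerprints seen in that session
        self._seen: dict[str, set[str]] = {}

    def is_repeat(self, session_id: str, tool: str, args: dict) -> bool:
        """Return True if this exact call was already made in this session."""
        key = call_key(tool, args)
        seen = self._seen.setdefault(session_id, set())
        if key in seen:
            return True
        seen.add(key)
        return False
```

Sorting the keys before hashing means `{"path": "a", "n": 1}` and `{"n": 1, "path": "a"}` count as the same call, while the same call in a different session does not count as a repeat.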
Installation is two steps, the daemon and then the plugin (one command to install it, one to enable it):
# the brain (Docker; or from source — see the repo)
docker run -p 8787:8787 -v ~/.credence-pi:/root/.credence-pi ghcr.io/gfrmin/credence-pi-daemon
# the body
openclaw plugins install @gfrmin/credence-pi-openclaw
openclaw plugins enable credence-pi
Everything runs locally. The daemon keeps an append-only log of every observation and decision on your machine, and no raw data leaves it.
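For readers unfamiliar with the idea, an append-only log can be as simple as one JSON object per line, only ever written at the end of a file. The sketch below is an assumption about the general technique, not the daemon's actual format: the field names (`ts`, `kind`, `tool`, `action`) and the helper names are hypothetical.

```python
import json
import time
from pathlib import Path


def append_record(log_path: Path, record: dict) -> None:
    """Append one timestamped record as a single JSON line."""
    line = json.dumps({"ts": time.time(), **record}, sort_keys=True)
    # Mode "a" writes at the end of the file; earlier lines are never rewritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


def read_records(log_path: Path) -> list[dict]:
    """Read the log back in the order it was written."""
    with log_path.open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because nothing is edited in place, a log of this kind can be replayed from the first line to reconstruct how a belief arrived at its current state.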
Now the label, because a guardrail sold as complete is being sold dishonestly. Waste-blocking is enforced; it is the part that is proven. Safety ships in confirm mode: when the harm term wants to stop an action, you are asked rather than anything being blocked silently, and each yes and no is precisely the signal that turns a belief seeded from a benchmark into a belief about your work. What it cannot do: it lives at the tool boundary, so it is structurally blind to harmful output such as bad advice or fabrication, and the harm it can see there tops out at about three in ten unsafe trajectories on the benchmark. It is research-stage, and whether it is a net improvement to your task outcomes is exactly the question your usage would answer.
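How a yes or a no can move a benchmark-seeded belief toward your own work is easiest to see with the simplest Bayesian model available, a Beta-Bernoulli update. This post does not say which model family the daemon uses, so treat the following as an assumed stand-in: the `Belief` class and the prior pseudo-counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Beta distribution over P(a flagged call is bad), stored as pseudo-counts."""
    refused: float   # weight of "no" answers so far (including the prior)
    approved: float  # weight of "yes" answers so far (including the prior)

    def p_bad(self) -> float:
        """Posterior mean probability that a flagged call is bad."""
        return self.refused / (self.refused + self.approved)

    def update(self, user_approved: bool) -> "Belief":
        """Return the belief after one confirmation is answered."""
        if user_approved:
            return Belief(self.refused, self.approved + 1)
        return Belief(self.refused + 1, self.approved)


# Hypothetical prior seeded from a benchmark: 2 bad out of 10 flagged calls.
prior = Belief(refused=2, approved=8)

# Ten approvals in a row on your own work pull the estimate down.
belief = prior
for _ in range(10):
    belief = belief.update(user_approved=True)
```

Here the estimate starts at 0.2 and falls to 0.1 after ten approvals. The weaker the prior's pseudo-counts, the faster your answers dominate the benchmark.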
If you want to know how it works: "The Brain is Opaque to the Body" covers the architecture (a body that senses and acts, a brain that reasons, and a wire between them that never moves) and the discipline that kept a coding agent from quietly wrecking it. "What a Regex Can’t Do" covers what the brain learned, and why matching its behaviour with rules ends in re-deriving Bayesian decision theory.
The code, the eval harness, and the red-team of the claims are in the repository. If you try it, what I most want to know is whether the confirmations land on real threats or merely annoy you on legitimate work — an issue with either answer is the telemetry that turns research-stage into calibrated.