Essays

The Gate That Said No

Whether to adopt a tool you built is itself a decision under uncertainty, so it deserves a decision rule, not a leaderboard. I built a pre-committed, frozen-blind adoption gate over a posterior on the utility difference between two ways of answering questions about my own documents: a model that always answers, and Bayesian machinery that abstains when it is not sure. The gate refused adoption despite a clearly positive mean, because the choice hinges on a number I cannot honestly introspect: how much a confident wrong answer actually costs me. Then I built the abstaining machine and ran it, and it answered almost nothing, withholding even the values it had read faithfully. Not a failure of the decision math but its honest consequence. The silence pulled apart into three separable beliefs (can I find the fact, did I read it faithfully, is it still true), only one of which is my utility talking, and a wrong answer turned out to be, more often than not, a faithful read of a fact that has since gone stale. The gate moves only as the missing number is earned from behaviour, never from a hand on the prior.

essays · bayesian · ai · decision-theory

Make Your OpenClaw Agent Cheaper, and Measure It Yourself

credence-pi is an OpenClaw plugin plus a local daemon that learns your agent and acts at two points by expected utility: it routes each turn to the cheapest model whose expected accuracy justifies its cost, and it governs each tool call your agent proposes, blocking the wasted ones and flagging injected exfiltration as a confirmation. Routing is on by default; shadow mode reports what it would save and block on your own sessions, with its own false-block rate, before it enforces anything. Early-stage research that wants the community's help; local-first, installable today.

essays · bayesian · ai · decision-theory

Make Your OpenClaw Cheaper and Harder to Fool

credence-pi is an OpenClaw plugin plus a local daemon that learns your agent's behaviour and governs its tool calls by expected utility: it blocks the calls your agent wastes, flags injected exfiltration as a confirmation at 0.94 precision on a public benchmark, and makes ask-or-proceed decisions a fixed rule provably can't reproduce. Research-stage, local-first, installable today.

essays · bayesian · ai · decision-theory

What a Regex Can't Do

The governor from 'The Brain is Opaque to the Body' now ships as an OpenClaw plugin: wasted tool calls blocked at precision and recall 1.0 on real sessions, injected exfiltration surfaced as a confirmation at 0.94 precision. And an argument: several of these behaviours are beyond any hand-tuned heuristic, because matching them re-derives Bayesian decision theory.

essays · bayesian · ai · decision-theory

The Brain is Opaque to the Body

A first-pass body-brain architecture for governing a coding agent's tool_call hook with a Bayesian decision-theoretic brain. The wire schema is fixed by what Pass 1 ships; Pass 2 swaps the posterior representation without disturbing it. Plus what credence-lint and a precedent system caught at the eleventh hour.

essays · bayesian · ai · decision-theory

Three Types and a Funeral for Your Inference Library

What would it take to build an agent whose behaviour is derived from a few fundamentals the way physics is derived from conservation laws? Three types, four axioms, and a refusal to add anything else.

julia · bayesian · machine-learning · ai · essays

Keeping the Coding Agent on the Straight and Narrow

A companion to the PKM Phase 1 post. The foundation was built by two AIs — Claude.ai for design, Claude Code for implementation — with a spec as the contract between them. Ten SPEC revisions in four days, and what the rules caught that 'pragmatic' would have missed.

essays · ai · python

A Content-Addressed Foundation for Personal Knowledge

Eleven million words of personal documents, four canonical questions none of Khoj, Paperless-ngx, Obsidian, or Karpathy's LLM Wiki can answer, and a content-addressed extraction foundation that takes content-addressing seriously. Phase 1 of a multi-phase build.

essays · data · ai · python

Ninety-Six Percent Cheaper and Slightly Better

Credence-proxy sits between an agent and its LLM providers, learns which model is good for which category, and routes accordingly. On an OpenClaw benchmark it cut cost by 96% and latency by 52% while raising quality by 1.24 points. The mechanism is one equation.

essays · bayesian · machine-learning · ai

The Prompting Gradient

Each prompting technique helps. Reasoning traces, strategy guidance, cross-question history: each one improves accuracy and score. None of them closes the gap with a Bayesian agent that does not use language at all. The ceiling exists because descriptions of calculations are not calculations.

essays · bayesian · machine-learning · ai

The Agent That Invents Its Own Rules

Most agents are given a fixed set of decision rules. Credence's second tier generates candidate rules from sensor features, scores them by complexity, and lets the posterior decide which structures are worth keeping. This is program synthesis as Bayesian inference.

julia · essays · bayesian · machine-learning · ai

Sixty-Two Percent Correct and Winning by a Hundred and Twenty Points

A Bayesian decision-theoretic agent scores lower on accuracy than every LLM variant it competes against, yet beats the best of them by 120 points. The explanation requires thinking about something that LLM benchmarks typically refuse to think about.

essays · bayesian · machine-learning · ai

The Bitter Lesson Has No Utility Function

I wrote about decision theory fading from AI. Hacker News said I was annoyed at Rich Sutton's Bitter Lesson. I wasn't. But the misreading proves the point.

essays · bayesian · machine-learning · ai

Why We Stopped Using the Mathematics That Works

Someone asked why decision theory stopped being widely used in AI. The answer involves ImageNet, academic departments, and the seductive power of not having to specify your objectives.

essays · bayesian · machine-learning · ai

Agentic AI Is Neither Intelligent Nor an Agent

I built a Bayesian agent and set it against LangChain on a tool-use benchmark. LangChain got more answers right and still lost — by 120 points.

python · bayesian · machine-learning · ai · essays

On Owning Your Data

Why I reverse-engineered a cheap Bluetooth scale to keep my health data out of the cloud

essays · privacy · data