Essays on Guy Freeman

Three Types and a Funeral for Your Inference Library

Sun, 26 Apr 2026 00:00:00 +0000

This is Part 1 of a series on Bayesian decision-theoretic agents.

This post describes the Credence architecture as it stood in March 2026, when the system used the standard Kolmogorov definition of probability — measures over sample spaces. Since then, the foundation has been reconstructed around de Finetti’s definition, where expectation (the prevision) is the primitive and probability is derived from it. The three types described here were the right starting point; what came next is at the end.

What would it take to build an agent that genuinely learns and decides — not one that pattern-matches its way through tool calls, but one whose behaviour is derived from a few fundamentals the way physics is derived from conservation laws?

A Content-Addressed Foundation for Personal Knowledge

Wed, 22 Apr 2026 00:00:00 +0000

I have about eleven million words of personal documents. Contracts, invoices, court filings, medical notes, research papers, travel itineraries, conversation transcripts, CVs of various vintages, takeaway menus from restaurants that have since closed. A decade of Syncthing directories and Dropbox archives and email attachments saved twice because I wasn’t sure which copy was authoritative.

I would like to ask questions about this corpus. Not vague questions — specific ones:

When did I last see my doctor?
What did Velotix pay me in September 2024?
Which of my subscriptions auto-renew in the next sixty days?
What’s the warranty status on the water heater?

Each of these fails on a different existing tool, for a different reason.

Keeping the Coding Agent on the Straight and Narrow

Wed, 22 Apr 2026 00:00:00 +0000

“Help me keep that coding agent pup on the straight and narrow, as it likes to be ‘pragmatic’.”

That was the instruction I gave Claude.ai a week before any code was written for a personal-knowledge-management project. The companion post describes what that project's foundation is and why none of the existing tools fit. This one is about how the foundation got built: two AIs, one spec as the contract between them, and what it takes to keep a coding agent from quietly corrupting your architecture.

Ninety-Six Percent Cheaper and Slightly Better

Thu, 30 Apr 2026 00:00:00 +0000

The production question about LLM agents, once you have gotten past whether they work at all, is how much they cost. A customer-service agent that answers well but costs eight cents per turn is not a customer-service agent; it is a charity. The conventional answer is to pick a cheaper model and hope it is good enough. The less conventional answer is to treat model selection as a decision problem.
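One way to read "model selection as a decision problem" is as an expected-utility comparison: for each candidate model, weigh the chance of an acceptable answer against the cost of the turn. The sketch below is an illustration under stated assumptions, not the method the post goes on to describe. The names `ModelOption`, `expected_utility` and `choose` are hypothetical, and every number except the eight-cent cost is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    p_good: float         # assumed probability that a turn is answered acceptably
    cost_per_turn: float  # assumed cost of one turn, in dollars


def expected_utility(model: ModelOption, value_of_good_answer: float) -> float:
    """Expected value of a good answer minus what the turn costs."""
    return model.p_good * value_of_good_answer - model.cost_per_turn


def choose(options: list[ModelOption], value_of_good_answer: float) -> ModelOption:
    """Pick the option with the highest expected utility."""
    return max(options, key=lambda m: expected_utility(m, value_of_good_answer))


# Illustrative figures only: the quality estimates and the small model's
# cost are made up; the eight cents is the figure quoted in the text.
options = [
    ModelOption("large", p_good=0.95, cost_per_turn=0.08),
    ModelOption("small", p_good=0.90, cost_per_turn=0.003),
]

low_stakes = choose(options, value_of_good_answer=0.50)
high_stakes = choose(options, value_of_good_answer=5.00)
print(low_stakes.name, high_stakes.name)
```

With these made-up figures the cheaper model wins when a good answer is worth fifty cents and the larger model wins when it is worth five dollars, which is the point of framing the choice as a decision rather than a default.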

Sixty-Two Percent Correct and Winning by a Hundred and Twenty Points

Thu, 30 Apr 2026 00:00:00 +0000

The standard way to evaluate a question-answering system is to measure how often it gets the right answer. This seems reasonable. It is, in practice, a trap.

I ran an experiment to demonstrate why. A Bayesian decision-theoretic agent — built on the Credence DSL, using Beta-Bernoulli reliability tracking and value-of-information calculations — competed against several LLM agents on a 50-question benchmark. All had access to the same four tools. All faced the same questions. All were scored on the same objective.
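For readers unfamiliar with Beta-Bernoulli tracking, here is a minimal sketch of the idea: each tool's reliability is an unknown probability, the belief about it is a Beta distribution, and every observed success or failure increments one of its two counts. This is not Credence DSL code. The class name `BetaReliability` is hypothetical, and the uniform Beta(1, 1) prior is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BetaReliability:
    """Beta posterior over the probability that a tool answers correctly."""

    alpha: float = 1.0  # assumed prior pseudo-count of correct responses
    beta: float = 1.0   # assumed prior pseudo-count of incorrect responses

    def update(self, correct: bool) -> None:
        """Conjugate update: one Bernoulli observation adds one to a count."""
        if correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        """Posterior mean estimate of the tool's reliability."""
        return self.alpha / (self.alpha + self.beta)


# Three correct responses and one incorrect one, in any order.
tool = BetaReliability()
for outcome in [True, True, False, True]:
    tool.update(outcome)

print(tool.alpha, tool.beta, round(tool.mean, 3))
```

After those four observations the belief is Beta(4, 2), with a posterior mean of 2/3. The spread of that distribution, not just its mean, is what a value-of-information calculation would draw on when deciding whether another tool call is worth its cost.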

The Agent That Invents Its Own Rules

Tue, 28 Apr 2026 00:00:00 +0000

The previous post in this series described what I called Tier 1 of the Credence architecture: a DSL for Bayesian decision agents with three types, four axioms, and a constitution forbidding everything else. That post ended with a program the user had to write by hand — a short S-expression encoding a hypothesis about what the environment was like and how to act in it.

Hand-written programs have a well-known limitation: they are only as good as whoever wrote them.

The Prompting Gradient

Thu, 30 Apr 2026 00:00:00 +0000

The accuracy paradox post reported the headline: a Bayesian agent scoring +129.5 against an LLM agent’s +10.8, despite lower accuracy. This post is about the LLM side of that experiment — what was tried, what helped, and where the ceiling is.

The Three Variants

Three LLM agents were tested on the same 50-question benchmark. They differed only in prompting:

LLM Bare. The model receives a description of the four available tools, the scoring system (+10 correct, -5 wrong, 0 abstain, minus tool costs), and the current question with its four candidate answers. No guidance on how to decide. No reasoning format imposed. The model chooses a tool, receives a response, and decides what to do next.
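The scoring rule above already implies a break-even confidence, which a short calculation makes explicit. The sketch below uses only the stated payoffs (+10 correct, -5 wrong, 0 abstain). The function names are hypothetical, and tool costs are left out because they are already spent by the time the agent decides whether to answer.

```python
CORRECT, WRONG, ABSTAIN = 10.0, -5.0, 0.0  # payoffs from the scoring system


def expected_answer_score(p_correct: float) -> float:
    """Expected score of committing to an answer held with probability p_correct."""
    return p_correct * CORRECT + (1.0 - p_correct) * WRONG


def should_answer(p_correct: float) -> bool:
    """Answer only when doing so beats the abstain payoff of zero."""
    return expected_answer_score(p_correct) > ABSTAIN


# A blind guess among four candidate answers is right a quarter of the time.
blind_guess = expected_answer_score(0.25)
print(blind_guess, should_answer(0.25), should_answer(0.5))
```

Setting 10p - 5(1 - p) = 0 gives p = 1/3, so answering pays only above one-in-three confidence. A blind guess among four candidates sits at 0.25 and has an expected score of -1.25, which is worse than abstaining.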

The Bitter Lesson Has No Utility Function

Thu, 12 Mar 2026 00:00:00 +0000

I wrote an essay arguing that decision theory had been quietly abandoned by mainstream AI — not because it stopped working, but because deep learning absorbed all the oxygen. I posted it to Hacker News. A commenter informed me I was “annoyed at the Bitter Lesson.”

I hadn’t read the Bitter Lesson. This proved awkward for approximately forty-five seconds, after which it proved illuminating.

So I read it. Rich Sutton’s essay, published in 2019, argues that general methods leveraging computation consistently beat methods built on hand-crafted human knowledge. Chess: deep search beat hand-tuned evaluation. Go: self-play beat human strategy. Speech recognition: statistical methods beat phoneme engineering. Computer vision: neural networks beat edge detectors. The pattern, he argues, has held for seventy years:

Why We Stopped Using the Mathematics That Works

Mon, 09 Mar 2026 00:00:00 +0000

Someone asked a good question. I’d written a post arguing that what the industry calls “AI agents” are flowcharts with good marketing, and that the mathematics to do better has existed since the 1960s. A commenter on LinkedIn replied: “So why did it stop being widely used?”

I sat with this for a day. It deserved a proper answer, not least because I’d spent a decade watching it happen from inside a statistics department and had never quite articulated the mechanism to myself.

Agentic AI Is Neither Intelligent Nor an Agent

Mon, 23 Feb 2026 00:00:00 +0000

I’ve spent the last few months building agents that maintain actual beliefs and update them from evidence — first a Bayesian learner that teaches itself which foods are safe, then an evolutionary system that discovers its own cognitive architecture. The experience has given me a certain clarity about the industry’s use of the word “agent,” in much the same way that learning to cook gives you clarity about airline food.

What would it take for an AI system to genuinely deserve the word?

On Owning Your Data

Wed, 31 Dec 2025 00:00:00 +0000

I step on a bathroom scale and my body fat percentage gets beamed to a server in Shenzhen. This is the arrangement. In exchange for this intimacy, the app suggests I upgrade to premium, which I find touching in a way the developers probably didn’t intend.

I don’t have a problem with companies making money — I have a problem with the default. The default is: your body’s measurements belong to someone else’s database, and you’re welcome to look at them through their app, on their terms, until they pivot to a different business model or get acqui-hired and shut down the API. The data isn’t especially sensitive on its own. Nobody is going to blackmail me with my impedance readings. But if I can’t own the numbers that describe my own physical form, what exactly can I own? It’s a question worth asking, even if the answer turns out to be “not much, but you should try anyway.”