Bayesian on Guy Freeman

Three Types and a Funeral for Your Inference Library

Sun, 26 Apr 2026 00:00:00 +0000

This is Part 1 of a series on Bayesian decision-theoretic agents.

This post describes the Credence architecture as it stood in March 2026, when the system used the standard Kolmogorov definition of probability — measures over sample spaces. Since then, the foundation has been reconstructed around de Finetti’s definition, where expectation (the prevision) is the primitive and probability is derived from it. The three types described here were the right starting point; what came next is at the end.

What would it take to build an agent that genuinely learns and decides — not one that pattern-matches its way through tool calls, but one whose behaviour is derived from a few fundamentals the way physics is derived from conservation laws?

Ninety-Six Percent Cheaper and Slightly Better

Thu, 30 Apr 2026 00:00:00 +0000

The production question about LLM agents, once you have gotten past whether they work at all, is how much they cost. A customer-service agent that answers well but costs eight cents per turn is not a customer-service agent; it is a charity. The conventional answer is to pick a cheaper model and hope it is good enough. The less conventional answer is to treat model selection as a decision problem.

Sixty-Two Percent Correct and Winning by a Hundred and Twenty Points

Thu, 30 Apr 2026 00:00:00 +0000

The standard way to evaluate a question-answering system is to measure how often it gets the right answer. This seems reasonable. It is, in practice, a trap.

I ran an experiment to demonstrate why. A Bayesian decision-theoretic agent — built on the Credence DSL, using Beta-Bernoulli reliability tracking and value-of-information calculations — competed against several LLM agents on a 50-question benchmark. All had access to the same four tools. All faced the same questions. All were scored on the same objective.
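As a rough illustration of what Beta-Bernoulli reliability tracking involves, here is a minimal Python sketch. It is not the Credence DSL and not the code used in the experiment; the class name `BetaReliability`, the uniform Beta(1, 1) prior, and the sequence of outcomes are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BetaReliability:
    """Belief about how often a tool is right, held as a Beta(alpha, beta)."""

    alpha: float = 1.0  # assumed prior pseudo-count of correct responses
    beta: float = 1.0   # assumed prior pseudo-count of wrong responses

    def update(self, correct: bool) -> None:
        # Conjugate update: a Bernoulli outcome adds one to the matching count.
        if correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean probability that the tool's next response is correct.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical history: the tool was right three times and wrong once.
tool = BetaReliability()
for outcome in [True, True, False, True]:
    tool.update(outcome)

print(f"posterior mean reliability: {tool.mean:.3f}")
```

Because the Beta distribution is conjugate to the Bernoulli likelihood, the whole update is two counters, which is what makes tracking reliability per tool cheap.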

The Agent That Invents Its Own Rules

Tue, 28 Apr 2026 00:00:00 +0000

The previous post in this series described what I called Tier 1 of the Credence architecture: a DSL for Bayesian decision agents with three types, four axioms, and a constitution forbidding everything else. That post ended with a program the user had to write by hand — a short S-expression encoding a hypothesis about what the environment was like and how to act in it.

Hand-written programs have a well-known limitation: they are only as good as whoever wrote them.

The Prompting Gradient

Thu, 30 Apr 2026 00:00:00 +0000

The accuracy paradox post reported the headline: a Bayesian agent scoring +129.5 against an LLM agent’s +10.8, despite lower accuracy. This post is about the LLM side of that experiment — what was tried, what helped, and where the ceiling is.

The Three Variants

Three LLM agents were tested on the same 50-question benchmark. They differed only in prompting:

LLM Bare. The model receives a description of the four available tools, the scoring system (+10 correct, -5 wrong, 0 abstain, minus tool costs), and the current question with its four candidate answers. No guidance on how to decide. No reasoning format imposed. The model chooses a tool, receives a response, and decides what to do next.
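The scoring rule above already implies a decision threshold, which a short Python sketch makes explicit. The function names are hypothetical and tool costs are left out; only the +10, -5 and 0 payoffs come from the text.

```python
def expected_answer_score(p_correct: float,
                          reward: float = 10.0,
                          penalty: float = -5.0) -> float:
    """Expected score of committing to an answer believed correct with p_correct."""
    return p_correct * reward + (1.0 - p_correct) * penalty


def should_answer(p_correct: float) -> bool:
    """Answer only if doing so beats abstaining, which scores 0."""
    return expected_answer_score(p_correct) > 0.0


# A blind guess among four candidates has p = 0.25.
print(expected_answer_score(0.25))  # 0.25 * 10 + 0.75 * (-5)
print(should_answer(0.25), should_answer(0.5))
```

Setting 10p - 5(1 - p) = 0 gives p = 1/3, so under this scoring an agent should abstain whenever its confidence in the best candidate is below one third. A uniform guess over four candidates sits at 0.25, below that line.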

The Loop Problem

Tue, 28 Apr 2026 00:00:00 +0000

This is Part 3 of a series. For the axiomatic foundation, see Part 1: Three Types and a Funeral. For the VOI-gated text adventure agent, see Part 2: Teaching Zork to a Bayesian.

Every reinforcement learning agent that has ever played a text adventure has, at some point, tried to take the lantern fifty times in a row.

Not because it’s stupid. Because its state representation makes “Shack with book” and “Shack with lantern” look like different states, so the learned futility of “take lantern” in one state doesn’t transfer to the other. The agent is doing exactly what its architecture tells it to do: each state-action pair is independent, and it hasn’t yet learned that this particular pair is useless. It will learn, eventually, after wasting 39 steps per episode on actions it has already tried.

Teaching Zork to a Bayesian

Tue, 28 Apr 2026 00:00:00 +0000

This is Part 2 of a series. For the axioms and types underneath, see Part 1: Three Types and a Funeral. For the state-representation consequences, see Part 3: The Loop Problem.

Every AI agent demo involves web search, retrieval, or API calls — tasks where querying everything is merely expensive. A LangChain ReAct agent that hammers all four tools on every question wastes money but still gets answers. The penalty is economic, not existential.

The Bitter Lesson Has No Utility Function

Thu, 12 Mar 2026 00:00:00 +0000

I wrote an essay arguing that decision theory had been quietly abandoned by mainstream AI — not because it stopped working, but because deep learning absorbed all the oxygen. I posted it to Hacker News. A commenter informed me I was “annoyed at the Bitter Lesson.”

I hadn’t read the Bitter Lesson. This proved awkward for approximately forty-five seconds, after which it proved illuminating.

So I read it. Rich Sutton’s essay, published in 2019, argues that general methods leveraging computation consistently beat methods built on hand-crafted human knowledge. Chess: deep search beat hand-tuned evaluation. Go: self-play beat human strategy. Speech recognition: statistical methods beat phoneme engineering. Computer vision: neural networks beat edge detectors. The pattern, he argues, has held for seventy years:

Why We Stopped Using the Mathematics That Works

Mon, 09 Mar 2026 00:00:00 +0000

Someone asked a good question. I’d written a post arguing that what the industry calls “AI agents” are flowcharts with good marketing, and that the mathematics to do better has existed since the 1960s. A commenter on LinkedIn replied: “So why did it stop being widely used?”

I sat with this for a day. It deserved a proper answer, not least because I’d spent a decade watching it happen from inside a statistics department and had never quite articulated the mechanism to myself.

Agentic AI Is Neither Intelligent Nor an Agent

Mon, 23 Feb 2026 00:00:00 +0000

I’ve spent the last few months building agents that maintain actual beliefs and update them from evidence — first a Bayesian learner that teaches itself which foods are safe, then an evolutionary system that discovers its own cognitive architecture. The experience has given me a certain clarity about the industry’s use of the word “agent,” in much the same way that learning to cook gives you clarity about airline food.

What would it take for an AI system to genuinely deserve the word?

How Decision Theory Cuts Your AI Agent's API Bill in Half

Mon, 23 Feb 2026 00:00:00 +0000

The companion essay argued that LLM-based “agents” don’t earn the title. No beliefs, no uncertainty quantification, no principled mechanism for deciding whether a tool query justifies its cost. This post supplies the technical scaffolding for that claim — the mathematics and code behind Credence, the benchmark I built to test it. Think of it as the receipts.

For the philosophical argument, see Agentic AI Is Neither Intelligent Nor an Agent.

The Problem: Every Query Has a Price

Hand a standard LangChain ReAct agent a question and four tools, and it will query most of them most of the time. It possesses no apparatus for reasoning about whether the next query repays its cost. The prompt says “be helpful”; the agent takes helpfulness to mean exhaustiveness.
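To make "whether the next query repays its cost" concrete, here is a minimal value-of-information sketch in Python. Everything in it is an assumption made for illustration: the payoffs (+10 correct, -5 wrong, 0 abstain), the tool model (it names the true answer with probability `reliability` and otherwise a uniformly random wrong one), and all function names. It is not the Credence implementation.

```python
def best_action_value(belief, reward=10.0, penalty=-5.0):
    """Expected score of the best immediate action: answer the top candidate or abstain."""
    p = max(belief)
    return max(0.0, p * reward + (1.0 - p) * penalty)


def posterior(belief, response, reliability):
    """Bayes update after the tool names candidate `response`.

    Returns the posterior belief and the prior probability of that response.
    """
    wrong = (1.0 - reliability) / (len(belief) - 1)
    unnorm = [b * (reliability if i == response else wrong)
              for i, b in enumerate(belief)]
    z = sum(unnorm)
    return [u / z for u in unnorm], z


def value_of_information(belief, reliability):
    """Expected gain in score from seeing one tool response before acting."""
    now = best_action_value(belief)
    after = 0.0
    for response in range(len(belief)):
        # Skip responses that cannot occur under the current belief.
        likelihoods = [reliability if i == response
                       else (1.0 - reliability) / (len(belief) - 1)
                       for i in range(len(belief))]
        if sum(b * l for b, l in zip(belief, likelihoods)) == 0.0:
            continue
        post, p_response = posterior(belief, response, reliability)
        after += p_response * best_action_value(post)
    return after - now


uniform = [0.25, 0.25, 0.25, 0.25]
voi = value_of_information(uniform, reliability=0.7)
query_cost = 2.0  # hypothetical cost of one tool call, in score units
print(f"VOI = {voi:.2f}, query worthwhile: {voi > query_cost}")
```

The decision rule is then a single comparison: query only when the value of information exceeds the cost of the query. A tool no better than chance (reliability 0.25 with four candidates) has a value of zero in this model, so it is never worth paying for.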

Evolution Discovers How to Think: A Philosophical Journey in Code

Sat, 31 Jan 2026 00:00:00 +0000

In Part 1, I built an agent that learns which foods are safe through Bayesian inference. It starts ignorant, observes outcomes, updates its beliefs using exact conjugate mathematics, and eventually acts with something resembling competence. Clean code, sound theory, and those belief distributions converging in real time remain genuinely satisfying to watch, in the way that all correctly implemented mathematics is satisfying to watch.

Something has been nagging at me, though.

The agent learns what to believe. I designed how it believes. I chose the variables it perceives. I specified the structure of its world-model. I set the prior hyperparameters. The agent’s entire cognitive architecture — the shape of its epistemic machinery — came from me, handed down like tablets from a mountain. The agent had no say in the matter. As someone who spent years doing Bayesian statistics, I should know better than to treat the model structure as given. The prior over model structures is the prior that actually matters, and I skipped it entirely.

Building a Bayesian Learning Agent That Teaches Itself to Eat

Fri, 26 Dec 2025 00:00:00 +0000

You’re stranded somewhere unfamiliar with twelve types of food scattered around. Some provide energy. Others are toxic. You don’t know which is which, you’re losing energy with every step, and nobody left a manual. The question is whether you can learn fast enough to survive.

This is the exploration-exploitation tradeoff, and it’s one of those problems that sounds like a thought experiment until you actually have to solve it. Pure exploration — trying everything at random — kills you. Pure exploitation — eating only what you currently believe is best — starves you when better options exist two metres away. You need something that balances both, and ideally something with a mathematical proof attached.
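One well-known method that balances the two is Thompson sampling, sketched below in Python. The excerpt does not say which method the post uses, so treat this as an illustration of the tradeoff and not as the post's algorithm. The three foods, their safety probabilities, and the function name are hypothetical.

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Run Thompson sampling over Bernoulli options; return how often each was chosen."""
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1.0] * n  # assumed Beta(1, 1) prior: pseudo-count of good outcomes
    beta = [1.0] * n   # pseudo-count of bad outcomes
    pulls = [0] * n
    for _ in range(rounds):
        # Draw one plausible safety rate per food from its current posterior.
        samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
        choice = max(range(n), key=samples.__getitem__)
        # Eat the food that looks best under this draw and observe the outcome.
        if rng.random() < true_probs[choice]:
            alpha[choice] += 1.0
        else:
            beta[choice] += 1.0
        pulls[choice] += 1
    return pulls


# Hypothetical world: three foods with different chances of being beneficial.
pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
print(pulls)
```

Uncertain options are sometimes sampled optimistically and so get tried, which is the exploration. As evidence accumulates, the posteriors narrow and the best option wins almost every draw, which is the exploitation.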