Evolution Discovers How to Think: A Philosophical Journey in Code

What if the cognitive architecture itself could evolve?

Categories: python, bayesian, evolution, simulation, philosophy
Part 2 of the Bayesian agent series. We confront the question of what should be designed versus what should be allowed to emerge, and discover that it’s agents all the way up and all the way down.
Author: Guy Freeman

Published: January 31, 2026

In Part 1, I built an agent that learns which foods are safe through Bayesian inference. It starts ignorant, observes outcomes, updates its beliefs using exact conjugate mathematics, and eventually acts with something resembling competence. The code is clean, the theory is sound, and watching those belief distributions converge in real-time remains genuinely satisfying.

But something has been nagging at me.

The agent learns what to believe, but I designed how it believes. I chose which variables it perceives. I specified the structure of its world-model. I set the prior hyperparameters. The agent’s cognitive architecture — the very shape of its epistemic machinery — came from me, not from any process of discovery. And this feels like cheating, or at least like stopping short of something more interesting.

What if the agent’s theory of the world is wrong? What if reality’s causal structure involves temperature and texture, but I’ve given the agent sensors only for shape and colour? It would dutifully learn the best possible beliefs given its flawed framework, never suspecting that it’s missing the point entirely. The map would be internally consistent but catastrophically divorced from the territory.

This is the difference between learning within a framework and learning which framework to use. Part 1 addressed the former. This post is my attempt to think through the latter — and to see whether code can help me think.

The Uncomfortable Question

Here is the question that started this whole endeavour: what should be designed by me, and what should be allowed to evolve?

It sounds like a technical question about where to draw boundaries in a simulation, but it’s really a philosophical question about the nature of minds and the origins of cognitive structure. When we look at biological organisms, we see creatures whose perceptual systems, representational capacities, and inferential tendencies were shaped by millions of years of natural selection. A frog’s visual system is exquisitely tuned to detect small moving objects — not because some designer thought flies were important, but because ancestral frogs with fly-detecting visual systems outreproduced those without. The frog doesn’t learn to see flies. It learns where the flies are, using machinery that evolution provided.

This suggests a two-timescale picture of adaptation. Within a lifetime, organisms learn: they update beliefs, acquire skills, form memories. Across generations, populations evolve: body plans change, neural architectures shift, innate behaviours emerge or disappear. The genome specifies how to learn; learning fills in what is believed. Nature provides the form; nurture provides the content.

I wanted my Bayesian agents to work the same way. But implementing this forced me to confront questions I hadn’t anticipated.

The Hierarchy of Fixedness

Consider all the things one might hold fixed or allow to vary in a simulation like this:

At the bottom, there’s the physics of the simulated world — the grid topology, energy conservation, the basic rules of movement and consumption. These feel genuinely architectural; varying them would mean running a different simulation entirely.

Above that, there’s the representational substrate. Can agents represent probabilities at all? Do they have access to uncertainty quantification, or only point estimates? Is Bayesian inference even available as a cognitive operation, or must they make do with simpler associative mechanisms?

Then there’s the model structure. Given that an agent can represent probabilities, which variables does it include in its world-model? Which dependencies does it posit between them? This is the agent’s theory of causation — its assumptions about what relates to what.

Next come the model parameters: prior means, variances, learning rates. These shape how quickly beliefs change and how much weight is given to new evidence versus old convictions.

Finally, at the top, there are the beliefs themselves — the actual probability distributions over states of the world. These are clearly learned, not inherited; passing them to offspring would be Lamarckian, and we’ve known since Weismann that acquired characteristics aren’t passed on.

When I built the original agent, I fixed everything except the top level. The agent learned beliefs, but everything else — sensors, structure, priors — was my design. The question now is: how far down this hierarchy can we push the boundary? What happens if we let more of it evolve?
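To keep track of where that boundary sits, the hierarchy above can be summarised as what was fixed by design in Part 1 and what this post hands over to evolution. The little structure below is just bookkeeping for the argument, not anything the implementation uses.

Code
# Where the design/evolve/learn boundary sits at each level of the hierarchy.
# Purely illustrative bookkeeping; nothing in the simulation consumes this.
HIERARCHY = {
    "world physics":              {"part 1": "designed", "part 2": "designed"},
    "representational substrate": {"part 1": "designed", "part 2": "designed"},
    "model structure":            {"part 1": "designed", "part 2": "evolved (sensors and BN edges)"},
    "model parameters (priors)":  {"part 1": "designed", "part 2": "evolved"},
    "beliefs":                    {"part 1": "learned",  "part 2": "learned"},
}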

The Genome as Cognitive Blueprint

I decided to give agents genomes that encode their cognitive architecture. Not their beliefs — those remain learned — but the shape of their belief-forming machinery.

A genome specifies three things:

First, sensors: which attributes of the world this agent can perceive. Reality in my simulation has many attributes — shape, colour, size, texture, temperature, luminosity, sound, smell — but any given agent might only perceive a subset. An agent with sensors for shape and colour but not temperature is, in a meaningful sense, blind to temperature. It will never form beliefs about temperature’s relevance because it cannot observe temperature at all.

Second, Bayesian network structure: which perceived variables the agent models as relevant to predicting outcomes. Having a sensor for texture doesn’t mean the agent uses texture in its predictions. The BN structure encodes the agent’s theory — its assumptions about what causes what. An agent might perceive five attributes but theorise that only two of them matter.

Third, priors: the initial beliefs before any learning occurs. These include the prior mean (is the agent optimistic or pessimistic about unknown foods?), the prior variance (how uncertain does it start?), and the pseudo-observation count (how easily do new observations override the prior?).
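To make this concrete, here is a minimal sketch of what such a genome might look like as a data structure. The field names and the toggle-style mutation are illustrative assumptions on my part, not the actual cognitive_genome.py interface.

Code
import random
from dataclasses import dataclass

# Attributes that exist in the simulated world (any given agent senses only a subset).
ALL_ATTRIBUTES = ["shape", "colour", "size", "texture",
                  "temperature", "luminosity", "sound", "smell"]

@dataclass
class CognitiveGenome:
    sensors: set                 # which attributes the agent can perceive at all
    bn_parents: set              # perceived attributes treated as predictive of energy
    prior_mean: float = 0.0      # optimism or pessimism about unknown foods
    prior_variance: float = 1.0  # how uncertain beliefs start
    pseudo_count: float = 1.0    # how easily new observations override the prior

    def mutate(self, rate: float = 0.1) -> "CognitiveGenome":
        """Return a mutated copy: toggle sensors and edges, jitter the prior."""
        sensors = set(self.sensors)
        for attr in ALL_ATTRIBUTES:
            if random.random() < rate:
                sensors.symmetric_difference_update({attr})   # gain or lose a sensor
        # The world-model can only use attributes the agent can actually perceive.
        parents = {a for a in self.bn_parents if a in sensors}
        for attr in sensors:
            if random.random() < rate:
                parents.symmetric_difference_update({attr})   # add or drop an edge
        return CognitiveGenome(
            sensors=sensors,
            bn_parents=parents,
            prior_mean=self.prior_mean + random.gauss(0, rate),
            prior_variance=max(1e-3, self.prior_variance * (1 + random.gauss(0, rate))),
            pseudo_count=max(1e-3, self.pseudo_count * (1 + random.gauss(0, rate))),
        )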

This separation creates an interesting dynamic. Two agents might have identical genomes but different learned beliefs, because they encountered different foods during their lifetimes. Conversely, two agents might have similar beliefs but different genomes — one might have achieved accurate beliefs despite a clumsy cognitive architecture, while the other arrived at the same place more elegantly.

The genome doesn’t say “circles are good.” It says “attend to shape” and “start with moderate uncertainty.” The agent then discovers whether circles are good through its own experience. But if the genome says “ignore shape entirely,” no amount of experience will teach the agent about circles, because it cannot perceive them.

Reality as the Hidden Curriculum

If genomes are going to evolve, they need something to evolve against. There must be a fitness landscape — some notion of which cognitive architectures work better than others. And this, I realised, requires taking seriously the idea that reality has a structure that agents are trying to approximate.

So I built a “Reality” module that defines the true causal structure of the simulated world. This is the territory to the agents’ maps. It specifies what actually determines food energy values: which attributes matter, how they interact, whether there are hidden variables agents cannot perceive.

The design philosophy here was to make reality arbitrarily complex and potentially unknowable. Reality might involve variables agents can perceive, but also variables they cannot — like the quadrant of the map where food spawned, or the phase of some hidden cycle, or (and this amused me greatly) the actual hour of the day in my timezone. An agent doing especially well at 3am London time might owe its success to factors it could never possibly model.
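To give a flavour of this, here is an illustrative sketch (not the actual reality.py interface): a reality is essentially a generator of fully-attributed foods plus a true energy function, and that function is free to depend on attributes, places, and times that no agent can sense.

Code
import datetime
import random

def sample_food() -> dict:
    """Draw a food with every attribute reality cares about, perceivable or not."""
    return {
        "shape": random.choice(["circle", "square", "triangle"]),
        "colour": random.choice(["red", "green", "blue"]),
        "temperature": random.uniform(-10, 40),
        "quadrant": random.choice(["NE", "NW", "SE", "SW"]),   # hidden from every agent
    }

def true_energy(food: dict) -> float:
    """The territory: what actually determines energy, noise included."""
    energy = {"circle": 5.0, "square": 1.0, "triangle": -3.0}[food["shape"]]
    if food["shape"] == "circle" and food["colour"] == "green":
        energy += 4.0                          # a synergy no single attribute reveals
    if datetime.datetime.now().hour == 3:      # external dependency: the real clock
        energy += 2.0
    return energy + random.gauss(0, 1.0)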

This creates a principled gap between what agents can learn and what is true. An agent with perfect inference will converge on the best possible beliefs given its sensors and model structure, but if reality’s causal structure involves factors outside its perceptual reach, those beliefs will be systematically wrong in ways the agent cannot detect from the inside.

This felt important to get right. Too often in AI research, we evaluate agents in environments perfectly matched to their architectures, which obscures the question of whether the architecture itself is appropriate. By allowing reality to be richer than any agent’s model, we create genuine pressure for evolution to discover better architectures.

The Population as Agent

Here is the insight that reframed the whole project for me: the population is itself an agent.

Think about it. An individual agent has beliefs — probability distributions over food values. What does a population have? A distribution over genome types. These are the same kind of thing: probability measures over possible states. The individual agent’s beliefs represent uncertainty about the world; the population’s genome distribution represents “uncertainty” (in a metaphorical sense) about which cognitive architectures work.

An individual agent learns by updating beliefs in response to observations. A population learns by differential reproduction — genomes that produce successful agents become more common, while genomes that produce failures become rare. Bayesian updating and natural selection are both mechanisms for concentrating probability mass on hypotheses that fit the evidence.
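The analogy can be made surprisingly literal. One generation of fitness-proportional reproduction applies the same arithmetic as Bayes' rule: multiply each genome's current frequency (the prior) by its fitness (playing the role of the likelihood), then renormalise by the mean fitness (the evidence). A toy calculation with made-up numbers:

Code
# One generation of selection as a Bayes update: posterior ∝ prior × likelihood.
# The genome labels and fitness values below are made up for illustration.
genome_freqs = {"shape-only": 0.5, "colour-only": 0.3, "shape+colour": 0.2}  # "prior"
fitness      = {"shape-only": 1.2, "colour-only": 0.8, "shape+colour": 1.5}  # "likelihood"

mean_fitness = sum(genome_freqs[g] * fitness[g] for g in genome_freqs)       # "evidence"
next_gen = {g: genome_freqs[g] * fitness[g] / mean_fitness for g in genome_freqs}

print(next_gen)  # mass shifts toward higher-fitness genomes: roughly 0.53, 0.21, 0.26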

An individual agent acts to maximise expected utility given its beliefs. A population… well, this is where the metaphor strains, but there’s something selection-like happening: the population “acts” by producing new agents, and the distribution of those agents shifts toward higher-fitness designs.

If you squint, it’s agents all the way up. An individual agent. A population of agents. A meta-population of different evolutionary strategies. At each level, the same pattern: representations of uncertainty, mechanisms for updating those representations, actions that depend on current beliefs. The levels differ in their timescales and substrates, but the abstract structure rhymes.

This view has a vertiginous quality. Where does it stop? Is there a top level, or do the meta-levels continue indefinitely? And if the pattern holds at every level, what does that tell us about the nature of agency itself?

I don’t have answers, but I find the questions generative.

On the Cost of Cognition

One decision I made early on was to reject artificial costs for thinking.

Many evolutionary simulations impose metabolic penalties on cognitive complexity: a larger brain costs more energy to maintain, so there’s pressure to evolve only as much intelligence as strictly necessary. This is biologically plausible — the human brain consumes roughly 20% of our metabolic budget — and it creates interesting dynamics where simple heuristics can outcompete sophisticated inference if the environment is simple enough.

But I decided against it, for reasons that feel important.

The thing is, imposing artificial costs requires me to decide what cognition is and how to measure its expense. Is a larger sensor suite more costly than a smaller one? Is a denser Bayesian network more expensive than a sparse one? These questions don’t have principled answers in my simulation. Whatever numbers I choose would be arbitrary, and that arbitrariness would shape the evolutionary outcomes in ways that reflect my modelling choices rather than anything fundamental.

Instead, I let time be the only cost. An agent that “thinks longer” simply acts later, and the world moves on while it deliberates. If pondering every decision means missing opportunities, evolution will discover this and favour faster cognition. If hasty action leads to eating poison, evolution will discover that too. The cost of cognition emerges from the dynamics rather than being imposed from outside.

This feels cleaner, though I’m not certain it’s right. There’s an argument that explicit cognitive costs are necessary to prevent evolution from discovering implausibly expensive solutions that would never be viable in resource-constrained reality. But for now, I’m letting the simulation speak.

First Results

When I ran the simulation, I was holding my breath a little. Would the evolutionary process discover anything interesting, or would it churn randomly forever?

The results were modest but real.

In what I called the “simple reality” — where shape and colour independently affect food energy — agents evolved to model shape as the primary relevant variable. This is correct: in that reality, shape has the largest effect. The best agents had genomes that said “perceive shape, model shape as predicting energy” and had learned beliefs that correctly ranked circles above squares above triangles.

More interestingly, in the “interaction reality” — where certain shape-colour combinations have synergistic effects — some agents evolved to model only colour. This initially surprised me. Doesn’t colour capture less information than the full shape-colour combination? But on reflection, it makes sense: green is correlated with the best outcome (the green-circle synergy), and modelling colour alone is simpler than modelling the full interaction. The agent found a heuristic that works well enough without capturing the true structure.

In the “hierarchical reality” — where temperature and texture interact in ways that shape and colour don’t capture — the agents struggled. They couldn’t discover the true causes because those causes involved attributes outside their initial sensor suites. Evolution would need to first expand their perceptual capacities before they could form accurate theories.

This last result points to a limitation of the current framework. Sensor suites can mutate, but the mutations are random. There’s no directed search toward sensors that would be useful; evolution just tries things and sees what survives. In a reality where the relevant attributes are rare in genome-space, it might take a very long time for evolution to stumble upon them.

The God’s-Eye View

There’s one more dimension to this project that I haven’t mentioned: I’m not just an observer. I’m the designer of the fitness function.

When I talk about agents “succeeding” or “failing,” what I mean is that they perform well or poorly according to criteria I’ve specified. Currently, that criterion is net energy gain — the agent’s foraging success. But there’s nothing sacred about this. I could define fitness as “time spent in the northwest quadrant” or “number of red triangles eaten” or anything else I can compute from the agent’s behaviour.
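In code terms, a fitness criterion is nothing more than a function I choose over an agent's recorded behaviour. The snippet below is a hypothetical illustration of that point; the history format and field names are invented, not the actual simulation.py interface.

Code
# Two interchangeable fitness criteria over an agent's life history.
# The history format and field names here are hypothetical, for illustration only.

def fitness_net_energy(history: list[dict]) -> float:
    """The current criterion: total energy gained minus energy lost."""
    return sum(event["energy_delta"] for event in history)

def fitness_northwest_loiterer(history: list[dict]) -> float:
    """An equally computable, entirely arbitrary alternative."""
    return sum(1 for event in history if event.get("quadrant") == "NW")

# Selection breeds for whichever function the designer plugs in;
# the agents themselves never know why.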

This reframes what the simulation is doing. The agents aren’t evolving to succeed in some absolute sense; they’re being bred to satisfy my objectives. Natural selection, in this context, is a mechanism for aligning agent behaviour with designer intent.

This has implications I’m still thinking through. If I wanted agents that maximise my wealth (as I mentioned in the conversation that prompted this project), I’d need to define a fitness function that rewards wealth-generating behaviour. The agents would then evolve cognitive architectures suited to that objective. But — and this is the crux — they’d have no understanding of why they’re doing what they’re doing. They’d just be executing evolved heuristics that happen to correlate with my utility.

Is this a bug or a feature? On one hand, it’s limiting: agents optimised for proxy objectives might Goodhart their way into behaviours that satisfy the letter of the fitness function while missing its spirit. On the other hand, it’s honest: we’re not pretending the agents have goals of their own. They’re tools, shaped by selection to serve purposes they cannot comprehend.

I find this more intellectually satisfying than approaches that try to give agents explicit utility functions and then align those functions with human values. The alignment here is implicit, embedded in the selection process itself. Whether it’s more practical for real-world applications is another question.

What I Still Don’t Know

This project has raised more questions than it’s answered, which I take as a sign that I’m prodding at something real.

Can the evolutionary process itself evolve? I’ve fixed the mechanism of variation (mutation) and selection (fitness-based reproduction). But in biological evolution, these mechanisms are themselves products of evolution. Sexual reproduction, mutation rates, developmental plasticity — all of these evolved. Could I set things up so that not only do agents evolve, but the mutation operators and selection pressures evolve too? This would be a kind of meta-evolution, and I don’t yet know how to implement it cleanly.

What’s the minimal fixed point? I’ve been asking what should be designed versus what should evolve, but there’s a deeper question: what’s the minimal kernel from which everything else can emerge? If I fix too much, I’ve prejudged the solution. If I fix too little, I get primordial soup that never organises. Somewhere in between lies a Goldilocks zone of generative constraints. Finding it feels like the hard problem.

Does this tell us anything about real minds? I’m wary of overclaiming. Simulations like this are toys, and biological cognition is unfathomably more complex. But there’s something seductive about the idea that the structure of cognition — not just its contents — is subject to optimisation pressure. If that’s true for evolved minds, it might inform how we think about designing artificial ones.

What about structure learning within a lifetime? Currently, the Bayesian network structure is fixed by the genome. An agent is born with a theory of what matters and never revises that theory, only the parameters within it. But sophisticated learners — humans, say — can revise their ontologies. We don’t just update beliefs within a framework; we sometimes discard the framework and adopt a new one. Allowing within-lifetime structure learning would be more powerful but much more complex to implement. It’s on the list.

The Code

The implementation is a few hundred lines of Python, no ML frameworks required. There are four modules:

reality.py defines the true causal structure of the world. It specifies which attributes exist, how they combine to determine food energy, and whether there are hidden variables or external dependencies (like real-world time). Several preset realities are included for experimentation.

cognitive_genome.py defines the heritable specification of cognitive architecture. It encodes sensors, Bayesian network structure, and priors, with mutation and crossover operators for evolution.

bayesian_agent.py implements agents that learn within their lifetimes. Each agent uses its genome to determine what it perceives and how it models the world, then does exact Bayesian inference to update beliefs based on experience (a sketch of this update follows the module descriptions).

simulation.py orchestrates the evolutionary process. It runs generations, applies selection, produces offspring, and tracks statistics over evolutionary time.
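The within-lifetime learning is the same exact conjugate updating as in Part 1. As a rough sketch of the idea, with hypothetical names rather than the module's actual code: each food type gets a Normal belief over its mean energy, and the genome's pseudo-observation count controls how quickly evidence overrides the prior.

Code
def update_belief(prior_mean: float, pseudo_count: float,
                  observed_energies: list[float]) -> tuple[float, float]:
    """Conjugate Normal update: the posterior mean is a pseudo-count-weighted
    average of the prior mean and the sample mean of the observations."""
    n = len(observed_energies)
    if n == 0:
        return prior_mean, pseudo_count
    sample_mean = sum(observed_energies) / n
    post_count = pseudo_count + n
    post_mean = (pseudo_count * prior_mean + n * sample_mean) / post_count
    return post_mean, post_count

# An optimistic prior (mean 2.0, weight of one pseudo-observation)
# meets three bad meals and gives way to the evidence:
print(update_belief(2.0, 1.0, [-3.1, -2.7, -3.4]))  # posterior mean ≈ -1.8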

Running it looks like this:

Code
from simulation import run_experiment

sim = run_experiment(
    reality_type='simple',      # or 'interaction', 'hierarchical', 'temporal'
    num_generations=50,
    population_size=30,
    verbose=True
)

# What did evolution discover?
best = sim.best_ever
print(f"Sensors: {best['sensors']}")
print(f"BN structure: {best['bn_structure']}")
print(f"Learned beliefs: {best['beliefs']}")

Typical output shows evolution converging on sensible architectures — agents that model the variables that actually matter and learn correct (or at least useful) beliefs about them.

The code is on GitHub, alongside the original Part 1 implementation. Fork it, break it, extend it. The framework is modular enough to support experimentation with different realities, different genome structures, different selection pressures. I’m curious what others will find.

Closing Thoughts

I started this project wanting to understand what happens when cognitive architecture itself can evolve. I ended up thinking about the fractal nature of agency, the gap between map and territory, the relationship between optimisation and alignment, and the uncomfortable question of what should be designed versus discovered.

The code works — agents evolve, beliefs are learned, fitness improves over generations. But the code was always a vehicle for thinking, not an end in itself. The real product is a slightly better understanding of these old questions, approached from a new angle.

Friston talks about the free-energy principle, the idea that all adaptive systems minimise surprise through a combination of updating beliefs and acting on the world. Hinton and Nowlan wrote about how learning can guide evolution, allowing plastic individuals to survive in novel environments long enough for selection to consolidate their gains. Baldwin proposed this over a century ago, and biologists are still working out the implications.

I don’t think my toy simulation adds much to that serious scientific work. But I find that building things clarifies my thinking in ways that reading alone does not. The act of specifying data structures for genomes and beliefs and realities forces a precision that philosophical discourse permits you to avoid. And when the simulation runs and produces results you didn’t expect, it’s a reminder that even simple systems can surprise their creators.

The agents in my simulation are not conscious, do not understand what they’re doing, and serve purposes they cannot comprehend. In that sense, they’re nothing like us. But in another sense — in the sense that they adapt on multiple timescales, that their cognitive structures are shaped by forces outside their awareness, that they navigate a gap between what they can model and what is true — perhaps they’re more like us than we’d prefer to admit.

I’ll keep tinkering.