The Demon Principle
A thermodynamic analogue for why the human in the loop is load-bearing, not friction. Arrived at the long way — by watching three platforms get strange when I tried to remove the human.
Where this came from
I didn't arrive at this by reading Maxwell. I arrived at it by watching platforms get weirdly unreliable the moment I tried to take the human out of the loop. At the time I read it as a UX problem — users didn't trust the system yet, we'd earn it, eventually they'd let the AI run. A year in, I started to suspect I had the direction backwards. The human wasn't the friction in the path. The human was the thing keeping the path honest.
Every AI output I was shipping was a reduction of disorder — messy into tidy, raw into structured. Useful. But the disorder didn't vanish; it went somewhere. And the somewhere was the human's decision when they reviewed the output. Accept, modify, reject. I had been thinking of that decision as overhead. It turned out to be load-bearing.
The trilemma
Once I started looking, the shape was obvious in retrospect. Any AI system that changes over time has to pick two out of three: continuous self-evolution, complete isolation, and safety invariance. You want it to get better on its own, to operate without external oversight, and to have its outputs remain correct as the domain drifts underneath it. It can't have all three.
Pick self-evolution and isolation, and the system optimises for its own metrics. Those metrics drift slowly from the true objective. Confidently-wrong outputs pile up with nothing to correct them, and the worst part is that the drift is invisible until it's structural. Pick isolation and safety, and the system is static — safe the day it ships, obsolete a quarter later. The only viable pair is self-evolution plus safety, and it requires the thing the other two let you skip: an external correction source.
That correction source can only be the human. Not as a safety net bolted on. As the thing that makes the architecture valid in the first place.
The Maxwell analogue
I'm wary of physics analogies in software — they usually paper over mess rather than clarify it — but this one held up for me. Maxwell sketched the sorting demon in 1867. A century later Landauer and Bennett closed the loop: the demon cannot work for free. Erasing the record it keeps of each sorting decision carries an irreducible energy cost, at least kT ln 2 per bit, and it has to be paid somewhere.
An AI output reduces disorder. I had been quietly assuming the cost was paid by the compute that generated the output. It isn't. The compute is the sorting. The energy that keeps the sorted result honest is the human's decision to accept, modify, or reject it. Skip that energy input and the platform starts drifting — but it drifts slowly enough that you don't notice until a year has gone by and something breaks that you can no longer reconstruct.
Three outputs, one decision
Here's the part I wish I'd seen earlier. When a human reviews an AI proposal, that's one event. But it's producing three durable artefacts at the same time — and I was building three separate systems for them.
The accepted output becomes part of the record. The delta between what the AI proposed and what the human chose is a training signal. The exchange itself is an audit trail. One event. Three projections.
On one of the earlier platforms I had the record as a database, the audit trail as logs, and the training signal as — nothing, really, because we hadn't built an optimisation loop yet. Three half-systems, none of them aware they were projections of the same event. A year in, the training data we finally tried to extract didn't match the record, because they'd been derived by different code paths with different bugs. I don't think this was bad luck. I think it's what happens whenever a team treats each artefact as a separate feature instead of a projection.
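The fix the story above points at can be sketched in a few lines of Python. All names here are hypothetical, a sketch under the assumption that the review event is the single source of truth: record, training signal, and audit trail are each a pure function of the same event, so there is no second code path to drift.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ReviewEvent:
    """One human review of one AI proposal: the single source of truth."""
    item_id: str
    proposed: str      # what the AI suggested
    final: str         # what the human accepted, or edited it into
    decision: str      # "accept" | "modify" | "reject"
    reviewer: str
    timestamp: str

def to_record(e: ReviewEvent) -> dict:
    """Projection 1: what goes into the system of record."""
    return {"item_id": e.item_id, "value": e.final}

def to_training_signal(e: ReviewEvent) -> dict:
    """Projection 2: the delta between proposal and outcome."""
    return {"input": e.proposed, "target": e.final, "label": e.decision}

def to_audit_entry(e: ReviewEvent) -> dict:
    """Projection 3: the full exchange, kept verbatim."""
    return asdict(e)

event = ReviewEvent("inv-42", "net 30", "net 45", "modify",
                    "dana", "2024-05-01T09:00:00Z")
record = to_record(event)
signal = to_training_signal(event)
audit = to_audit_entry(event)
# All three artefacts derive from `event`; they cannot disagree.
```

The point of the projection functions being pure and colocated is exactly the failure mode described above: three artefacts derived by different code paths accumulate different bugs, while three projections of one event stay consistent by construction.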
Shortcuts I've tried and regretted
A few design rules fall out of the trilemma, and the honest version of each is the one phrased as something I've personally got wrong.
AI generates, human decides. I did this backwards once, on a field I'd convinced myself was low-stakes. The AI's output went straight into the record "to reduce clicks." Six weeks later it produced the hardest bug I've ever had to reconstruct — not because the logic was complicated, but because I had no record of what the AI had proposed versus what ended up stored. I now treat "let's auto-apply this one, it's harmless" as a smell.
Feedback should be transparent. The human should always be able to see what the AI proposed and what they changed. The moment the delta is hidden, the training signal is gone — and you won't notice it's gone until you need it.
Memory has to be editable. I shipped a platform where the AI's memory was a derived view you couldn't correct directly. You could only correct the underlying events, and the view would eventually catch up. It sounded clean. In practice it turned every correction into a half-hour exercise and nobody did them.
Every decision is a training signal. Accept/modify/reject isn't just UX. If you're not capturing it as structured data, you're shipping a system that cannot get better, even as every interaction leaves the signal on the floor.
Where the value actually sits
The competitive implication took a while to land for me. The scarce resource isn't the model. Any competitor can call the same API. The scarce resource is the feedback corpus — thousands of domain-expert decisions captured as structured traces and run through optimisation. That corpus cannot be generated synthetically; it requires real judgment on real tasks.
Which means the Demon Principle isn't just thermodynamic scaffolding I wheel out to sound interesting. It's the mechanism that makes the corpus possible. I'd been treating the human as overhead on the path to an autonomous system. It turns out the human is where the value actually accumulates.
Want to think through how this lands in your project? Tell kr8 what you’re working with.