Triple Output on Every Feedback Interaction
I had all three outputs — updated state, training signal, audit trail — but I'd built them as three separate systems. Which, it turns out, is almost as bad as not having them at all.
The wrong question I was asking
For a while I thought the question was "are we capturing feedback?" and the answer was yes, so I moved on. The state was updating. The audit logs were flowing. We even had some training data we could pull out if pressed. Three separate systems. Three separate pipelines. Three separate levels of polish. Two of them half-forgotten.
Then a specific thing happened that made me rewrite the mental model. A user flagged that the system had "remembered something wrong." I went to reconstruct what the AI had proposed, what they'd changed it to, and when — and I couldn't. The state held only the current value. The audit log held only a timestamp. The training data had a sample of something that looked approximately like the interaction but had been batch-processed and lost resolution. Three systems, none of them aware they were supposed to be projections of the same event.
I now think this is the quiet failure mode most AI platforms share. Not the absence of feedback capture, but the presence of three weakly-coupled captures that pretend to be one.
What a feedback interaction actually is
A feedback interaction is not a "like" button. It's not a rating widget. It's not an after-the-fact survey. It's the moment when a human decides what to do with an AI proposal — accept as-is, modify before it lands, reject outright. That moment is where the thermodynamic ledger balances. The AI has reduced entropy by producing a proposal. The human has now absorbed the cost by deciding. What the platform does with that decision determines whether the loop closes or leaks.
The framing I've ended up with, after getting this wrong, is that the decision is a single event which produces three durable artefacts simultaneously. Not three separate captures. One event, three projections. If the platform is architected any other way, the projections will drift from each other, and at the exact moment you need them to align — debugging a strange outcome, answering a regulator, fine-tuning on the training corpus — they won't.
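A minimal sketch of what that single event might look like, assuming a Python platform. All of the names here (`FeedbackEvent`, the field names) are hypothetical, not from any particular system:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Literal

# One immutable record per human decision. The three artefacts are
# projections of this record; nothing is captured that isn't
# derivable from it, so they cannot drift.
@dataclass(frozen=True)
class FeedbackEvent:
    user_id: str
    context: dict[str, Any]   # what the model was shown
    prompt: str               # the prompt actually used
    proposal: Any             # what the AI proposed, verbatim
    action: Literal["accept", "modify", "reject"]
    final_value: Any          # what the human actually chose
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```

The `frozen=True` is the point: the event is a fact, not a row to be updated later.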
The three artefacts, honestly
Updated state. The user's journey advances. The record changes to reflect whatever they accepted or modified. This is the only output a CRUD-minded system would bother to write, and it's the only one that's load-bearing for the user's immediate experience. It's also the most dangerous, because if it's the only one, the platform feels responsive but never learns.
A training signal. The delta between what the AI proposed and what the human chose, captured verbatim — not summarised, not interpreted. Raw before-and-after. Modify actions carry the most information here, which is why I now instrument them first on any new feature. The system knows exactly how the proposal failed to meet the user's intent. A reject, by contrast, tells you less than it seems to; you know the proposal was wrong but not in which direction.
An audit trail. An immutable event appended to the stream, recording the whole interaction: context provided, prompt used, proposal returned, action taken, delta recorded, timestamp. This is the receipt. It's also the raw material for later meta-agent improvement cycles, for regulatory explanation, and for time-travel debugging when something weird surfaces weeks after the fact.
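The three artefacts above can be sketched as one fan-out from one event. This is a deliberately naive in-memory version with hypothetical names; the stores would be real systems in practice:

```python
from typing import Any


def fan_out(event: dict[str, Any],
            state: dict[str, Any],
            training: list[dict[str, Any]],
            audit: list[dict[str, Any]]) -> None:
    """One event, three projections -- written together or not at all."""
    # 1. Updated state: the user's record reflects their decision.
    state[event["user_id"]] = event["final_value"]
    # 2. Training signal: raw before/after pair, not summarised.
    training.append({
        "proposal": event["proposal"],
        "chosen": event["final_value"],
        "action": event["action"],
    })
    # 3. Audit trail: the whole event, appended verbatim.
    audit.append(dict(event))
```

Because the caller hands over one event and gets three writes, there is no code path where the state advances while the training pair or the audit record is quietly dropped.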
If a UI element produces fewer than three, I've learned to treat it as decoration rather than feedback. Not because it's useless — sometimes a rating widget genuinely is just a rating widget — but because calling it feedback creates the illusion that the loop is closed when it isn't. Measuring decoration is how teams convince themselves they're learning when they're actually just recording user warmth.
Why three, not one
The three outputs compound each other in a way that doesn't work if any of them is missing.
The audit trail gives you the raw data. The training signal gives you the supervised pairs that make future evals meaningful. The state update closes the loop for the user so they see their decision reflected immediately — which is what makes them willing to make the next decision, which is what produces the next training signal. Skip any one and something goes wrong in a way that's hard to see until it's structural.
State without the delta gives you a product that feels responsive but never learns. I had exactly this for a long time and the quarterly retros kept turning into "why isn't the model getting better?" It wasn't the model. It was that I'd only been writing the state.
A training signal without the audit trail gives you data you can't defend or reproduce. You can fine-tune on it and you can't explain what you fine-tuned on. This is a regulatory problem first and an epistemic problem second — and the epistemic one is actually worse, because it means even you can't trust your own training data.
An audit trail without the state update gives you an observability-first platform that users quietly abandon, because pressing the button produces no visible change. I haven't shipped this version, but I've reviewed systems that are effectively this, and the engagement graph tells the story eventually.
Where I enforce it now
Triple output is a constitutional invariant in the platforms I build new today — enforced in code, not in prose. The feedback code path literally cannot commit one of the three without the other two. I have a single function that writes the event and fans out the three projections from it, and there's a test that fails if anyone tries to add a write path that skirts it. That rigidity is what keeps the ledger balanced over the life of the product, and it's rigidity I wish I'd built into the earlier platforms before the scar tissue arrived.
On the pre-thesis platforms this invariant is partial at best. Some feedback code paths produce all three; some don't. Untangling them is a retrofit I've done in patches but not comprehensively, and I'd estimate two of my earlier platforms still have feedback interactions that produce only one or two artefacts, silently. I know which ones. I haven't fixed them yet. That's the honest answer.