entropik.
§ 02.03 · architecture · skills · harness

Skills over Controllers

I wrote controllers for years. The pattern that replaced them was less obvious than I'd like it to have been — and this is the version of that story I wish someone had told me earlier.

The instinct I had

The first time I gave an AI platform a new capability, I did what any engineer would do: I wrote a controller. A handler function, assembling context, calling the model with a carefully-engineered prompt, parsing the output, returning a result. Add a new capability, add a new controller. Change a prompt, ship a new deploy. It felt like software.

This worked for the first few capabilities. Then it stopped scaling, and the way it stopped scaling was subtle enough that I spent a while blaming the wrong things.

Two things that went quietly wrong

The first thing I noticed was that the capabilities had become illegible to the platform itself. A controller is a piece of code; only humans can read it. The agent couldn't inspect, edit, or improve what it couldn't see. We'd built an AI platform whose own intelligence was locked out of the thing that defined its behaviour. I spent a quarter trying to retrofit a meta-agent against this architecture and it was miserable — you can't ask a language model to diff a controller function the way you can ask it to diff a paragraph of prose.

The second thing took me longer to see. The friction to change behaviour was engineering friction — deploys, code review, release coordination — rather than authoring friction. Every tweak to a prompt became a commit. Every new capability became a migration. The harness was rigid exactly where it should have been most malleable. The part of the system that was supposed to learn fastest was the part I'd made hardest to change.

Once I saw these two together, the controller-per-capability pattern looked less like architecture and more like a habit I'd carried over from web apps, where it belonged.

What I use instead

Every agent capability is a skill: a markdown file containing a natural-language recipe — the soul of the capability, the required context, the output format, the fallback behaviour, the quality gate. Alongside each skill sits a small YAML execution contract: the bindings the harness enforces at runtime (max_tokens, timeout, required_context, output schema, fallback string, budget).
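As a sketch of what the contract half might look like — file name and field values here are illustrative, not the platform's actual schema:

```yaml
# skills/summarise-ticket.contract.yaml (hypothetical example)
max_tokens: 512
timeout_seconds: 30
required_context:
  - ticket_body
  - customer_tier
output_schema:
  - summary
  - severity
fallback: "Unable to summarise; escalate to a human."
budget_usd: 0.02
```

The harness enforces these bindings at runtime; the markdown skill stays free of them.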

The code that reads skills is a small, stable harness primitive. It loads the skill, assembles the declared context, calls the model, validates the output against the contract, and emits the result as an event. Adding a new capability is writing a new markdown file. That is the point, and it took me longer than I'd like to trust it.
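A minimal sketch of that primitive, assuming a dict-shaped contract and a JSON-returning model client (all names here are illustrative, not the platform's actual API):

```python
import json

def run_skill(skill_text, contract, context, call_model):
    """Minimal harness primitive: assemble the declared context, call the
    model, validate against the contract, and emit a result event."""
    # Refuse to run if the skill's declared context is missing.
    missing = [k for k in contract["required_context"] if k not in context]
    if missing:
        raise ValueError(f"missing required context: {missing}")

    # Assemble the prompt from the recipe plus the declared context.
    prompt = skill_text + "\n\nContext:\n" + json.dumps(context)
    raw = call_model(prompt, max_tokens=contract["max_tokens"])

    # Validate against the output schema; fall back if it doesn't comply.
    try:
        result = json.loads(raw)
        if all(field in result for field in contract["output_schema"]):
            return {"event": "skill.completed", "result": result}
    except json.JSONDecodeError:
        pass
    return {"event": "skill.fallback", "result": contract["fallback"]}
```

Note that the model client is injected, not imported — the primitive never names a model, which is what keeps the harness stable while the skill library grows.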

The rule that took me a while to hold firmly is this: if a "skill" needs code to work, it isn't a skill; it's a harness primitive. Crossing that line by accident — adding a small helper function here, a bespoke parser there — turns the markdown layer back into controllers by another name, which I've done at least twice and regretted both times. I now treat "this skill needs a tiny bit of code" as a signal to step back and ask whether the skill is actually a skill or whether I'm growing a new primitive.

What this unlocks

Three compounding effects, each of which I found hard to get any other way.

The meta-agent can read, edit, and test skills. Because skills are files, a meta-agent can treat them the way a developer treats code — propose a diff, run it against a held-out eval set, commit the change if quality improves, otherwise throw it away. The platform edits its own behaviour, gated by evals rather than by a tired human reviewer at the end of the week. I don't think I fully believed this would work until I watched it work. Once I'd seen a skill improve overnight because the meta-agent caught a pattern I hadn't noticed, it was hard to go back.
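The eval-gated loop is simple enough to sketch in a few lines (the propose/evaluate callables stand in for the meta-agent and the held-out eval set; none of these names are the platform's actual API):

```python
def improve_skill(skill_text, propose_edit, evaluate):
    """Eval-gated self-edit: accept a proposed revision of a skill only if
    it scores strictly better on a held-out eval set; otherwise discard."""
    candidate = propose_edit(skill_text)
    if evaluate(candidate) > evaluate(skill_text):
        return candidate, True    # commit: quality improved
    return skill_text, False      # throw the diff away
```

The important property is that the gate is mechanical: no edit lands on the skill file unless the eval score moves, so the meta-agent can run unattended.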

Skills become versionable artefacts. You can diff them, review them, roll them back, branch them. A skill change is a commit, and every commit carries provenance. This matters more than it sounds — the first time a skill regresses and you can git blame it to a specific change on a specific day, you stop having the sinking feeling that your AI behaviour is drifting in ways nobody can account for.

Finally, the harness becomes model-agnostic in a way the controller pattern never quite was. A well-written skill is a prompt and a contract; it doesn't much care whether the model underneath is Claude 4.7 or something three years from now. The harness that reads and enforces the skill is the product. The model is a commodity you swap underneath it without the skill library changing shape. This is why I now treat model-swap resilience as a correctness property rather than a nice-to-have — if you can't swap the model without rewriting your capabilities, your capabilities are in the wrong place.
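Treating model-swap resilience as a correctness property means it can be checked, not just hoped for. A sketch, with injected model clients standing in for real providers (illustrative names throughout):

```python
def contract_holds(output, contract):
    """True when an output carries every field the contract's schema names."""
    return all(field in output for field in contract["output_schema"])

def model_swap_safe(skills, models, run):
    """Model-swap resilience as a checkable property: every skill must
    satisfy its contract under every candidate model client."""
    return all(contract_holds(run(skill, contract, model), contract)
               for skill, contract in skills
               for model in models)
```

Run this in CI with every model you might swap in, and "can we change providers?" stops being a research question.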

What pre-dates this

Not all of my platforms work this way. On EzyLegal, the skill layer arrived before I had the vocabulary for it — there are capabilities that are partly skills and partly controllers, and the seam between them is exactly where the maintenance burden sits. On RideSync, much of the capability logic still lives in handler code, because I hadn't worked out the pattern when that code was written. Those are platforms I'd build differently if I were starting today, and part of writing this down is admitting that out loud.

