Context Assembly
The discipline that replaces data modelling when you're building AI-first. For years I kept asking "what should we store?" when the right question is "what should the AI see?"
The question I was asking wrong
In traditional software, design starts with data modelling. What entities do we have, how do they relate, what goes in which table. Normal forms. Foreign keys. The database is the source of truth, and getting the schema right is most of the architectural work.
I kept trying to apply the same reflex to AI platforms. It didn't work, and for a while I couldn't see why. The schemas I built were fine. The retrieval was correct. The LLM calls returned outputs; the outputs were mediocre, and I assumed the problem was the prompts.
The unlock was eventually noticing that for an AI-first platform the primary question isn't "what should we store" — it's "what should the AI see on this call." The database still exists. It's just not the source of truth for decisions any more. The context window is. And assembling a good context window is a different discipline from modelling data.
Three tiers, not one pile
The shape that works is context in layers, each optimised for a different tradeoff.
L1 hot — Redis or in-memory, under 10ms, small, high-frequency. The active session state. What is the user doing right now? Which document are they on? What did they just approve? Last five messages. The attention registry — what has been viewed or touched in the last thirty minutes.
L2 warm — vector store plus summaries, under 100ms, semantic-indexed, medium size. Conversation summaries from prior sessions. Embeddings for similarity search ("which past facts look like this one"). Authority scores — which sources are trusted, which outputs have been human-approved. This is where the bulk of the contextual intelligence lives.
L3 cold — full event store, under 500ms, permanent, complete. Every message ever, every event, every document chunk, every decision. Slow, huge, comprehensive. Queried when L1 and L2 haven't been enough.
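The three tiers can be sketched as a single store with one write path and graduated read paths. This is a minimal illustration, not a real library: the class name, fields, and event shape are all assumptions, and the latency budgets are properties of the backing stores (Redis, a vector DB, an event log) rather than anything this toy code enforces.

```python
import time

class TieredContextStore:
    """Illustrative three-tier context layout (hypothetical names)."""

    def __init__(self):
        self.l1 = {}   # hot: small in-memory session state, <10ms reads
        self.l2 = []   # warm: (embedding, summary) pairs, <100ms reads
        self.l3 = []   # cold: append-only event log, permanent, complete

    def record_event(self, event: dict) -> None:
        # Every event lands in L3 permanently...
        self.l3.append({**event, "ts": time.time()})
        # ...while L1 keeps only a small, overwritable "right now" view.
        self.l1[event["key"]] = event["value"]

    def promote_summary(self, embedding: list[float], summary: str) -> None:
        # Prior-session summaries and their embeddings live in L2.
        self.l2.append((embedding, summary))

store = TieredContextStore()
store.record_event({"key": "active_doc", "value": "contract-7"})
```

The design choice worth noting: L3 is the only complete record. L1 and L2 are disposable, rebuildable views over it, which is what makes them safe to keep small.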
The mistake I used to make was dumping everything into the prompt and relying on the model to sort it out. Modern context windows are big enough to enable that, which is exactly why it's a trap — big context with low signal-to-noise degrades performance worse than a tight context with good signal. The Stanford measurement on tiering was explicit: a 26% quality drop without tiering. The tiers exist because signal-to-noise matters more than volume.
The assembly pipeline
Every AI call I care about now runs through the same six-step pipeline.
Intent classification — what is the user asking, and what kind of context does it need?
Tier selection — L1? L2? L3? All three?
Retrieval — pull candidates from the selected tiers.
Ranking — sort by a score that blends relevance (vector similarity to the query), recency (exponential decay; something from this week is worth far more than something from a year ago), and authority (human-approved outranks AI-suggested).
Compression — fit the token budget: summarise lower-ranked items, discard below a threshold.
Assembly — construct the final context as narrative, not as a flat concatenation.
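The six steps above can be sketched end to end. Everything here is a stand-in: the intent rule, the tier mapping, the per-item fields, and the crude token counts are invented for illustration, not the production logic.

```python
def classify_intent(query: str) -> str:
    # Toy rule: questions are lookups, everything else is an action.
    return "lookup" if "?" in query else "action"

def select_tiers(intent: str) -> list[str]:
    # Lookups rarely need the full event store; actions might.
    return ["L1", "L2"] if intent == "lookup" else ["L1", "L2", "L3"]

def retrieve(tiers: list[str], corpus: list[dict]) -> list[dict]:
    return [item for item in corpus if item["tier"] in tiers]

def score(item: dict) -> float:
    # Multiplicative blend of the three ranking dimensions.
    return item["relevance"] * item["recency"] * item["authority"]

def compress(ranked: list[dict], budget: int) -> list[dict]:
    kept, used = [], 0
    for item in ranked:
        if used + item["tokens"] <= budget:
            kept.append(item)
            used += item["tokens"]
    return kept

def to_narrative(items: list[dict]) -> str:
    return " ".join(item["text"] for item in items)

def assemble_context(query: str, corpus: list[dict], budget: int) -> str:
    tiers = select_tiers(classify_intent(query))                      # steps 1-2
    ranked = sorted(retrieve(tiers, corpus), key=score, reverse=True) # steps 3-4
    return to_narrative(compress(ranked, budget))                     # steps 5-6

corpus = [
    {"tier": "L1", "text": "User is on contract-7.",
     "relevance": 0.9, "recency": 1.0, "authority": 1.0, "tokens": 5},
    {"tier": "L3", "text": "Archived memo from 2021.",
     "relevance": 0.9, "recency": 0.05, "authority": 0.5, "tokens": 5},
]
context = assemble_context("where is the user?", corpus, budget=10)
```

A usage note: because the query is a lookup, the L3 memo is never even retrieved — tier selection does its filtering before ranking has to.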
The word narrative is load-bearing. A normalised database layout is correct but terrible as context — fragmented across tables, stripped of relationships, devoid of story. The same information written as a paragraph, with the relationships inline and the recent changes highlighted, produces dramatically better decisions. I've benchmarked this more than once and the gap is large. Denormalised narrative beats normalised structure for AI decisions every time.
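A small sketch of the difference: the same normalised records rendered as a narrative paragraph with the relationships inline and recent changes highlighted. The field names, the client, and the seven-day cutoff are all invented for illustration.

```python
def rows_to_narrative(client: dict, matters: list[dict]) -> str:
    """Turn normalised rows into a denormalised narrative view."""
    recent = [m for m in matters if m["updated_days_ago"] <= 7]
    lines = [f"{client['name']} has {len(matters)} open matters."]
    for m in recent:
        # Recent changes are surfaced inline, with the relationship spelled out.
        lines.append(
            f"'{m['title']}' changed {m['updated_days_ago']} days ago: {m['change']}."
        )
    return " ".join(lines)

client = {"name": "Acme Ltd"}
matters = [
    {"title": "Lease renewal", "updated_days_ago": 2, "change": "landlord countered"},
    {"title": "Trademark filing", "updated_days_ago": 40, "change": "filed"},
]
narrative = rows_to_narrative(client, matters)
```

The stale matter still contributes to the count but doesn't consume prose budget — the narrative view spends its words on what changed recently.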
Context ≠ data
This is the bit that took me the longest to accept. Context engineering and data modelling will routinely pull in opposite directions. Normal forms say "store each fact once and join." Context says "bring the related facts together as prose and pay the redundancy cost." ACID transactions protect against inconsistency. Context thinks inconsistency is sometimes the most interesting signal to surface.
I now treat the two disciplines as separate concerns with separate storage strategies. The relational store is still there and still normalised. The context layer sits on top, assembling denormalised, narrative views for specific decisions, refreshed on demand. Trying to make a single representation serve both purposes is how I ended up with databases that were good at neither.
The scoring principle
Every candidate item gets a score that multiplies three dimensions: relevance (how close to the query), recency (how fresh), and authority (how trusted). Multiplication, not addition — a perfectly relevant fact from three years ago still scores low because recency drags the product down, and that's correct. An old fact that was never approved should score near zero. A fresh fact from an unapproved source should rank below a slightly older fact that was lawyer-approved.
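The multiplicative score is a one-liner once recency is a decay function. This sketch assumes a 14-day half-life, which is an arbitrary choice for illustration — the right half-life depends on how fast your domain moves.

```python
HALF_LIFE_DAYS = 14.0  # assumption: relevance of a fact halves every two weeks

def recency(age_days: float) -> float:
    # Exponential decay: 1.0 for brand-new, 0.5 at one half-life, and so on.
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def context_score(relevance: float, age_days: float, authority: float) -> float:
    # Multiplication, not addition: any near-zero factor sinks the whole item.
    return relevance * recency(age_days) * authority

# A perfect match from three years ago...
old = context_score(relevance=1.0, age_days=3 * 365, authority=1.0)
# ...loses to a merely decent match from this week.
fresh = context_score(relevance=0.7, age_days=3.0, authority=1.0)
```

With additive scoring, the three-year-old perfect match would stay competitive; the multiplicative form is what makes one bad dimension fatal, which is exactly the behaviour the text argues for.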
Rank by the composite score, pack into the context window until full, compress or discard the rest. Token budget is real. A 200K window is generous; packing it carelessly is still the fastest way to degrade output quality.
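The pack-compress-discard step can be sketched as a greedy loop over the ranked list. The word-count token proxy and the truncating summariser are placeholders — a real implementation would use a proper tokenizer and a real summarisation call.

```python
def summarise(text: str, max_tokens: int) -> str:
    # Stand-in summariser: truncation. A real one would call a model.
    words = text.split()
    return " ".join(words[:max_tokens]) + ("…" if len(words) > max_tokens else "")

def pack(ranked_items: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily pack ranked (score, text) items into a token budget."""
    out, used = [], 0
    for _score, text in ranked_items:
        tokens = len(text.split())  # crude token proxy for illustration
        if used + tokens <= budget:
            out.append(text)        # fits whole: include verbatim
            used += tokens
        elif budget - used > 5:     # partial room: include a summary
            out.append(summarise(text, budget - used))
            used = budget
        # no room left: discard (below-threshold items never make it here)
    return out

packed = pack(
    [(0.9, "alpha beta gamma"),
     (0.5, "delta epsilon zeta eta theta iota kappa lambda")],
    budget=10,
)
```

The shape matters more than the details: high-ranked items go in whole, mid-ranked items go in summarised, and the tail is dropped entirely rather than diluting the window.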
Tool context is context too
This was underlined for me when the Anthropic MCP team published their progressive discovery model for tools. They went from loading 17,000 tool descriptions into context to three tiers: always-loaded core, deferred-and-searchable middle, and discoverable-via-registry outer. Structurally identical to the L1/L2/L3 pattern for domain context. Both solve the same problem — fitting the most relevant information into a fixed token budget — and both converge on graduated retrieval.
Which I take as a signal that the three-tier pattern isn't a quirk of legal-tech or logistics platforms. It's a universal shape for any system where an agent selects from a large knowledge or capability surface. If you find yourself dumping a big pile of anything into a prompt, tier it.
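The same graduated shape, applied to tools instead of facts, can be sketched like this. The tool names, descriptions, and the keyword-match loading rule are all invented — real progressive discovery would use semantic search over tool descriptions, not substring matching.

```python
# Tier 1: always loaded into every prompt.
CORE_TOOLS = {"read_file": "Read a file from the workspace."}

# Tier 2: deferred; loaded only when the query suggests they're needed.
DEFERRED_TOOLS = {
    "run_query": "Run a SQL query against the case database.",
    "send_email": "Draft and send an email.",
}

# Tier 3: discoverable via an explicit registry lookup, never auto-loaded.
REGISTRY = {"ocr_scan": "Extract text from a scanned document."}

def tools_for_prompt(query: str) -> dict[str, str]:
    """Load core tools always, deferred tools on a crude keyword match."""
    loaded = dict(CORE_TOOLS)
    lowered = query.lower()
    for name, desc in DEFERRED_TOOLS.items():
        if any(word in lowered for word in name.split("_")):
            loaded[name] = desc
    return loaded

loaded = tools_for_prompt("run a query over last week's filings")
```

The registry tier never appears in `tools_for_prompt` at all — that's the point. Tier-3 capability surfaces only when the agent explicitly searches for it.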
What I check now, every time
When I design an AI decision point, the questions I now hold up to it: what is the decision, what is the L1 context that's active right now, what is the L2 context that's relevant, what is the L3 context that's complete, what's my token budget, and what's my scoring shape. Those six questions are what "context engineering" concretely means for me. Get them right and the prompts largely take care of themselves. Get them wrong and no amount of prompt refinement closes the gap.
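The six questions make a natural design-review checklist; one way to hold a decision point to them is to refuse to build it until every field is filled in. This dataclass is purely illustrative — the names are mine, not an API.

```python
from dataclasses import dataclass

@dataclass
class DecisionPointSpec:
    """The six questions, as a spec a decision point must satisfy."""
    decision: str       # what is being decided
    l1_context: str     # what is active right now
    l2_context: str     # what is relevant
    l3_context: str     # what is complete
    token_budget: int   # how much window this decision gets
    scoring_shape: str  # e.g. "relevance x recency x authority"

    def is_complete(self) -> bool:
        # A decision point with any unanswered question isn't ready to build.
        return all([self.decision, self.l1_context, self.l2_context,
                    self.l3_context, self.token_budget > 0, self.scoring_shape])

spec = DecisionPointSpec(
    decision="approve clause revision",
    l1_context="document open now, last five messages",
    l2_context="similar past clauses, authority scores",
    l3_context="full matter history",
    token_budget=8000,
    scoring_shape="relevance x recency x authority",
)
```

Treating the checklist as data rather than prose makes it enforceable: a review gate can reject any decision point whose spec is incomplete.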
Want to think through how this lands in your project? Tell kr8 what you’re working with.