Memory Architecture
Six memory types, not one. I kept building platforms as if "the database" was the whole memory story, and kept being surprised when the platform felt amnesic. This is the taxonomy I should have started with.
An agent without memory is a function
The line that helped me internalise this: an agent without memory is a function, an agent with memory is a colleague, a platform with shared memory is an institution. My earlier platforms were function-shaped. Every session started cold. Every interaction forgot the last one. Users had to re-explain context the system had seen ten minutes ago. I called this "stateless architecture" and thought of it as a feature.
It was a feature for the wrong thing. The statelessness was at the runtime layer — correct, good. But I'd let that stateless property leak upward into the orchestration layer, where it cost me everything. The platforms I've built since have been deliberate about what kinds of memory exist and where each lives. Six kinds, as it turns out, not one.
Working memory — the context window
The tokens currently loaded into the LLM for a single call. Finite (4K to 200K depending on model), volatile (gone after the call completes), expensive (every token costs compute). This is the only memory the model can directly reason over. Everything else in the taxonomy exists to feed it.
The discipline that earns its keep here is attention, not storage. The question isn't "what can I cram in?" — it's "what should the model attend to for this decision?" Context Assembly is the mechanism; working memory is the surface.
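The attention-over-storage discipline can be sketched as a greedy packer: score candidate items for relevance to the current decision, then admit them in relevance order until the token budget is spent. This is a minimal illustration, not any particular library's API — the `MemoryItem` shape, the relevance scores, and the pre-counted token costs are all assumptions standing in for a real retrieval scorer and tokenizer.

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str
    relevance: float   # assumed to come from an upstream retrieval scorer
    tokens: int        # assumed pre-counted with the model's tokenizer

def assemble_context(items: list[MemoryItem], budget: int) -> list[MemoryItem]:
    """Greedy pack: most relevant first, stop admitting once the budget is spent."""
    chosen, spent = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        if spent + item.tokens <= budget:
            chosen.append(item)
            spent += item.tokens
    return chosen

items = [
    MemoryItem("user prefers metric units", 0.9, 8),
    MemoryItem("full conversation history", 0.4, 5000),
    MemoryItem("pending decision: approve fact #12", 0.8, 12),
]
picked = assemble_context(items, budget=100)
# The bulky low-relevance history is dropped; the two small, relevant items fit.
```

The point of the sketch is the ordering of concerns: relevance decides admission, the budget only decides when to stop.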
Short-term memory — the session
State that persists across multiple LLM calls within a single session. The scratchpad. Conversation history, current focus, pending decisions, agent coordination state, token budget remaining. Unbounded in principle (it's storage, not tokens), semi-volatile (lives for the session, then archived or discarded), shared across agents within a session.
Where this lives in practice: Redis for the hot tier, Postgres for the warm serialisation on session close. The rule I've learned to hold: if an agent crashes mid-session, can you resume from the last checkpoint? If not, the short-term memory is too volatile and you're going to lose work the first time something weird happens.
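The checkpoint-and-resume rule can be made concrete with a small sketch. Plain dicts stand in for Redis (hot) and Postgres (warm) here, and the function names are hypothetical — the shape to notice is that every state transition is checkpointed, a crash resumes from the hot tier, and a clean close serialises into the warm tier.

```python
import json

hot: dict[str, str] = {}    # stand-in for Redis, the hot tier
warm: dict[str, str] = {}   # stand-in for Postgres rows, the warm tier

def checkpoint(session_id: str, state: dict) -> None:
    """Write each state transition to the hot tier; cheap enough to do per step."""
    hot[session_id] = json.dumps(state)

def resume(session_id: str) -> dict:
    """After a crash, rebuild short-term memory from the last checkpoint."""
    if session_id in hot:
        return json.loads(hot[session_id])
    return json.loads(warm.get(session_id, "{}"))  # fall back to the archive

def close_session(session_id: str) -> None:
    """On clean close, serialise hot state into the warm tier and evict it."""
    warm[session_id] = hot.pop(session_id, "{}")

checkpoint("s1", {"focus": "fact #12", "pending": ["approve"], "budget_left": 4200})
state = resume("s1")   # simulated crash: the checkpoint survived, so we recover
```

If `resume` can rebuild the scratchpad from the last `checkpoint`, the session passes the crash test; if agents hold state only in process memory, it doesn't.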
Episodic memory — the diary
The chronological record of events, decisions, and interactions. Immutable, timestamped, attributed. This is the event store from the Event Sourcing module, viewed through a memory lens. Every FactExtracted, FactApproved, TimelineGenerated, SessionStarted event is an episodic memory entry.
Why this matters for memory specifically: when a meta-agent reads traces to improve a skill, it's querying episodic memory. When the "what changed since your last visit" view shows up, that's a projection over episodic memory. When a regulator wants proof of who approved what, they're reading episodic memory.
The rule that seems boring until it bites: episodic memory stays raw, never summarised. Summaries lose the signal (the 50% → 34.9% figure from optimisation-loops). You can project summaries from the raw events. You can't un-summarise back to raw.
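The append-raw, project-later rule looks like this in miniature. A list stands in for the event store, and the event kinds are the ones named above; `changes_since` is a hypothetical projection, derived from raw events and never written back over them.

```python
from datetime import datetime, timezone

event_log: list[dict] = []   # stand-in for the append-only event store

def append_event(kind: str, actor: str, payload: dict) -> None:
    """Events are immutable, timestamped, attributed -- and never rewritten."""
    event_log.append({
        "kind": kind,
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    })

def changes_since(kinds: set[str]) -> list[dict]:
    """A projection over the raw log; summaries are derived, never stored back."""
    return [e for e in event_log if e["kind"] in kinds]

append_event("FactExtracted", "extractor-agent", {"fact_id": 12})
append_event("FactApproved", "reviewer@example.com", {"fact_id": 12})
approvals = changes_since({"FactApproved"})
```

Because the projection is a pure function of the log, you can throw it away and rebuild it; the raw events are the only thing you cannot regenerate.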
Semantic memory — what things mean
Structured knowledge about entities, relationships, and domain concepts. This is where I kept trying to use one store and kept running into the same wall. Semantic memory fundamentally needs dual representation — relational AND vector, with a reconciliation layer.
Relational (Postgres) handles exact lookup, filtered aggregation, relationship traversal, temporal queries. Vector (Qdrant or pgvector) handles similarity search, fuzzy matching, clustering. Neither alone is sufficient. The platforms I tried to build with only one of these had the exact failure mode of that store's weakness — either I couldn't do similarity ("find me something like X") or I couldn't enforce constraints ("only facts from this tenant, this year, status approved").
Production semantic memory writes to both stores on every update. Retrieval is hybrid: vector search for candidates, relational filter for constraints, rerank against the query. Without this shape, queries get slow, results get noisy, and isolation boundaries leak.
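The three-step retrieval shape — vector candidates, relational filter, rerank — can be sketched without either store. In-memory rows carry both the relational fields and the embedding, mirroring the dual write; the toy two-dimensional vectors and the `hybrid_search` name are illustration only, standing in for a Qdrant/pgvector query followed by a Postgres filter.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Each row holds relational fields AND an embedding, mirroring the dual write.
facts = [
    {"id": 1, "tenant": "t1", "status": "approved", "vec": [0.9, 0.1]},
    {"id": 2, "tenant": "t2", "status": "approved", "vec": [0.9, 0.2]},
    {"id": 3, "tenant": "t1", "status": "draft",    "vec": [0.8, 0.1]},
]

def hybrid_search(query_vec, tenant, status, k=10):
    # 1. Vector search: similarity candidates (the Qdrant/pgvector step).
    candidates = sorted(facts, key=lambda f: cosine(query_vec, f["vec"]),
                        reverse=True)[:k]
    # 2. Relational filter: hard constraints (the Postgres step).
    allowed = [f for f in candidates
               if f["tenant"] == tenant and f["status"] == status]
    # 3. Rerank the survivors against the query before they enter working memory.
    return sorted(allowed, key=lambda f: cosine(query_vec, f["vec"]), reverse=True)

hits = hybrid_search([1.0, 0.1], tenant="t1", status="approved")
```

Note which step does which job: similarity proposes, constraints dispose. Swapping that order is how a "find me something like X" query leaks another tenant's facts into the candidate set.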
Procedural memory — muscle
Learned patterns of behaviour. Skills, optimised prompts, configuration preferences, communication conventions. Soul markdown files, agent configs, user preference files. This is already covered in Skills over Controllers and Skills Architecture — procedural memory is what makes those patterns memory-flavoured rather than just configuration.
A skill file that's been through fifty iterations of the Karpathy Loop is procedural memory — the accumulated knowledge of how to do that task. A user preference file learned from observation is procedural memory. The harness itself, in the broadest sense, is procedural memory written as text.
The rule: procedural memory has to be version-controlled and diffable. You need to see what changed, when, and why. Git is the right shape. Skill files live there.
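"Diffable" is checkable with nothing but the standard library. The two skill-file revisions below are invented; in practice they come out of git history, and the diff answers exactly the what-changed-when-and-why question.

```python
import difflib

# Two revisions of a skill file -- hypothetical content; in practice
# these would be read out of git history for skill.md.
v1 = "When summarising, keep citations.\nPrefer short sentences.\n"
v2 = ("When summarising, keep citations.\nPrefer short sentences.\n"
      "Never drop dates.\n")

diff = list(difflib.unified_diff(
    v1.splitlines(), v2.splitlines(),
    fromfile="skill.md@v1", tofile="skill.md@v2", lineterm=""))
# The diff shows one added rule -- a single learned behaviour, reviewable as text.
```

If a memory type can't produce a diff like this, it isn't procedural memory in the sense used here; it's opaque state.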
Institutional memory — what the organisation knows
Cross-entity patterns. "Seventy percent of contested cases with this fact pattern settle at sixty to eighty percent of claimed value." "Routes avoiding Highway X between 3 and 6 pm save twenty-two minutes on average." Emergent knowledge that no single case, user, or session could produce alone.
This is the part I mostly haven't built well. Institutional memory is hard — it requires aggregation across entities with privacy boundaries enforced, pattern detection that resists spurious correlation, human validation of the patterns before they go live. None of my earlier platforms got this right. The ones that didn't even try had no institutional memory. The ones that tried naively leaked data across tenants.
When it works, it's where the real moat lives — patterns nobody else can generate because nobody else has the aggregated feedback corpus. When it doesn't, it's either absent or a liability.
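The naive-leak failure mode has a simple structural antidote: never emit a pattern below a minimum cohort size, and never ship one without human review. This sketch assumes a made-up `MIN_COHORT` threshold and field names; the point is where the gates sit, not the specific numbers.

```python
from collections import defaultdict

MIN_COHORT = 20   # assumed privacy threshold: no pattern ships below this support

def aggregate_patterns(cases: list[dict]) -> list[dict]:
    """Aggregate across tenants, but only emit patterns with enough support
    that no single tenant's data is recoverable; humans still validate them."""
    groups = defaultdict(list)
    for case in cases:
        groups[case["fact_pattern"]].append(case["settle_ratio"])
    patterns = []
    for pattern, ratios in groups.items():
        if len(ratios) >= MIN_COHORT:   # drop thin cohorts entirely
            patterns.append({
                "fact_pattern": pattern,
                "support": len(ratios),
                "avg_settle_ratio": sum(ratios) / len(ratios),
                "status": "pending_human_review",   # never live without sign-off
            })
    return patterns

cases = ([{"fact_pattern": "contested-liability", "settle_ratio": 0.7}] * 25
         + [{"fact_pattern": "rare-edge-case", "settle_ratio": 0.9}] * 3)
patterns = aggregate_patterns(cases)
# Only the well-supported pattern survives; the 3-case cohort is suppressed.
```

The cohort floor is what makes the aggregation safe to share platform-wide: a pattern backed by three cases is close to being those three cases.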
The lifecycle I now draw
Every piece of information in the platform moves through six stages: encoding (extract signal from noise at the boundary), storage (into the right tier for its memory type), retrieval (into working memory for a specific decision), consolidation (compress into higher-level abstractions without losing the raw), forgetting (archive or prune what's expired), and transfer (apply patterns from one context to another).
The one I failed at most often, early on, was encoding. Quality at encoding determines quality forever. A fact poorly extracted is a fact permanently degraded — no downstream tier can rescue it. Encoding is the most consequential stage, and I kept treating it as a thing to add later.
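Treating encoding as a gate rather than a later add-on looks like this: validate at the boundary and refuse to store what fails, because nothing downstream can repair it. The required fields and the 0.5 review threshold are assumptions for illustration.

```python
def encode_fact(raw: dict) -> dict:
    """Reject at the boundary: a fact that fails encoding never enters storage,
    because no downstream tier can rescue a degraded extraction."""
    required = {"subject", "value", "source", "confidence"}
    missing = required - raw.keys()
    if missing:
        raise ValueError(f"fact rejected at encoding: missing {sorted(missing)}")
    if raw["confidence"] < 0.5:   # assumed threshold: route to human review
        return {**raw, "status": "needs_review"}
    return {**raw, "status": "encoded"}

ok = encode_fact({"subject": "settlement", "value": 0.72,
                  "source": "doc-14", "confidence": 0.91})
```

The asymmetry is the argument in code form: a rejection costs one retry at the boundary; an accepted bad fact costs every retrieval that ever touches it.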
The boundaries that cannot slip
Memory boundaries matter more than almost anything in multi-tenant platforms. Platform memory is shared. Tenant memory is isolated. Case memory sits inside tenant. Session memory sits inside case. Agent-level memory is ephemeral inside session. Upward transfer across these boundaries only happens through aggregation pipelines with privacy filters. Lateral transfer across tenants never happens.
The filter-by-tenant-id query pattern is not isolation — it's a filter, and filters have bugs. Physical isolation — separate collections in the vector store, row-level security at the DB, session keys that start with the tenant_id — is what actually prevents leaks. I've had the filter-style bug surface in production once. Once was enough.
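The difference between a filter and a boundary is visible in the key scheme itself. A sketch, with hypothetical helper names: per-tenant collection names for the vector store, tenant-prefixed session keys, and a read path that refuses any key outside the caller's prefix rather than filtering results afterwards.

```python
def tenant_collection(tenant_id: str) -> str:
    """Separate vector-store collection per tenant -- physical isolation,
    not a WHERE clause over one shared collection."""
    return f"facts_{tenant_id}"

def session_key(tenant_id: str, session_id: str) -> str:
    """Session keys start with the tenant id, so a scan scoped to one
    tenant's prefix cannot return another tenant's sessions."""
    return f"{tenant_id}:session:{session_id}"

def read_session(requesting_tenant: str, key: str) -> str:
    """The boundary check: not a filter on results, a refusal before the read."""
    if not key.startswith(f"{requesting_tenant}:"):
        raise PermissionError("cross-tenant access blocked at the key layer")
    return key

key = session_key("t1", "s9")          # -> "t1:session:s9"
same_tenant = read_session("t1", key)  # allowed: prefix matches
```

A buggy filter returns the wrong rows; a buggy caller against this scheme gets an exception. That's the difference between a leak and an error log.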
The one-line version
Working memory attends. Short-term memory coordinates. Episodic memory records. Semantic memory understands. Procedural memory acts. Institutional memory compounds. Together they make the platform remember. Miss any one and the platform feels hollow in a specific, predictable way.