← Blog

Your Agent's Memory Is an Ungoverned Write

The threat models many teams use for autonomous AI systems quietly assume that the attack surface resets at the end of a session, and the assumption is wrong. A prompt injection has to succeed in real time, against the defences active in the conversation it occurred in. When the session ends, the conversation ends, and much of the attack surface is supposed to go with it. That assumption breaks the moment memory enters the picture.

When an agent persists state across sessions, conversation history, learned preferences, accumulated context, an evolving model of the user the agent is supposed to consult on the next interaction, that persistent state becomes a channel through which a compromise in one session can shape actions in a future one. The attack plants a payload, the session ends, and some time later, possibly weeks, a different session retrieves the payload as if it were ordinary remembered context, and the agent acts on it. The forensic link between cause and effect is severed by the gap in time and by the fact that the activating session looks, to per-action governance, completely normal.

This is the shape of attack I want to make explicit, because the architectural response to it differs from the response to in-session attacks.

Memory Breaks Session Boundaries

Prompt injection and confused-deputy attacks are often constrained to the session in which they occur. They have to succeed in real time, win against whatever defences are running in that conversation, and operate before the agent’s context is reset.

Memory breaks every one of those constraints. Memory is, in a real sense, the agent’s persistent representation of itself and the people it has interacted with, and it is exactly the kind of artefact that an attacker can plant something in, walk away from, and let a later session weaponise. Cause and effect end up separated in time, and unless governance explicitly connects them through the evidence chain, no individual decision will look wrong.

Contamination Moves Across Time

I want to name the lifecycle in three phases, because naming them is the cleanest way to see where governance has to reach.

Injection is the first phase. Adversarial content enters the agent’s memory through a write operation. The content might be a direct instruction embedded in conversational context that the agent decides to remember, a set of examples establishing a behavioural pattern the attacker wants the agent to reproduce later, or metadata that influences how the memory is retrieved and prioritised in future contexts. If memory writes are governed actions, meaning that writing to persistent memory crosses an enforcement boundary, then the write produces a receipt. The content may not be flagged as adversarial, because the authority decision cares about authorisation rather than semantics, but the receipt exists and provides the forensic anchor that any later investigation will need. If memory writes are not governed actions, the injection is invisible to governance: no receipt, no record of what was written or when, and the first evidence of contamination appears only when the contaminated memory influences a future governed action.

Persistence is the second phase. The adversarial content sits in the memory store, waiting. It may be retrieved in every session that matches the retrieval criteria, or it may sit dormant until a specific query activates it. Nothing happens during this phase. The memory store looks completely normal, and at the storage layer the adversarial content is just another remembered preference, label or summary.

Activation is the third phase. A future session retrieves the contaminated memory. The adversarial content enters the model’s context and influences behaviour, and the model produces an action that crosses an enforcement boundary. Governance evaluates the action. If the action is outside the delegation’s scope, it is denied. If it is within scope, it is permitted. The fact that the action was motivated by contaminated memory is invisible to per-action evaluation, because that evaluation only knows the canonical action, the policy and the delegation in force at the moment.

The result is what I have started calling temporal privilege escalation: a write authorised under one delegation activates under another, in a different session, separated by an arbitrary span of time.

The Gap Is The Memory Write

No individual governance decision was wrong in that lifecycle. Every action in the chain was correctly authorised against the delegation in effect at the time. The failure was earlier and structural: a prior state mutation, the memory write, was not treated as a governed action with constraints on what may be persisted and later reused under different authority. The governance boundary had been drawn around execution but not around the persistence that shapes future execution.

Memory is not optional in useful agent systems. An agent that cannot remember previous interactions has to be fully re-instructed for every session, and the operational cost of statelessness is high enough that durable memory quickly becomes attractive. The challenge, then, is not to eliminate memory. It is to bring memory under enforcement, treating memory writes and memory reads as governed actions with the same evidence requirements as any other crossing of an enforcement boundary.

In practice, that means a memory write produces a receipt, a memory read produces a receipt, and the evidence chain connects what was written, when, by whom and under what delegation, to what was later read, when, by whom, and what subsequent action resulted. Without that linkage the contamination lifecycle stays invisible. The injection receipt, if one exists, and the activation receipt may be separated by an arbitrary number of intervening decisions and an arbitrary span of time. Connecting the two requires cross-session evidence-chain analysis. Per-action authority does not provide that analysis on its own, but a properly shaped evidence chain gives the analysis something to inspect.

Memory Has To Cross The Boundary

Memory is the persistence layer of autonomous AI systems. If the persistence layer is ungoverned, attacks persist along with it. If governance treats memory writes as incidental bookkeeping rather than as consequential operations, memory becomes the channel through which session-constrained attacks become durable.

An agent’s memory is not a convenience feature. It is a deferred action boundary. Today’s write becomes tomorrow’s execution context, and if that write is not governed, the compromise gets a place to live where the next authority decision may not know to look.