Lifecycle State Flow
Agent system authors — the 8 embedding points aren't 8 isolated hooks; they share a state that flows across steps, calls, and sessions. This chapter traces one full conversation's state evolution end-to-end.
Why a second cut
Applying to Your Agent cuts by hook — “what you do in prepareStep / in tool.execute”. That’s the static map.
But real production code isn’t 8 isolated hooks — they share state:
- onToolCallFinish writes to compactCache; the next prepareStep reads from it
- onStepFinish writes lastStepTotalTokens; the next prepareStep reads it to decide whether to compact
- tool.execute writes reminders; a later prepareStep injects them into the prompt
- onFinish persists messages; the next call-site loads them
Without making these producer-consumer pairs explicit, readers step on the seams when implementing.
This chapter is the dynamic walkthrough — tracing how state evolves across the 8 points in one conversation. Complementary to chapter 9’s static map.
Disclaimer: the code examples below are a teaching synthesis of CC design and AI SDK best practices; field names differ in detail from Zapvol's actual implementation. For example, the context.compactCache: new Map() shown below is a file-based cache (.compact.json) in Zapvol, and session.compactBoundary is in fact part of session.compactionCheckpoint. This chapter emphasizes state-flow patterns over literal field matching; for the concrete Zapvol implementation, see the packages/backend/src/agent/ source.
Four state buckets: classified by lifetime
All agent state fits into 4 buckets by “how long it lives”:
| Bucket | Lifetime | Typical fields | Storage |
|---|---|---|---|
| Per-step | This step’s start → end | stepNumber · stepMessages · current tool_use block · current LLM response stream | AI SDK internal |
| Per-call | streamText start → onFinish returns | toolUseContext · abortController · compactCache · lastStepTotalTokens · reminders · todos | Your context object |
| Cross-call (in session) | Across multiple streamText invocations in one session | session.messages · compactBoundary · compactionCheckpoint · cumulative usage | Session DB / memory |
| Cross-session | Persists across sessions | CLAUDE.md / AGENT.md · user prefs · auto memory · policy rules | Filesystem / DB |
The per-call / cross-call boundary is the easiest to mess up — both “span multiple steps” but have different lifetimes:
- Per-call is shared across steps within one streamText call; after streamText returns, this state should be GC’d
- Cross-call is persisted after streamText completes; the next call-site reads it back
Confusing them = either state doesn’t persist (next read fails) or memory leaks (per-call state never releases).
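The boundary can be made concrete with two small functions, one for each direction of the handoff. This is a minimal sketch under stated assumptions: the type names (SessionState, PerCallContext) and helpers (seedContext, freezeToSession) are hypothetical and simplified, not Zapvol's actual types.

```typescript
// Hypothetical shapes for illustration; field names follow this chapter, not Zapvol's real types.
interface SessionState {
  // Cross-call: survives after streamText returns.
  messages: string[];
  checkpoint: { lastStepTotalTokens: number };
}

interface PerCallContext {
  // Per-call: created at the call-site, dropped after onFinish.
  lastStepTotalTokens: number;
  compactCache: Map<string, string>;
  reminders: string[];
}

// Cross-call -> per-call: every streamText call gets a fresh context seeded from the session.
function seedContext(session: SessionState): PerCallContext {
  return {
    lastStepTotalTokens: session.checkpoint.lastStepTotalTokens,
    compactCache: new Map(),
    reminders: [],
  };
}

// Per-call -> cross-call: copy only what must outlive the call; everything else is left to GC.
function freezeToSession(
  session: SessionState,
  ctx: PerCallContext,
  newMessages: string[],
): SessionState {
  return {
    messages: [...session.messages, ...newMessages],
    checkpoint: { lastStepTotalTokens: ctx.lastStepTotalTokens },
  };
}
```

Keeping the two directions as explicit functions means a field that is missing from freezeToSession is visibly per-call, and a field that is missing from seedContext visibly does not carry over.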
Full conversation walkthrough
Let’s trace a real scenario — “user says ‘help me fix the auth bug’, agent runs 3 steps to complete” — following which state each embedding point reads/writes.
Preamble: session restoration
Before the user hits Enter, the session layer already contains:
session.messages = [/* prior rounds' messages */]
session.compactionCheckpoint = { lastStepTotalTokens: 45_000, compactedRounds: {...} }
session.usage = { cumulativeInput: 120_000, cumulativeOutput: 8_000, ... }
Cross-session layer is stable:
CLAUDE.md (filesystem): project rules
~/.claude/CLAUDE.md: user prefs
autoMemory: [feedback_terse, project_migration_freeze, ...]
Point A · Call-site assembly
// New user message + session.messages concat
const allMessages = [...session.messages, { role: 'user', content: userInput }]
// Read cross-session
const claudeMd = await loadLayeredClaudeMd() // Managed → User → Project → Local
const autoMem = await loadAutoMemoryIndex() // MEMORY.md index (≤200 lines)
const systemPrompt = assembleSystemPrompt({ claudeMd, autoMem, ...staticParts })
// Write per-call (initialize context)
const context: ZapvolContext = {
taskId,
abortController: new AbortController(), // lives through entire streamText
lastStepTotalTokens: session.compactionCheckpoint.lastStepTotalTokens, // ← seed
compactCache: new Map(), // empty, waits for E to write
reminders: [],
todos: [...],
// ...
}
// Launch AI SDK
const result = streamText({
system: systemPrompt, // byte-stable
messages: allMessages,
tools: buildTools(),
prepareStep: /* see Point B */,
experimental_onToolCallFinish: createToolPrecompactHook({ context, ... }),
onStepFinish: step => {
// Update per-call state
context.lastStepTotalTokens = deriveFromStep(step) // ← producer
},
onFinish: /* see Point G */,
})
A’s state R/W:
- Read: session.messages, session.compactionCheckpoint, claudeMd, autoMem (all cross-session / cross-call)
- Write: context.lastStepTotalTokens (seeds initial value), context.abortController (new instance)
Key: the initial value of context.lastStepTotalTokens comes from session.compactionCheckpoint. This is the cross-call → per-call state handoff: session restore inherits the prior round's token estimate.
Point B · prepareStep (step 0, first step)
prepareStep: async ({ messages: stepMessages, steps, model }) => {
const isFirstStep = steps.length === 0 // true
// Read per-call state
const currentTokens = context.lastStepTotalTokens // ← still A's initial value
const reminders = context.reminders
// First step: boundary filter + autocompact
const afterBoundary = getMessagesAfterCompactBoundary(
stepMessages,
session.compactBoundary,
)
let compacted = afterBoundary
if (currentTokens > AUTOCOMPACT_THRESHOLD) {
compacted = await autocompact(afterBoundary, {
signal: context.abortController.signal, // ← pass abort
})
}
// Mark cache breakpoints
const prepared = applyCacheControl(compacted, model)
return { messages: prepared }
}
B’s R/W at step 0:
- Read: stepMessages (passed in by the AI SDK), context.lastStepTotalTokens, session.compactBoundary, context.abortController.signal
- Write: none (it only overrides this step's input and doesn't write to context)
Point C · LLM streaming response
The model streams text and tool_use blocks. Doesn’t read per-call state (AI SDK runs internally), but the events produced flow to Point D’s trigger.
For thinking mode, the thinking block’s signature is part of this step’s output — it flows into next step’s messages and next prepareStep sees it. Don’t touch signature fields.
Point D · tool.execute (grep for “auth” files)
tool({
inputSchema: z.object({ pattern: z.string() }),
execute: async (input, { abortSignal, toolCallId }) => {
// Read per-call state
abortSignal?.throwIfAborted() // built-in AbortSignal check; the signal may be undefined
// Read cross-session / per-call
const permission = await checkPermission('grep', input, context)
if (permission.behavior === 'deny') {
return { error: 'permission_denied' }
}
// Actually run (execFile here must be the promisified form, e.g. util.promisify(child_process.execFile))
const result = await execFile('rg', [input.pattern, '.'], { signal: abortSignal })
const str = result.stdout
// Possibly write per-call
if (str.length > MAX_RESULT_CHARS) {
const path = await offloadToSandbox(context.sandbox, str)
return { truncated: true, preview: str.slice(0, 1000), offloadPath: path }
}
return { output: str }
},
})
D’s R/W:
- Read: abortSignal, context (for permission)
- Write: possibly context.sandbox (offloaded file), possibly context.todos (if "auto-add todo"-style tools exist)
Point E · onToolCallFinish (the key cross-over point)
experimental_onToolCallFinish: async (event) => {
if (!event.success) return
const { toolCall, output, abortSignal } = event
// Read: estimation
const tokens = estimateTokens({ input: toolCall.input, output })
if (tokens < PRECOMPACT_TRIGGER_TOKENS) return
// Read: compactCache (idempotence)
if (context.compactCache.has(toolCall.toolCallId)) return
// Execute (one cheap-model LLM call)
const compactResult = await raceAbort(
compactorFor(toolCall.toolName)(toolCall.input, output, context),
abortSignal,
)
if (!compactResult) return
// ←←← **The key write**
context.compactCache.set(toolCall.toolCallId, compactResult)
}
E’s R/W:
- Read: toolCall, output, context.compactCache (idempotence check), abortSignal
- Write: context.compactCache.set(toolCall.toolCallId, ...)
This write is the critical link between E and B — E caches the compressed result; next-step B reads the cache to substitute. The “Point B again (step 1)” section below is the consumer side.
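The raceAbort helper that E relies on is not shown in this chapter. One way it could look is sketched below, assuming (hypothetically) that it resolves to undefined when the signal fires first, which is what E's "if (!compactResult) return" check expects.

```typescript
// Hypothetical helper: resolves with the task's value, or with undefined if the signal aborts first.
function raceAbort<T>(task: Promise<T>, signal: AbortSignal): Promise<T | undefined> {
  if (signal.aborted) {
    // The task is abandoned; swallow its eventual rejection so it is not reported as unhandled.
    task.catch(() => undefined);
    return Promise.resolve(undefined);
  }
  return new Promise<T | undefined>((resolve, reject) => {
    const onAbort = () => resolve(undefined);
    signal.addEventListener("abort", onAbort, { once: true });
    task.then(
      (value) => {
        signal.removeEventListener("abort", onAbort);
        resolve(value);
      },
      (error) => {
        signal.removeEventListener("abort", onAbort);
        reject(error); // no-op if the abort already resolved this promise
      },
    );
  });
}
```

Note that racing only stops the waiting. To stop the spending as well, the compaction call itself should also receive the signal so the underlying request is cancelled.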
Point B again (step 1): consume what E wrote
Now step 1 starts (there are step 0’s tool results to handle):
prepareStep: async ({ messages: stepMessages, steps, model }) => {
const isFirstStep = steps.length === 0 // false, now step 1
// Read per-call state (lastStepTotalTokens was updated by onStepFinish)
const currentTokens = context.lastStepTotalTokens
// ↓↓↓ **Key read**: consume what E wrote
const truncated = await truncateOldToolResults(stepMessages, {
compactCache: context.compactCache, // ← reads E's write
})
// Inside truncateOldToolResults:
// for (const part of message.parts) {
// if (isToolResultPart(part)) {
// const cached = compactCache.get(part.toolCallId)
// if (cached) {
// part.output = cached.shortOutput // ← replace raw with compact
// }
// }
// }
// Maybe another autocompact (if microcompact isn't enough)
let compacted = truncated
if (currentTokens > AUTOCOMPACT_THRESHOLD) {
compacted = await autocompact(truncated, { signal: context.abortController.signal })
}
// Inject reminders
let final = applyCacheControl(compacted, model)
if (context.reminders.length > 0) {
const text = context.reminders.join('\n')
final = [...final, { role: 'user', content: text }]
context.reminders = [] // ← clear after consumption (avoid repeat injection)
}
return { messages: final }
}
Here E and B are linked:
- E writes: context.compactCache.set(toolCallId, compactResult). It runs a cheap model to compress the result immediately after the tool returns, then caches it
- B reads: the next step's prepareStep iterates over tool results and, on a toolCallId cache hit, replaces the raw output with the compact version
- Net effect: the main agent's LLM call sees already-compacted tool results, token cost drops sharply, and the compaction LLM call ran on Haiku, off the main agent's critical path
This producer-consumer pair is the core design of Zapvol's tool-precompact.ts + tiers/tool.ts. Without understanding it, the reason for precompacting in advance is easy to miss.
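The consumer side of the pair can be written as a pure function. The sketch below uses simplified, hypothetical message shapes (the AI SDK's real types are richer) and a hypothetical name, substituteCompacted; it also counts hits and misses so the cache hit rate can be logged.

```typescript
// Hypothetical, simplified shapes; not the AI SDK's actual message types.
interface ToolResultPart { type: "tool-result"; toolCallId: string; output: string }
interface TextPart { type: "text"; text: string }
interface Message { role: "user" | "assistant" | "tool"; parts: Array<ToolResultPart | TextPart> }
interface CompactResult { shortOutput: string }

// Returns new messages with cached compact outputs swapped in; the input array is not mutated.
function substituteCompacted(
  messages: Message[],
  compactCache: Map<string, CompactResult>,
): { messages: Message[]; hits: number; misses: number } {
  let hits = 0;
  let misses = 0;
  const out: Message[] = messages.map((message) => ({
    ...message,
    parts: message.parts.map((part) => {
      if (part.type !== "tool-result") return part;
      const cached = compactCache.get(part.toolCallId);
      if (!cached) {
        misses++;
        return part; // nothing cached: keep the raw output
      }
      hits++;
      return { ...part, output: cached.shortOutput };
    }),
  }));
  return { messages: out, hits, misses };
}
```

Copying instead of mutating keeps the raw tool output available to anything else that still holds the original messages, such as the persistence step.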
B’s R/W at step 1:
- Read: stepMessages, context.lastStepTotalTokens, context.compactCache, context.reminders, context.abortController.signal
- Write: context.reminders = [] (clear after consumption)
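The read-then-clear step on reminders can be folded into one helper so the two halves cannot drift apart. This is a sketch with a hypothetical name, drainReminders; it empties the array in place instead of reassigning it, on the assumption that a tool may still hold a reference to the same array.

```typescript
// Hypothetical helper: take all pending reminders and leave the queue empty in one step.
function drainReminders(context: { reminders: string[] }): string | undefined {
  // splice(0) removes and returns every element while keeping the same array object,
  // so writers holding a reference to context.reminders keep writing to the live queue.
  const taken = context.reminders.splice(0);
  return taken.length > 0 ? taken.join("\n") : undefined;
}
```

A caller in prepareStep would append a user message only when the return value is defined.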
Point F · stopWhen
stopWhen: [
stepCountIs(50),
hasToolCall('complete'),
({ steps }) => context.shouldStop,
]
Read: steps, context.shouldStop (external signal)
Write: None
Point G · onFinish
onFinish: async ({ response, usage, finishReason }) => {
// Read per-call
const newMessages = response.messages
const terminalReason = deriveTerminalReason(finishReason, context)
// Write cross-call (session layer)
await db.transaction(async tx => {
await tx.session.appendMessages(session.id, newMessages)
await tx.session.updateCheckpoint(session.id, {
lastStepTotalTokens: context.lastStepTotalTokens, // ← freeze per-call → session
compactedRounds: context.stepCompactor.getCheckpoint(),
})
await tx.session.recordUsage(session.id, usage)
await tx.session.setLastFinishReason(session.id, terminalReason)
})
}
G’s R/W:
- Read: response.messages, usage, finishReason, context.lastStepTotalTokens, context.stepCompactor (final state)
- Write: session.messages (append), session.compactionCheckpoint (full refresh), session.usage (sum), session.terminalReason
Key: G is the per-call → cross-call state solidification point. Without this step, next call-site can’t read new messages — conversation lost.
Point H · Background tasks
onFinish: async (event) => {
// First sync G's persistence
await persistSession(event, session)
// Then start async background (don't await)
void backgroundMemoryExtraction(event.response.messages, context)
void backgroundResumePreCompact(session.id)
}
H’s R/W:
- Read: response.messages, context (one-shot snapshot)
- Write: autoMemory (cross-session): newly extracted memory entries
- Write: session.resumePreCompact (cross-call): pre-computed summary for the next resume
After H, the context object's lifetime ends and per-call state is fully reclaimed. What remains is the session and memory layers, ready for the next call.
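Starting a task with void and no error handling means a failure surfaces as an unhandled rejection. A small wrapper can keep the "don't await" behavior while still owning the task's lifecycle. The sketch below is an assumption-laden illustration: fireAndForget and its onError callback are hypothetical names, not part of the AI SDK or Zapvol.

```typescript
// Hypothetical helper: start a task without blocking the caller, skip it if already aborted,
// and route any failure to onError instead of leaving it unhandled.
function fireAndForget(
  name: string,
  task: (signal: AbortSignal) => Promise<void>,
  signal: AbortSignal,
  onError: (name: string, error: unknown) => void,
): Promise<void> {
  if (signal.aborted) return Promise.resolve();
  // Promise.resolve().then(...) also captures a synchronous throw from task().
  return Promise.resolve()
    .then(() => task(signal))
    .catch((error: unknown) => onError(name, error));
}
```

In onFinish the caller would write void fireAndForget(...); the returned promise exists so that tests or a shutdown hook can wait for the task when they need to.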
State R/W matrix
Distilling the walkthrough into a matrix. One glance tells you each state’s producer and consumer.
| State item | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| session.messages | R | - | - | - | - | - | W | R |
| session.compactionCheckpoint | R | - | - | - | - | - | W | - |
| session.usage (cumulative) | - | - | - | - | - | - | W | - |
| context.lastStepTotalTokens | W | R | - | - | - | - | R | - |
| context.abortController.signal | W | R | - | R | R | - | - | R |
| context.compactCache | W (init) | R | - | - | W | - | - | - |
| context.reminders | R | RW | - | W | - | - | - | - |
| context.todos | R | R | - | RW | - | R | - | - |
| session.resumePreCompact | R | - | - | - | - | - | - | W |
| autoMemory | R | - | - | - | - | - | - | W |
| CLAUDE.md | R | - | - | - | - | - | - | - |
(RW = both read and write; bold W = key producer point)
Three key observations from reading this matrix:
- compactCache's W is in E only (A merely initializes the empty map) and its R is in B only: a classic producer-consumer pair that runs solely between these two points
- abortController.signal is read almost everywhere: the concrete expression of "full-chain propagation"
- session.messages's W is in G and its R is in the next A: the cross-call handoff depends entirely on G's persistence
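A matrix like this can also be checked mechanically. The sketch below uses a hypothetical encoding (one row per state item, one cell per point A to H) and a hypothetical function name, findUnpairedState, to flag rows that have no writer or no reader.

```typescript
type Access = "R" | "W" | "RW" | "-";
type Matrix = Record<string, Access[]>;

// Flags state items that no listed point writes, or that no listed point reads.
function findUnpairedState(matrix: Matrix): { neverWritten: string[]; neverRead: string[] } {
  const neverWritten: string[] = [];
  const neverRead: string[] = [];
  for (const item of Object.keys(matrix)) {
    const cells = matrix[item];
    if (!cells.some((c) => c === "W" || c === "RW")) neverWritten.push(item);
    if (!cells.some((c) => c === "R" || c === "RW")) neverRead.push(item);
  }
  return { neverWritten, neverRead };
}
```

A flagged row is a prompt to look, not automatically a bug: in the matrix above, CLAUDE.md has no writer because it is written outside the agent, and session.usage has no reader among the eight points.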
Key producer-consumer pairs (deep-dive on 3)
Pair 1: compactCache (E writes → B reads)
Purpose: use a cheap model (Haiku) to compress tool results before the main agent sees them in the next step, so the main agent's critical path doesn't pay the latency and cost of the compaction LLM call.
Flow:
tool.execute returns output (D)
↓
onToolCallFinish fires (E)
↓
Check output size > threshold?
↓ yes
Call Haiku: compact(input, output) → { shortOutput, offloadPath }
↓
context.compactCache.set(toolCallId, result)
↓ (next step's prepareStep fires)
prepareStep sees tool_result list (B)
↓
for each tool_result:
if compactCache.has(toolCallId):
replace with compactCache.get(...).shortOutput
↓
return { messages: truncated } overrides this step's input
↓
main agent's LLM call sees the compact version
Easiest-to-mess-up points here:
- Key mismatch: E writes the key as toolCall.toolCallId; B reads it as part.toolCallId. If the types are inconsistent (string vs number), Map.get always misses.
- Idempotence: E needs to check "has this toolCallId already been compressed"; otherwise checkpoint replay duplicates the LLM call.
- Race abort: E runs Haiku mid-call and the user hits ESC; this compression must stop immediately, otherwise the abort happens but the background call keeps spending money.
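The first two pitfalls can be guarded in the cache itself. The sketch below is one possible shape, not Zapvol's implementation: a hypothetical CompactCache class that normalizes every key to a string and tracks in-flight compactions so the same toolCallId is never compressed twice.

```typescript
// Hypothetical wrapper around Map; names and API are illustrative only.
class CompactCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly inFlight = new Set<string>();

  // One normalization point, so producer and consumer cannot disagree on key type.
  private static key(id: string | number): string {
    return String(id);
  }

  // Returns true if the caller should run compaction; false if it is done or already running.
  claim(id: string | number): boolean {
    const k = CompactCache.key(id);
    if (this.entries.has(k) || this.inFlight.has(k)) return false;
    this.inFlight.add(k);
    return true;
  }

  // Stores the result, or releases the claim when value is undefined (abort or failure).
  settle(id: string | number, value: V | undefined): void {
    const k = CompactCache.key(id);
    this.inFlight.delete(k);
    if (value !== undefined) this.entries.set(k, value);
  }

  get(id: string | number): V | undefined {
    return this.entries.get(CompactCache.key(id));
  }
}
```

In E, claim replaces the bare has check and settle replaces set; passing the possibly-undefined result of the aborted race to settle releases the claim so a later retry is still possible.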
Pair 2: lastStepTotalTokens (A seeds → onStepFinish updates → B consumes)
Purpose: the compaction threshold judgment ("should autocompact fire") depends on an accurate token estimate. If lastStepTotalTokens isn't updated, the judgment uses stale values, and autocompact either never fires or fires wrongly.
Flow:
A: context.lastStepTotalTokens = session.compactionCheckpoint.lastStepTotalTokens // inherit from last round
↓
B (step 0): if (lastStepTotalTokens > AUTOCOMPACT_THRESHOLD) → compact
↓
LLM call completes
↓
onStepFinish: context.lastStepTotalTokens = step.usage.inputTokens + step.usage.outputTokens // update
↓
B (step 1): judge again, use updated value
↓ ...
G: session.compactionCheckpoint.lastStepTotalTokens = context.lastStepTotalTokens // freeze for next round
Easiest-to-mess-up points:
- Forgot to write in onStepFinish: autocompact never fires, session runs until context overflow.
- Wrong formula: step.usage has inputTokens / outputTokens / cachedInputTokens / cacheCreationInputTokens. Which one represents "next step's prompt size"? Zapvol's choice is inputTokens + outputTokens (a comment in the source explains why); a wrong choice leads to long-term drift.
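The update in onStepFinish and the check in prepareStep can be sketched as two small functions. The StepUsage shape and both function names are assumptions for illustration; the fallback for missing usage is also an assumption, based on the fact that providers do not always report every field.

```typescript
// Hypothetical, reduced usage shape; only the two fields used by the formula.
interface StepUsage {
  inputTokens?: number;
  outputTokens?: number;
}

// Estimate of the next step's prompt size: this step's prompt plus what the model just emitted,
// since that output is appended to the next step's input.
// Falls back to the previous estimate when the provider omits usage.
function deriveNextPromptTokens(usage: StepUsage, previous: number): number {
  if (usage.inputTokens === undefined || usage.outputTokens === undefined) return previous;
  return usage.inputTokens + usage.outputTokens;
}

function shouldAutocompact(tokens: number, threshold: number): boolean {
  return tokens > threshold;
}
```

The estimate does not include tool results appended after the step, so it can run low for one step and is corrected by the next step's reported inputTokens.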
Pair 3: session.messages (G writes → next A reads)
Purpose: cross-call conversation persistence. Refresh browser / resume session without losing dialogue.
Flow:
G (this onFinish):
await db.session.appendMessages(session.id, response.messages) // sync
↓ (session ends, context reclaimed)
↓ (user comes back later)
A (next call-site):
const history = await db.session.loadMessages(session.id) // read back
const allMessages = [...history, newUserMessage]
Easiest-to-mess-up points:
- G's async write without await: a user refresh beats the DB write and the conversation is lost. G must wait for DB confirmation.
- compactBoundary not updated in sync: the next A loads messages but doesn't know which part was compacted, so the next prepareStep re-compacts. session.compactionCheckpoint and session.messages must be updated atomically (same transaction).
- Thinking block signature broken during serialization: DB-stored JSON field order changed or float precision lost; the next submission's signature mismatches and the API rejects it.
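The atomicity requirement is easiest to see with a toy store. The sketch below is an in-memory stand-in for a DB transaction, with hypothetical names (SessionStore, StoredSession): writes go to a draft copy, and the draft replaces the committed state only if every write succeeded.

```typescript
interface StoredSession {
  messages: string[];
  checkpoint: { compactBoundary: number; lastStepTotalTokens: number };
}

// Hypothetical in-memory stand-in for a DB transaction; a real store would use its own transaction API.
class SessionStore {
  private state: StoredSession;

  constructor(initial: StoredSession) {
    this.state = initial;
  }

  read(): StoredSession {
    return this.state;
  }

  // Applies all writes to a draft; a throw inside apply leaves the committed state untouched.
  transaction(apply: (draft: StoredSession) => void): void {
    const draft: StoredSession = {
      messages: [...this.state.messages],
      checkpoint: { ...this.state.checkpoint },
    };
    apply(draft);
    this.state = draft; // commit: messages and checkpoint become visible together
  }
}
```

With this shape, a failed checkpoint write can never leave appended messages behind, which is the failure described in the second pitfall above.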
8 critical constraints
8 rules of coding discipline on state flow (only #1 is a strict mathematical invariant; the others are operational requirements — breaking any = a class of bug):
| # | Constraint | Breakage symptom |
|---|---|---|
| 1 | session.compactBoundary <= session.messages.length | getMessagesAfterCompactBoundary returns empty / crashes |
| 2 | context.compactCache[toolCallId] key equals message’s tool_use.id | cache always misses, no compression |
| 3 | After abortController.signal.aborted === true, all running tool executes must exit within seconds | user hits ESC but API keeps burning |
| 4 | session.messages and session.compactionCheckpoint must update in same transaction | resume sees checkpoint pointing to non-existent message indices |
| 5 | context.lastStepTotalTokens must update in onStepFinish | autocompact doesn’t fire, context explodes |
| 6 | context.reminders must clear after consumption | every step repeats same reminder, prompt pollution |
| 7 | thinking block signature must preserve byte-by-byte — serialization / deserialization mustn’t touch | next submit “thinking signature mismatch” API error |
| 8 | per-call state must solidify to cross-call layer before onFinish returns | data loss after session ends |
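Constraints #1 and #2 are cheap to assert at runtime. The sketch below is a hypothetical checker (checkStateInvariants and SessionSnapshot are illustrative names) that returns a list of violations, suitable for a debug build or a test.

```typescript
interface SessionSnapshot {
  messageCount: number;
  compactBoundary: number;
}

// Returns human-readable violations of constraints #1 and #2; an empty array means both hold.
function checkStateInvariants(
  session: SessionSnapshot,
  toolUseIds: string[],
  cacheKeys: string[],
): string[] {
  const violations: string[] = [];
  // #1: the boundary must point inside the message list.
  if (session.compactBoundary < 0 || session.compactBoundary > session.messageCount) {
    violations.push(`#1 compactBoundary ${session.compactBoundary} outside 0..${session.messageCount}`);
  }
  // #2: every cache key must correspond to a tool_use id that actually appears in the messages.
  const known = new Set(toolUseIds);
  for (const key of cacheKeys) {
    if (!known.has(key)) violations.push(`#2 cache key ${key} matches no tool_use id`);
  }
  return violations;
}
```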
Common state-flow bugs and their symptoms
Production bugs traced back to state violations:
| Symptom | Root cause | Where to fix |
|---|---|---|
| "Microcompact has no effect, tool results still full-size to LLM" | Pair 1 key mismatch: E uses toolCall.toolCallId, B uses part.tool_call_id (naming inconsistency) | Unify key naming + log cache hit rate |
| "Autocompact never fires, usage grows to overflow" | Pair 2's onStepFinish forgotten: context.lastStepTotalTokens is stuck at its initial value | Explicitly update in onStepFinish |
| "ESC takes tens of seconds to really stop" | Constraint 3 broken: some tool's execute doesn't honor abortSignal | Pass signal through every long-running fetch/spawn |
| "Conversation history drops last turn after refresh" | Pair 3's G async write: onFinish didn't await DB | Change onFinish to async + await |
| "Resume shows garbled earlier messages" | Constraint 4 broken: messages and checkpoint written separately, one failed | Move to DB transaction |
| "Same reminder appears 5 times in prompt" | Constraint 6 broken: not cleared after consumption | context.reminders = [] after consumption |
| "API returns 'thinking signature mismatch'" | Constraint 7 broken: JSON serialization reordered signature fields | Serialize thinking block verbatim, don't touch |
| "Resume slow, 5 seconds every time" | H's resumePreCompact not written: compaction only starts at resume time | Async-write summary in H for next-time read |
AbortSignal propagation (called out)
AbortSignal is the easiest state-flow axis to half-implement. Unlike producer-consumer pairs, it’s a broadcast model:
After the user hits ESC, Point A's abortController.abort() flips signal.aborted to true. Each observation point then races independently rather than forwarding layer by layer:
- B (top of next prepareStep): if (signal.aborted) throw, an early exit
- C (streaming): fetch({ signal }), native interruption
- D (tool.execute): the AI SDK already puts the signal in the second argument; each tool checks it itself
- E (onToolCallFinish): raceAbort(compactLLM, signal), an explicit race
- H (background tasks): check the signal independently; don't depend on stream-close notification
Finally, the AI SDK's loop detects signal.aborted at its own checkpoint, the stream ends, and G sees finishReason === 'abort'.
Key: half-assed abort is worse than none — user thinks they cancelled, but API calls still burn. Any new long-running operation (fetch / execFile / spawn / DB call / LLM call) must accept the signal.
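The broadcast model means each long-running operation subscribes to the same signal on its own. As an illustration, here is a sketch of one such operation, a cancellable delay with the hypothetical name abortableDelay; any number of these can observe one controller and all of them stop on a single abort().

```typescript
// Resolves after ms, or rejects as soon as the signal aborts; the timer and listener are always cleaned up.
function abortableDelay(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    function onAbort(): void {
      clearTimeout(timer); // actually stop the work, not just stop waiting for it
      reject(new Error("aborted"));
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

Unlike a pure race, this pattern cancels the underlying work (here, the timer), which is the property every new fetch, spawn, DB call, or LLM call needs.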
3 critical self-audit questions
The 3 questions below cover the most common state-flow pitfalls. If you can't answer one of them without re-reading the source, go back to the corresponding section:
- compactCache lifetime: is it still present after one streamText returns? If the agent spans two streamText calls (same session), can the second call read what the first call's E wrote?
- Forgetting to update lastStepTotalTokens in onStepFinish: what happens to autocompact? At which step does the problem start? What's the eventual symptom?
- G (onFinish) appended messages but the updateCheckpoint DB write failed: what state does the next resume read? What are the cascading effects?
Answers are in the producer-consumer / critical-constraints / bug table sections above — each question points to a specific constraint’s breakage path.
Takeaways for your own agent
Distilling state-flow perspective into directly-actionable items:
- Draw your state buckets first: classify your agent state into per-step / per-call / cross-call / cross-session four buckets. If you can’t classify cleanly, your design has a problem
- Find producer-consumer pairs: every state has at least one producer + one consumer (otherwise it’s dead state). Scan with a matrix
- Per-call → cross-call solidification points must be explicit: the updateCheckpoint in Zapvol's onFinish is such a point, and it must be transactional-atomic
- AbortSignal threads end-to-end: each observation point races independently; not layer-by-layer forwarding. Half-assed is worse than none
- compactCache key unified: E and B read/write the same cache; key type / naming must match; add logs for hit-rate verification
- Clear after consumption: reminders / one-shot flags clear right after reading, or every step repeats injection
- lastStepTotalTokens updated in onStepFinish: not updating = autocompact never fires
- Thinking block signature byte-preserved: serialization / deserialization mustn’t touch the signature field
- H’s background tasks independent of stream close: don’t await, but maintain lifecycle yourself (signal / cancellation)
- 8 critical constraints scan across your codebase: each “no” is a potential bug class
Further reading
- Applying to Your Agent (AI SDK) — the hook-by-hook static map, complements this chapter
- Agent Execution Loop — Claude Code’s own state machine (14-field State object)
- Compaction — compactCache / compactionCheckpoint state’s 5-tier background
- Design Lessons — abstract version of all state principles here
- Zapvol reference: packages/backend/src/agent/agent-round.ts + packages/backend/src/agent/compaction/ + packages/backend/src/agent/context/ (if present)