Lifecycle State Flow

Agent system authors — the 8 embedding points aren't 8 isolated hooks; they share a state that flows across steps, calls, and sessions. This chapter traces one full conversation's state evolution end-to-end.

Why a second cut

Applying to Your Agent cuts by hook — “what you do in prepareStep / in tool.execute”. That’s the static map.

But real production code isn’t 8 isolated hooks — they share state:

  • onToolCallFinish writes to compactCache; the next prepareStep reads from it
  • onStepFinish writes lastStepTotalTokens; the next prepareStep reads to decide whether to compact
  • tool.execute writes reminders; a later prepareStep injects them into the prompt
  • onFinish persists messages; the next call-site loads them

Without making these producer-consumer pairs explicit, readers step on the seams when implementing.

This chapter is the dynamic walkthrough — tracing how state evolves across the 8 points in one conversation. Complementary to chapter 9’s static map.

Disclaimer: the code examples below are a teaching synthesis of CC design and AI SDK best practices, so field names differ in detail from Zapvol’s actual implementation. For example, context.compactCache appears below as new Map(), but in Zapvol it is a file-based cache (.compact.json); and session.compactBoundary is in fact part of session.compactionCheckpoint. This chapter emphasizes state-flow patterns over literal field matching; for the concrete Zapvol implementation, see the packages/backend/src/agent/ source.


Four state buckets: classified by lifetime

All agent state fits into 4 buckets by “how long it lives”:

Bucket | Lifetime | Typical fields | Storage
--- | --- | --- | ---
Per-step | This step’s start → end | stepNumber · stepMessages · current tool_use block · current LLM response stream | AI SDK internal
Per-call | streamText start → onFinish returns | toolUseContext · abortController · compactCache · lastStepTotalTokens · reminders · todos | Your context object
Cross-call (in session) | Across multiple streamText invocations in one session | session.messages · compactBoundary · compactionCheckpoint · cumulative usage | Session DB / memory
Cross-session | Persists across sessions | CLAUDE.md / AGENT.md · user prefs · auto memory · policy rules | Filesystem / DB

The per-call / cross-call boundary is the easiest to mess up — both “span multiple steps” but have different lifetimes:

  • Per-call is shared across steps within one streamText call; after streamText returns, this state should be GC’d
  • Cross-call is persisted after streamText completes; the next call-site reads it back

Confusing them = either state doesn’t persist (next read fails) or memory leaks (per-call state never releases).
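One way to keep the per-call / cross-call boundary honest is to give the two buckets separate types and to build a fresh per-call object at every call-site. The following is a minimal sketch under assumptions: `SessionState`, `CallContext`, `createCallContext`, and `freezeToSession` are hypothetical names, and the fields follow this chapter rather than Zapvol's source.

```typescript
// Hypothetical shapes for illustration; field names follow this chapter.
interface SessionState {            // cross-call: persisted between streamText calls
  messages: string[]
  lastStepTotalTokens: number
}

interface CallContext {             // per-call: created at A, dropped after onFinish
  lastStepTotalTokens: number
  compactCache: Map<string, string>
  reminders: string[]
}

// Build a fresh per-call context, seeding only what the session hands over.
function createCallContext(session: SessionState): CallContext {
  return {
    lastStepTotalTokens: session.lastStepTotalTokens,
    compactCache: new Map(),        // never reused across calls
    reminders: [],
  }
}

// Copy back only the fields that must outlive the call.
function freezeToSession(ctx: CallContext, session: SessionState): SessionState {
  return { ...session, lastStepTotalTokens: ctx.lastStepTotalTokens }
}

const session: SessionState = { messages: [], lastStepTotalTokens: 45_000 }
const first = createCallContext(session)
first.compactCache.set('call_1', 'short summary')
first.lastStepTotalTokens = 52_000

const saved = freezeToSession(first, session)
const second = createCallContext(saved)
console.log(second.lastStepTotalTokens) // 52000: the token count survived the call
console.log(second.compactCache.size)   // 0: the cache did not
```

Because the second context is built from scratch, nothing from the first call's cache can leak into it; only what was explicitly frozen to the session comes back.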


Full conversation walkthrough

Let’s trace a real scenario — “user says ‘help me fix the auth bug’, agent runs 3 steps to complete” — following which state each embedding point reads/writes.

Preamble: session restoration

Before the user hits Enter, the session layer already contains:

session.messages = [/* prior rounds' messages */]
session.compactionCheckpoint = { lastStepTotalTokens: 45_000, compactedRounds: {...} }
session.usage = { cumulativeInput: 120_000, cumulativeOutput: 8_000, ... }

Cross-session layer is stable:

CLAUDE.md (filesystem): project rules
~/.claude/CLAUDE.md: user prefs
autoMemory: [feedback_terse, project_migration_freeze, ...]

Point A · Call-site assembly

// New user message + session.messages concat
const allMessages = [...session.messages, { role: 'user', content: userInput }]

// Read cross-session
const claudeMd = await loadLayeredClaudeMd()  // Managed → User → Project → Local
const autoMem = await loadAutoMemoryIndex()   // MEMORY.md index (≤200 lines)
const systemPrompt = assembleSystemPrompt({ claudeMd, autoMem, ...staticParts })

// Write per-call (initialize context)
const context: ZapvolContext = {
  taskId,
  abortController: new AbortController(),  // lives through entire streamText
  lastStepTotalTokens: session.compactionCheckpoint.lastStepTotalTokens,  // ← seed
  compactCache: new Map(),  // empty, waits for E to write
  reminders: [],
  todos: [...],
  // ...
}

// Launch AI SDK
const result = streamText({
  system: systemPrompt,  // byte-stable
  messages: allMessages,
  tools: buildTools(),
  prepareStep: /* see Point B */,
  experimental_onToolCallFinish: createToolPrecompactHook({ context, ... }),
  onStepFinish: step => {
    // Update per-call state
    context.lastStepTotalTokens = deriveFromStep(step)  // ← producer
  },
  onFinish: /* see Point G */,
})

A’s state R/W:

  • Read: session.messages, session.compactionCheckpoint, claudeMd, autoMem (all cross-session / cross-call)
  • Write: context.lastStepTotalTokens (seeds initial value), context.abortController (new instance)

Key: context.lastStepTotalTokens’s initial value comes from session.compactionCheckpoint. This is the cross-call → per-call state handoff — session restore inherits the prior round’s token estimate.

Point B · prepareStep (step 0, first step)

prepareStep: async ({ messages: stepMessages, steps, model }) => {
  const isFirstStep = steps.length === 0  // true

  // Read per-call state
  const currentTokens = context.lastStepTotalTokens  // ← still A's initial value
  const reminders = context.reminders

  // First step: boundary filter + autocompact
  const afterBoundary = getMessagesAfterCompactBoundary(
    stepMessages,
    session.compactBoundary,
  )

  let compacted = afterBoundary
  if (currentTokens > AUTOCOMPACT_THRESHOLD) {
    compacted = await autocompact(afterBoundary, {
      signal: context.abortController.signal,  // ← pass abort
    })
  }

  // Mark cache breakpoints
  const prepared = applyCacheControl(compacted, model)

  return { messages: prepared }
}

B’s R/W at step 0:

  • Read: stepMessages (AI SDK passes in), context.lastStepTotalTokens, session.compactBoundary, context.abortController.signal
  • Write: None (only overrides this step’s input, doesn’t write context)
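The boundary filter in step 0 can be sketched as a slice with an explicit bounds check. `messagesAfterBoundary` is a hypothetical stand-in for `getMessagesAfterCompactBoundary`, and it assumes the boundary is an index into the message array; the source does not show the real signature.

```typescript
// Hypothetical stand-in; assumes the boundary is an index into the message array.
function messagesAfterBoundary<T>(messages: readonly T[], boundary: number): T[] {
  if (!Number.isInteger(boundary) || boundary < 0 || boundary > messages.length) {
    // A boundary past the end means checkpoint and messages drifted apart.
    throw new RangeError(`compact boundary ${boundary} is outside 0..${messages.length}`)
  }
  return messages.slice(boundary)
}

const history = ['m0', 'm1', 'm2', 'm3']
console.log(messagesAfterBoundary(history, 2)) // [ 'm2', 'm3' ]
console.log(messagesAfterBoundary(history, 4)) // []: everything was already compacted
```

Failing loudly on an out-of-range boundary is a design choice in this sketch: it surfaces a broken checkpoint at the first step instead of silently sending an empty prompt.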

Point C · LLM streaming response

The model streams text and tool_use blocks. Doesn’t read per-call state (AI SDK runs internally), but the events produced flow to Point D’s trigger.

For thinking mode, the thinking block’s signature is part of this step’s output — it flows into next step’s messages and next prepareStep sees it. Don’t touch signature fields.

Point D · tool.execute (grep for “auth” files)

tool({
  inputSchema: z.object({ pattern: z.string() }),
  execute: async (input, { abortSignal, toolCallId }) => {
    // Read per-call state
    abortSignal?.throwIfAborted()  // abortSignal may be undefined; throws the abort reason if already aborted

    // Read cross-session / per-call
    const permission = await checkPermission('grep', input, context)
    if (permission.behavior === 'deny') {
      return { error: 'permission_denied' }
    }

    // Actually run
    const result = await execFile('rg', [input.pattern, '.'], { signal: abortSignal })
    const str = result.stdout

    // Possibly write per-call
    if (str.length > MAX_RESULT_CHARS) {
      const path = await offloadToSandbox(context.sandbox, str)
      return { truncated: true, preview: str.slice(0, 1000), offloadPath: path }
    }
    return { output: str }
  },
})

D’s R/W:

  • Read: abortSignal, context (for permission)
  • Write: Possibly writes to context.sandbox (offloaded file), possibly context.todos (if “auto-add todo”-style tools exist)

Point E · onToolCallFinish (the key cross-over point)

experimental_onToolCallFinish: async (event) => {
  if (!event.success) return
  const { toolCall, output, abortSignal } = event

  // Read: estimation
  const tokens = estimateTokens({ input: toolCall.input, output })
  if (tokens < PRECOMPACT_TRIGGER_TOKENS) return

  // Read: compactCache (idempotence)
  if (context.compactCache.has(toolCall.toolCallId)) return

  // Execute (one cheap-model LLM call)
  const compactResult = await raceAbort(
    compactorFor(toolCall.toolName)(toolCall.input, output, context),
    abortSignal,
  )
  if (!compactResult) return

  // ←←← **The key write**
  context.compactCache.set(toolCall.toolCallId, compactResult)
}

E’s R/W:

  • Read: toolCall, output, context.compactCache (idempotence check), abortSignal
  • Write: context.compactCache.set(toolCall.toolCallId, ...)

This write is the critical link between E and B — E caches the compressed result; next-step B reads the cache to substitute. The “Point B again (step 1)” section below is the consumer side.
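`raceAbort` is not defined in the code above. A minimal sketch of what such a helper could look like, assuming it resolves to `undefined` when the signal fires first (which would match the `if (!compactResult) return` check in E):

```typescript
// Hypothetical helper: resolves to undefined if the signal aborts before the work settles.
function raceAbort<T>(work: Promise<T>, signal?: AbortSignal): Promise<T | undefined> {
  if (!signal) return work
  if (signal.aborted) return Promise.resolve(undefined)
  return new Promise<T | undefined>((resolve, reject) => {
    const onAbort = () => resolve(undefined)
    const cleanup = () => signal.removeEventListener('abort', onAbort)
    signal.addEventListener('abort', onAbort, { once: true })
    work.then(
      value => { cleanup(); resolve(value) },
      error => { cleanup(); reject(error) },
    )
  })
}

const escController = new AbortController()
const slowCompaction = new Promise<string>(resolve => setTimeout(() => resolve('compacted'), 50))
const pending = raceAbort(slowCompaction, escController.signal)
escController.abort()
pending.then(result => console.log(result)) // undefined: the abort won the race
```

Note that a race like this only stops waiting; it does not cancel the underlying work. To stop the spend as well, the same signal would also need to be passed into the compaction call itself.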

Point B again (step 1): consume what E wrote

Now step 1 starts (there are step 0’s tool results to handle):

prepareStep: async ({ messages: stepMessages, steps, model }) => {
  const isFirstStep = steps.length === 0  // false, now step 1

  // Read per-call state (lastStepTotalTokens was updated by onStepFinish)
  const currentTokens = context.lastStepTotalTokens

  // ↓↓↓ **Key read**: consume what E wrote
  const truncated = await truncateOldToolResults(stepMessages, {
    compactCache: context.compactCache,  // ← reads E's write
  })

  // Inside truncateOldToolResults:
  //   for (const part of message.parts) {
  //     if (isToolResultPart(part)) {
  //       const cached = compactCache.get(part.toolCallId)
  //       if (cached) {
  //         part.output = cached.shortOutput  // ← replace raw with compact
  //       }
  //     }
  //   }

  // Maybe another autocompact (if microcompact isn't enough)
  let compacted = truncated
  if (currentTokens > AUTOCOMPACT_THRESHOLD) {
    compacted = await autocompact(truncated, { signal: context.abortController.signal })
  }

  // Inject reminders
  let final = applyCacheControl(compacted, model)
  if (context.reminders.length > 0) {
    const text = context.reminders.join('\n')
    final = [...final, { role: 'user', content: text }]
    context.reminders = []  // ← clear after consumption (avoid repeat injection)
  }

  return { messages: final }
}

Here E and B are linked:

  1. E writes: context.compactCache.set(toolCallId, compactResult) — runs cheap model to compress immediately after tool returns, caches it
  2. B reads: next step’s prepareStep iterates tool results, on toolCallId hit the cache — replaces raw output with compact version
  3. Net effect: the main agent’s LLM call sees already-compacted tool results, token cost drops sharply, and the compaction LLM call ran on Haiku off the main agent’s critical path

This producer-consumer pair is the core design of Zapvol’s tool-precompact.ts + tiers/tool.ts. Without it in view, the reason to “precompact in advance” is easy to miss.
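The commented pseudocode in step 1 mutates `part.output` in place. A non-mutating sketch of the same substitution follows, assuming simplified hypothetical `Message` and `Part` shapes; the AI SDK's real message types carry more fields, and `substituteCompacted` is an illustrative name.

```typescript
// Simplified, hypothetical shapes; real AI SDK message parts carry more fields.
type Part =
  | { type: 'text'; text: string }
  | { type: 'tool-result'; toolCallId: string; output: string }
interface Message { role: 'user' | 'assistant' | 'tool'; parts: Part[] }
interface CompactEntry { shortOutput: string }

// Replace raw tool output with its cached compact form; leave cache misses untouched.
function substituteCompacted(
  messages: readonly Message[],
  compactCache: ReadonlyMap<string, CompactEntry>,
): Message[] {
  return messages.map(message => ({
    ...message,
    parts: message.parts.map((part): Part => {
      if (part.type !== 'tool-result') return part
      const cached = compactCache.get(part.toolCallId)
      return cached ? { ...part, output: cached.shortOutput } : part
    }),
  }))
}

const precompactCache = new Map<string, CompactEntry>([
  ['call_1', { shortOutput: '3 matches in src/auth/' }],
])
const stepMessages: Message[] = [
  { role: 'tool', parts: [{ type: 'tool-result', toolCallId: 'call_1', output: 'x'.repeat(5000) }] },
  { role: 'tool', parts: [{ type: 'tool-result', toolCallId: 'call_2', output: 'small' }] },
]
const compactedMessages = substituteCompacted(stepMessages, precompactCache)
```

Returning new objects instead of mutating keeps the raw output available to anything else that still holds the original messages, for example the persistence step in G.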

B’s R/W at step 1:

  • Read: stepMessages, context.lastStepTotalTokens, context.compactCache, context.reminders, context.abortController.signal
  • Write: context.reminders = [] (clear after consumption)
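Read-then-clear is easy to split across two statements and then lose the second one in a refactor. A small sketch of a drain helper that does both in one call, assuming `reminders` is a plain string array as in this chapter (`ReminderHolder` and `drainReminders` are hypothetical names):

```typescript
interface ReminderHolder { reminders: string[] }

// Return the pending reminders and clear them in one step.
function drainReminders(context: ReminderHolder): string[] {
  const pending = context.reminders
  context.reminders = []
  return pending
}

const reminderCtx: ReminderHolder = {
  reminders: ['todo list changed', 'file was edited externally'],
}
console.log(drainReminders(reminderCtx).join('\n')) // both reminders, once
console.log(drainReminders(reminderCtx).length)     // 0: nothing left for the next step
```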

Point F · stopWhen

stopWhen: [
  stepCountIs(50),
  hasToolCall('complete'),
  ({ steps }) => context.shouldStop,
]

F’s R/W:

  • Read: steps, context.shouldStop (external signal)
  • Write: None
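When stopWhen is given an array, the loop stops as soon as any one condition holds. The sketch below models that "any of" evaluation with local stand-in types; `StepLike`, `stepCountAtLeast`, `calledTool`, and `shouldStopLoop` are hypothetical and are not the AI SDK's own exports.

```typescript
// Local stand-ins for illustration; not the AI SDK's own types.
interface StepLike { toolNames: string[] }
type StopCondition = (state: { steps: StepLike[] }) => boolean

const stepCountAtLeast = (n: number): StopCondition => ({ steps }) => steps.length >= n

const calledTool = (name: string): StopCondition => ({ steps }) => {
  const last = steps[steps.length - 1]
  return last !== undefined && last.toolNames.includes(name)
}

// The loop stops as soon as any one condition holds.
function shouldStopLoop(conditions: StopCondition[], steps: StepLike[]): boolean {
  return conditions.some(condition => condition({ steps }))
}

const externalFlag = { shouldStop: false }
const stopConditions: StopCondition[] = [
  stepCountAtLeast(50),
  calledTool('complete'),
  () => externalFlag.shouldStop,   // read-only view of per-call state
]

console.log(shouldStopLoop(stopConditions, [{ toolNames: ['grep'] }]))     // false
console.log(shouldStopLoop(stopConditions, [{ toolNames: ['complete'] }])) // true
```

The third condition only reads the flag, which matches F's "Write: None": whoever sets the flag lives elsewhere in the lifecycle.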

Point G · onFinish

onFinish: async ({ response, usage, finishReason }) => {
  // Read per-call
  const newMessages = response.messages
  const terminalReason = deriveTerminalReason(finishReason, context)

  // Write cross-call (session layer)
  await db.transaction(async tx => {
    await tx.session.appendMessages(session.id, newMessages)
    await tx.session.updateCheckpoint(session.id, {
      lastStepTotalTokens: context.lastStepTotalTokens,  // ← freeze per-call → session
      compactedRounds: context.stepCompactor.getCheckpoint(),
    })
    await tx.session.recordUsage(session.id, usage)
    await tx.session.setLastFinishReason(session.id, terminalReason)
  })
}

G’s R/W:

  • Read: response.messages, usage, finishReason, context.lastStepTotalTokens, context.stepCompactor (final state)
  • Write: session.messages (append), session.compactionCheckpoint (full refresh), session.usage (sum), session.terminalReason

Key: G is the per-call → cross-call state solidification point. Without this step, next call-site can’t read new messages — conversation lost.
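The transaction in G exists so that messages and checkpoint move together. A minimal in-memory sketch of that all-or-nothing behavior, assuming a hypothetical `SessionRecord` shape and a boundary stored as a message index; a real implementation would rely on the database's own transactions rather than on `commitAtomically`.

```typescript
interface SessionRecord {
  messages: string[]
  checkpoint: { lastStepTotalTokens: number; compactBoundary: number }
}

// Apply all updates to a draft copy; publish the draft only if every update succeeded.
function commitAtomically(
  store: Map<string, SessionRecord>,
  sessionId: string,
  update: (draft: SessionRecord) => void,
): void {
  const current = store.get(sessionId)
  if (!current) throw new Error(`unknown session ${sessionId}`)
  const draft: SessionRecord = {
    messages: [...current.messages],
    checkpoint: { ...current.checkpoint },
  }
  update(draft)                       // may throw: the store is still untouched
  if (draft.checkpoint.compactBoundary > draft.messages.length) {
    throw new RangeError('checkpoint points past the end of messages')
  }
  store.set(sessionId, draft)
}

const sessionStore = new Map<string, SessionRecord>([
  ['s1', { messages: ['u: hi'], checkpoint: { lastStepTotalTokens: 45_000, compactBoundary: 0 } }],
])

try {
  commitAtomically(sessionStore, 's1', draft => {
    draft.messages.push('a: found the bug')
    throw new Error('checkpoint write failed')   // simulated failure mid-transaction
  })
} catch {
  console.log(sessionStore.get('s1')!.messages.length)  // 1: the append was rolled back
}
```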

Point H · Background tasks

onFinish: async (event) => {
  // First sync G's persistence
  await persistSession(event, session)

  // Then start async background (don't await)
  void backgroundMemoryExtraction(event.response.messages, context)
  void backgroundResumePreCompact(session.id)
}

H’s R/W:

  • Read: response.messages, context (one-shot snapshot)
  • Write: autoMemory (cross-session) — extracted new memory entries
  • Write: session.resumePreCompact (cross-call) — pre-computed summary for next resume

After H, the context object’s lifetime ends and per-call state is fully reclaimed. What was written to the session and memory layers stays behind for the next call.
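A bare `void somePromise` drops the promise, so a rejection inside a background task surfaces as an unhandled rejection. The sketch below shows one possible tracker that still does not block onFinish, but records failures and lets a shutdown path wait for stragglers. `BackgroundTasks` is a hypothetical helper, not part of the AI SDK.

```typescript
// Hypothetical helper for fire-and-forget work that should not fail silently.
class BackgroundTasks {
  private readonly inFlight = new Set<Promise<void>>()
  readonly failures: string[] = []

  // Start a task without awaiting it; errors are recorded instead of thrown.
  start(name: string, task: () => Promise<unknown>): void {
    const tracked: Promise<void> = Promise.resolve()
      .then(task)
      .then(
        () => undefined,
        (error: unknown) => { this.failures.push(`${name}: ${String(error)}`) },
      )
      .then(() => { this.inFlight.delete(tracked) })
    this.inFlight.add(tracked)
  }

  // For shutdown or tests: wait until everything started so far has settled.
  async drain(): Promise<void> {
    await Promise.all([...this.inFlight])
  }
}

const background = new BackgroundTasks()
background.start('memoryExtraction', async () => { /* extract memory entries */ })
background.start('resumePreCompact', async () => { throw new Error('summary model unavailable') })
background.drain().then(() => console.log(background.failures.length)) // 1
```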


State R/W matrix

[Figure: State Flow Across the 8 Lifecycle Points. Each state lives in a specific bucket; indigo arrows mark producer → consumer pairs. Columns: A call-site, B prepareStep, C stream, D tool.execute, E onToolFinish, F stopWhen, G onFinish, H post-call, next A (next call). Rows are grouped by bucket: per-step (dies at step end), per-call (dies when onFinish returns), cross-call (persists across streamText calls), cross-session (persists across sessions).]

Key takeaways from the figure:

  1. compactCache: E is the only producer, next-step B is the only consumer. A key type / naming mismatch means the cache always misses.
  2. lastStepTotalTokens: A seeds → onStepFinish updates each step → B reads for threshold → G freezes to session for next round.
  3. session.messages: G’s persistence is the cross-call handoff. An async write in G without await means the conversation drops on refresh.
  4. abortController.signal: every observation point races independently, not layer-by-layer forwarding.
  5. Per-call → cross-call solidification happens in G only. Miss it and data is lost at session end.

Distilling the walkthrough into a matrix. One glance tells you each state’s producer and consumer.

State item | Marks in A → H order
--- | ---
session.messages | R W R
session.compactionCheckpoint | R W
session.usage (cumulative) | W
context.lastStepTotalTokens | W R R
context.abortController.signal | W R R R R
context.compactCache | W (init) R W
context.reminders | R RW W
context.todos | R R RW R
session.resumePreCompact | R W
autoMemory | R W
CLAUDE.md | R

(RW = both read and write; bold W = key producer point)

Three key observations from reading this matrix:

  1. compactCache’s W is in E only, R is in B only — classic producer-consumer; runs solely on this pair
  2. abortController.signal is read almost everywhere — concrete expression of “full-chain propagation”
  3. session.messages’s W is in G, R is in next A — cross-call handoff depends entirely on G’s persistence
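A matrix like this can also be checked mechanically. The sketch below flags state that has no producer or no consumer, assuming a hypothetical encoding in which each state item maps lifecycle points to 'R', 'W', or 'RW'. The sample rows use only the pairs named in the observations above, plus one invented field to show a failure.

```typescript
type Access = 'R' | 'W' | 'RW'
type Point = 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H'
type Matrix = Record<string, { [P in Point]?: Access }>

const writes = (mark?: Access): boolean => mark === 'W' || mark === 'RW'
const reads = (mark?: Access): boolean => mark === 'R' || mark === 'RW'

// Report every state item that is never written or never read.
function findDeadState(matrix: Matrix): string[] {
  const problems: string[] = []
  for (const [state, row] of Object.entries(matrix)) {
    const marks = Object.values(row)
    if (!marks.some(writes)) problems.push(`${state}: no producer`)
    if (!marks.some(reads)) problems.push(`${state}: no consumer`)
  }
  return problems
}

const sampleMatrix: Matrix = {
  'context.compactCache': { B: 'R', E: 'W' },
  'session.messages': { A: 'R', G: 'W' },
  'context.debugNotes': { D: 'W' },   // hypothetical field that nothing reads
}
console.log(findDeadState(sampleMatrix)) // [ 'context.debugNotes: no consumer' ]
```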

Key producer-consumer pairs (deep-dive on 3)

Pair 1: compactCache (E writes → B reads)

Purpose: compress tool results before the main agent sees them in the next step using a cheap model (Haiku), so the main agent’s critical path doesn’t pay compaction LLM-call latency and cost.

Flow:

tool.execute returns output (D)

onToolCallFinish fires (E)

Check output size > threshold?
  ↓ yes
Call Haiku: compact(input, output) → { shortOutput, offloadPath }

context.compactCache.set(toolCallId, result)
  ↓ (next step's prepareStep fires)
prepareStep sees tool_result list (B)

for each tool_result:
  if compactCache.has(toolCallId):
    replace with compactCache.get(...).shortOutput

return { messages: truncated } overrides this step's input

main agent's LLM call sees the compact version

Easiest-to-mess-up points here:

  1. Key mismatch: E writes key as toolCall.toolCallId; B reads as part.toolCallId — if types are inconsistent (string vs number), Map.get always misses.
  2. Idempotence: E needs to check “has this toolCallId already been compressed” — otherwise checkpoint replay duplicates the LLM call.
  3. Race abort: E runs Haiku mid-call, user hits ESC; this compression must immediately stop — otherwise abort happens but background keeps spending money.
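The string-versus-number miss from point 1 is easy to reproduce, and a thin wrapper can both normalize keys and count hits. This is a sketch only: `CompactCache` is a hypothetical in-memory wrapper, whereas Zapvol's actual cache is file-based according to the disclaimer at the top of this chapter.

```typescript
// A plain Map compares keys by value and type, so 42 and '42' are different keys.
const rawCache = new Map<string | number, string>()
rawCache.set(42, 'compact')
console.log(rawCache.get('42')) // undefined: a silent miss

// Hypothetical wrapper: normalizes keys to strings and tracks the hit rate.
class CompactCache<V> {
  private readonly entries = new Map<string, V>()
  private hits = 0
  private lookups = 0

  set(toolCallId: string | number, value: V): void {
    this.entries.set(String(toolCallId), value)
  }

  get(toolCallId: string | number): V | undefined {
    this.lookups += 1
    const value = this.entries.get(String(toolCallId))
    if (value !== undefined) this.hits += 1
    return value
  }

  hitRate(): number {
    return this.lookups === 0 ? 0 : this.hits / this.lookups
  }
}

const normalizedCache = new CompactCache<string>()
normalizedCache.set(42, 'compact')
normalizedCache.get('42')      // hit, despite the type difference
normalizedCache.get('call_x')  // miss
console.log(normalizedCache.hitRate()) // 0.5
```

A hit rate that stays at 0 while E is clearly writing entries is a strong hint that the producer and consumer disagree on the key.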

Pair 2: lastStepTotalTokens (A seeds → onStepFinish updates → B consumes)

Purpose: compaction threshold judgment (“should autocompact fire”) depends on accurate token estimation. If lastStepTotalTokens isn’t updated, threshold judgment uses stale values — autocompact either never fires or fires wrongly.

Flow:

A: context.lastStepTotalTokens = session.compactionCheckpoint.lastStepTotalTokens  // inherit from last round

B (step 0): if (lastStepTotalTokens > AUTOCOMPACT_THRESHOLD) → compact

LLM call completes

onStepFinish: context.lastStepTotalTokens = step.usage.inputTokens + step.usage.outputTokens  // update

B (step 1): judge again, use updated value
  ↓ ...
G: session.compactionCheckpoint.lastStepTotalTokens = context.lastStepTotalTokens  // freeze for next round

Easiest-to-mess-up points:

  1. Forgot to write in onStepFinish: autocompact never fires, session runs until context overflow.
  2. Wrong formula: step.usage has inputTokens / outputTokens / cachedInputTokens / cacheCreationInputTokens — which represents “next step’s prompt size”? Zapvol’s choice is inputTokens + outputTokens (comment explains why) — wrong choice leads to long-term drift.
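As a worked example of the inputTokens + outputTokens choice: one plausible reading (an assumption here, since the source only points to a code comment) is that the next step's prompt contains everything this step sent plus everything it generated. The sketch uses a hypothetical `StepUsage` shape whose fields may be undefined, and an illustrative threshold value.

```typescript
// Hypothetical usage shape; a provider may omit either count.
interface StepUsage { inputTokens?: number; outputTokens?: number }

// Estimate the size of the next step's prompt from this step's usage.
function estimateNextPromptTokens(usage: StepUsage, previous: number): number {
  if (usage.inputTokens === undefined) return previous // keep the last known value
  return usage.inputTokens + (usage.outputTokens ?? 0)
}

const AUTOCOMPACT_THRESHOLD = 150_000 // illustrative value only

let lastStepTotalTokens = 45_000
lastStepTotalTokens = estimateNextPromptTokens(
  { inputTokens: 148_200, outputTokens: 2_300 },
  lastStepTotalTokens,
)
console.log(lastStepTotalTokens)                         // 150500
console.log(lastStepTotalTokens > AUTOCOMPACT_THRESHOLD) // true: compact before the next step
```

With these numbers, counting inputTokens alone (148,200) would stay under the threshold for one more step, which is the kind of drift the pitfall describes.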

Pair 3: session.messages (G writes → next A reads)

Purpose: cross-call conversation persistence. Refresh browser / resume session without losing dialogue.

Flow:

G (this onFinish):
  await db.session.appendMessages(session.id, response.messages)  // sync
  ↓ (session ends, context reclaimed)
  ↓ (user comes back later)
A (next call-site):
  const history = await db.session.loadMessages(session.id)  // read back
  const allMessages = [...history, newUserMessage]

Easiest-to-mess-up points:

  1. G’s async write without await: a user refresh beats the DB write and the conversation is lost. G must await the DB’s confirmation before returning.
  2. compactBoundary not updated in sync: the next A loads messages but doesn’t know which part was compacted, so the next prepareStep re-compacts. session.compactionCheckpoint and session.messages must be updated atomically (same transaction).
  3. Thinking block signature broken during serialization: DB-stored JSON field order changed / float precision lost — next submission signature mismatches, API rejects.
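A cheap guard against the signature pitfall is to treat the signature as an opaque string and compare it before and after a storage round trip. A sketch, assuming a hypothetical `ThinkingPart` shape with a `signature` string field and a made-up signature value:

```typescript
// Hypothetical part shape; the signature is treated as an opaque string.
interface ThinkingPart { type: 'thinking'; text: string; signature: string }

// True only if every signature survived storage character for character.
function signaturesPreserved(
  original: readonly ThinkingPart[],
  reloaded: readonly ThinkingPart[],
): boolean {
  return original.length === reloaded.length &&
    original.every((part, index) => part.signature === reloaded[index]?.signature)
}

const originalParts: ThinkingPart[] = [
  { type: 'thinking', text: 'check the token refresh path', signature: 'opaque+Sig/==' },
]

// Storing the whole part as one JSON string keeps the signature intact.
const reloadedParts = JSON.parse(JSON.stringify(originalParts)) as ThinkingPart[]
console.log(signaturesPreserved(originalParts, reloadedParts)) // true

// Any "cleanup" of the field breaks it.
const normalizedParts = originalParts.map(part => ({
  ...part,
  signature: part.signature.toLowerCase(),
}))
console.log(signaturesPreserved(originalParts, normalizedParts)) // false
```

Running a check like this in a persistence test catches a lossy storage layer before the API does.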

8 critical constraints

8 rules of coding discipline on state flow (only #1 is a strict mathematical invariant; the others are operational requirements — breaking any = a class of bug):

# | Constraint | Breakage symptom
--- | --- | ---
1 | session.compactBoundary <= session.messages.length | getMessagesAfterCompactBoundary returns empty / crashes
2 | context.compactCache[toolCallId] key equals message’s tool_use.id | cache always misses, no compression
3 | After abortController.signal.aborted === true, all running tool executes must exit within seconds | user hits ESC but API keeps burning
4 | session.messages and session.compactionCheckpoint must update in same transaction | resume sees checkpoint pointing to non-existent message indices
5 | context.lastStepTotalTokens must update in onStepFinish | autocompact doesn’t fire, context explodes
6 | context.reminders must clear after consumption | every step repeats same reminder, prompt pollution
7 | thinking block signature must be preserved byte-by-byte; serialization / deserialization mustn’t touch it | next submit fails with a “thinking signature mismatch” API error
8 | per-call state must solidify to cross-call layer before onFinish returns | data loss after session ends

Common state-flow bugs and their symptoms

Production bugs traced back to state violations:

Symptom | Root cause | Where to fix
--- | --- | ---
“Microcompact has no effect, tool results still full-size to LLM” | Pair 1 key mismatch: E uses toolCall.toolCallId, B uses part.tool_call_id (naming inconsistency) | Unify key naming + log cache hit rate
“Autocompact never fires, usage grows to overflow” | Pair 2’s onStepFinish forgotten: context.lastStepTotalTokens is stuck at initial | Explicitly update in onStepFinish
“ESC takes tens of seconds to really stop” | Constraint 3 broken: some tool’s execute doesn’t honor abortSignal | Pass signal through every long-running fetch/spawn
“Conversation history drops last turn after refresh” | Pair 3’s G async write: onFinish didn’t await DB | Change onFinish to async + await
“Resume shows garbled earlier messages” | Constraint 4 broken: messages and checkpoint written separately, one failed | Move to DB transaction
“Same reminder appears 5 times in prompt” | Constraint 6 broken: not cleared after consumption | context.reminders = [] after consumption
“API returns ‘thinking signature mismatch’” | Constraint 7 broken: JSON serialization reordered signature fields | Serialize thinking block verbatim, don’t touch
“Resume slow, 5 seconds every time” | H’s resumePreCompact not written: runtime just starts compacting | Async-write summary in H for next-time read

AbortSignal propagation (called out)

AbortSignal is the easiest state-flow axis to half-implement. Unlike producer-consumer pairs, it’s a broadcast model:

After the user hits ESC, Point A’s abortController.abort() flips signal.aborted to true. Each observation point then races independently, not via layer-by-layer forwarding:

  • B (top of next prepareStep): if (signal.aborted) throw — early exit
  • C (streaming): fetch({ signal }) — native interruption
  • D (tool.execute): AI SDK already puts the signal in the second arg; each tool checks itself
  • E (onToolCallFinish): raceAbort(compactLLM, signal) — explicit race
  • H (background tasks): independently check signal, don’t depend on stream-close notification

Finally AI SDK’s loop detects signal.aborted at its own point → stream ends → G’s finishReason === 'abort'.

Key: half-assed abort is worse than none — user thinks they cancelled, but API calls still burn. Any new long-running operation (fetch / execFile / spawn / DB call / LLM call) must accept the signal.
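The broadcast model can be sketched with one controller and several observers that each subscribe on their own. `abortableSleep` is a hypothetical stand-in for any long-running operation that accepts the signal.

```typescript
// Hypothetical long-running operation that honors an AbortSignal.
function abortableSleep(ms: number, signal: AbortSignal): Promise<'done' | 'aborted'> {
  return new Promise(resolve => {
    if (signal.aborted) return resolve('aborted')
    const timer = setTimeout(() => {
      signal.removeEventListener('abort', onAbort)
      resolve('done')
    }, ms)
    const onAbort = () => {
      clearTimeout(timer)          // release the resource immediately
      resolve('aborted')
    }
    signal.addEventListener('abort', onAbort, { once: true })
  })
}

const controller = new AbortController()
// Three observers share one signal; none of them forwards it to the others.
const observers = [
  abortableSleep(5_000, controller.signal),  // e.g. a tool's subprocess
  abortableSleep(5_000, controller.signal),  // e.g. the precompact LLM call
  abortableSleep(5_000, controller.signal),  // e.g. a background task
]
controller.abort()                           // the user hits ESC
Promise.all(observers).then(results => console.log(results))
// [ 'aborted', 'aborted', 'aborted' ]
```

Each observer clears its own timer as soon as the signal fires, so nothing keeps running for the full five seconds after the abort.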


3 critical self-audit questions

The 3 questions below cover the most common state-flow pitfalls. If you can’t answer any one of them without re-reading the source, go back to the corresponding section:

  1. compactCache lifetime: still present after one streamText returns? If the agent spans two streamText calls (same session), can the second call read what the first call’s E wrote?
  2. Forgetting to update lastStepTotalTokens in onStepFinish: what happens to autocompact? At which step does the problem start? What’s the eventual symptom?
  3. G (onFinish) appended messages but updateCheckpoint DB write failed: what state does the next resume read? What are the cascading effects?

Answers are in the producer-consumer / critical-constraints / bug table sections above — each question points to a specific constraint’s breakage path.


Takeaways for your own agent

Distilling state-flow perspective into directly-actionable items:

  1. Draw your state buckets first: classify your agent state into per-step / per-call / cross-call / cross-session four buckets. If you can’t classify cleanly, your design has a problem
  2. Find producer-consumer pairs: every state has at least one producer + one consumer (otherwise it’s dead state). Scan with a matrix
  3. Per-call → cross-call solidification points must be explicit: Zapvol’s onFinish’s updateCheckpoint is such a point — must be transactional-atomic
  4. AbortSignal threads end-to-end: each observation point races independently; not layer-by-layer forwarding. Half-assed is worse than none
  5. compactCache key unified: E and B read/write the same cache; key type / naming must match; add logs for hit-rate verification
  6. Clear after consumption: reminders / one-shot flags clear right after reading, or every step repeats injection
  7. lastStepTotalTokens updated in onStepFinish: not updating = autocompact never fires
  8. Thinking block signature byte-preserved: serialization / deserialization mustn’t touch the signature field
  9. H’s background tasks independent of stream close: don’t await, but maintain lifecycle yourself (signal / cancellation)
  10. 8 critical constraints scan across your codebase: each “no” is a potential bug class

Further reading

  • Applying to Your Agent (AI SDK) — the hook-by-hook static map, complements this chapter
  • Agent Execution Loop — Claude Code’s own state machine (14-field State object)
  • Compaction — compactCache / compactionCheckpoint state’s 5-tier background
  • Design Lessons — abstract version of all state principles here
  • Zapvol reference: packages/backend/src/agent/agent-round.ts + packages/backend/src/agent/compaction/ + packages/backend/src/agent/context/ (if present)