Lifecycle State Flow

Why a second cut

The Applying to Your Agent chapter cuts by hook: “what you do in prepareStep / in tool.execute”. That’s the static map.

But real production code isn’t 8 isolated hooks — they share state:

onToolCallFinish writes to compactCache; the next prepareStep reads from it
onStepFinish writes lastStepTotalTokens; the next prepareStep reads to decide whether to compact
tool.execute writes reminders; a later prepareStep injects them into the prompt
onFinish persists messages; the next call-site loads them

Unless these producer-consumer pairs are made explicit, implementers trip over the seams between hooks.

This chapter is the dynamic walkthrough: it traces how state evolves across the 8 points in one conversation, and complements chapter 9’s static map.

Disclaimer: code examples below are a teaching synthesis of CC design + AI SDK best practices — field names differ in detail from Zapvol’s actual implementation. For example, the context.compactCache: new Map() shown below is in Zapvol a file-based cache (.compact.json), and session.compactBoundary is in fact part of session.compactionCheckpoint. This chapter emphasizes state-flow patterns over literal field matching; for concrete Zapvol implementation see packages/backend/src/agent/ source.

Four state buckets: classified by lifetime

All agent state fits into 4 buckets by “how long it lives”:

Bucket	Lifetime	Typical fields	Storage
Per-step	This step’s start → end	`stepNumber` · `stepMessages` · current tool_use block · current LLM response stream	AI SDK internal
Per-call	streamText start → onFinish returns	`toolUseContext` · `abortController` · `compactCache` · `lastStepTotalTokens` · `reminders` · `todos`	Your context object
Cross-call (in session)	Across multiple streamText invocations in one session	`session.messages` · `compactBoundary` · `compactionCheckpoint` · cumulative usage	Session DB / memory
Cross-session	Persists across sessions	CLAUDE.md / AGENT.md · user prefs · auto memory · policy rules	Filesystem / DB

The per-call / cross-call boundary is the easiest to mess up — both “span multiple steps” but have different lifetimes:

Per-call is shared across steps within one streamText call; after streamText returns, this state should be GC’d
Cross-call is persisted after streamText completes; the next call-site reads it back

Confusing the two causes one of two failures: state that should persist is lost (the next read fails), or per-call state is never released (a memory leak).
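One way to keep the per-call / cross-call boundary honest is to give each bucket its own type and let exactly two functions cross it. The sketch below is illustrative only: `SessionState`, `PerCallContext`, `openCall`, and `closeCall` are hypothetical names, not Zapvol's or the AI SDK's, and messages are reduced to strings.

```typescript
// Cross-call: survives between streamText invocations (normally in a DB).
interface SessionState {
  messages: string[];
  compactionCheckpoint: { lastStepTotalTokens: number };
}

// Per-call: created at the call-site, dropped after onFinish returns.
interface PerCallContext {
  abortController: AbortController;
  lastStepTotalTokens: number;
  compactCache: Map<string, string>;
  reminders: string[];
}

// Cross-call -> per-call: seed a fresh context from the session.
function openCall(session: SessionState): PerCallContext {
  return {
    abortController: new AbortController(),
    lastStepTotalTokens: session.compactionCheckpoint.lastStepTotalTokens,
    compactCache: new Map(),
    reminders: [],
  };
}

// Per-call -> cross-call: copy back only what must outlive the call.
function closeCall(
  session: SessionState,
  context: PerCallContext,
  newMessages: string[],
): SessionState {
  return {
    messages: [...session.messages, ...newMessages],
    compactionCheckpoint: { lastStepTotalTokens: context.lastStepTotalTokens },
  };
}

const before: SessionState = {
  messages: ["hi"],
  compactionCheckpoint: { lastStepTotalTokens: 45_000 },
};
const ctx = openCall(before);
ctx.lastStepTotalTokens = 52_000;        // updated during the call
ctx.compactCache.set("call_1", "short"); // per-call only, never persisted
const after = closeCall(before, ctx, ["fix the auth bug", "done"]);
```

Because `closeCall` names every field that crosses the boundary, anything it does not copy (here `compactCache` and `reminders`) is dropped with the context, which is the intended lifetime.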

Full conversation walkthrough

Let’s trace a real scenario — “user says ‘help me fix the auth bug’, agent runs 3 steps to complete” — following which state each embedding point reads/writes.

Preamble: session restoration

Before the user hits Enter, the session layer already contains:

session.messages = [/* prior rounds' messages */]
session.compactionCheckpoint = { lastStepTotalTokens: 45_000, compactedRounds: {...} }
session.usage = { cumulativeInput: 120_000, cumulativeOutput: 8_000, ... }

Cross-session layer is stable:

CLAUDE.md (filesystem): project rules
~/.claude/CLAUDE.md: user prefs
autoMemory: [feedback_terse, project_migration_freeze, ...]

Point A · Call-site assembly

// New user message + session.messages concat
const allMessages = [...session.messages, { role: 'user', content: userInput }]

// Read cross-session
const claudeMd = await loadLayeredClaudeMd()  // Managed → User → Project → Local
const autoMem = await loadAutoMemoryIndex()   // MEMORY.md index (≤200 lines)
const systemPrompt = assembleSystemPrompt({ claudeMd, autoMem, ...staticParts })

// Write per-call (initialize context)
const context: ZapvolContext = {
  taskId,
  abortController: new AbortController(),  // lives through entire streamText
  lastStepTotalTokens: session.compactionCheckpoint.lastStepTotalTokens,  // ← seed
  compactCache: new Map(),  // empty, waits for E to write
  reminders: [],
  todos: [...],
  // ...
}

// Launch AI SDK
const result = streamText({
  system: systemPrompt,  // byte-stable
  messages: allMessages,
  tools: buildTools(),
  prepareStep: /* see Point B */,
  experimental_onToolCallFinish: createToolPrecompactHook({ context, ... }),
  onStepFinish: step => {
    // Update per-call state
    context.lastStepTotalTokens = deriveFromStep(step)  // ← producer
  },
  onFinish: /* see Point G */,
})

A’s state R/W:

Read: session.messages, session.compactionCheckpoint, claudeMd, autoMem (all cross-session / cross-call)
Write: context.lastStepTotalTokens (seeds initial value), context.abortController (new instance)

Key: context.lastStepTotalTokens’s initial value comes from session.compactionCheckpoint. This is the cross-call → per-call state handoff — session restore inherits the prior round’s token estimate.

Point B · prepareStep (step 0, first step)

prepareStep: async ({ messages: stepMessages, steps, model }) => {
  const isFirstStep = steps.length === 0  // true

  // Read per-call state
  const currentTokens = context.lastStepTotalTokens  // ← still A's initial value
  const reminders = context.reminders

  // First step: boundary filter + autocompact
  const afterBoundary = getMessagesAfterCompactBoundary(
    stepMessages,
    session.compactBoundary,
  )

  let compacted = afterBoundary
  if (currentTokens > AUTOCOMPACT_THRESHOLD) {
    compacted = await autocompact(afterBoundary, {
      signal: context.abortController.signal,  // ← pass abort
    })
  }

  // Mark cache breakpoints
  const prepared = applyCacheControl(compacted, model)

  return { messages: prepared }
}

B’s R/W at step 0:

Read: stepMessages (AI SDK passes in), context.lastStepTotalTokens, session.compactBoundary, context.abortController.signal
Write: None (only overrides this step’s input, doesn’t write context)
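The two decisions B makes at step 0 can be sketched as pure functions. This is a minimal sketch under stated assumptions: `compactBoundary` is treated as an index into the message array, the threshold value is made up, and `messagesAfterBoundary` / `needsAutocompact` are hypothetical stand-ins rather than the real helpers.

```typescript
const AUTOCOMPACT_THRESHOLD = 40_000; // hypothetical value for the sketch

// Assumption: compactBoundary is an index; everything before it has already
// been folded into a summary and should not be sent again.
function messagesAfterBoundary<T>(messages: T[], compactBoundary: number): T[] {
  if (compactBoundary < 0 || compactBoundary > messages.length) {
    throw new RangeError(
      `compactBoundary ${compactBoundary} outside 0..${messages.length}`,
    );
  }
  return messages.slice(compactBoundary);
}

// Step 0 has no usage of its own yet, so the decision rests on the seeded value.
function needsAutocompact(lastStepTotalTokens: number): boolean {
  return lastStepTotalTokens > AUTOCOMPACT_THRESHOLD;
}

const restored = ["old-1", "old-2", "summary", "recent-1", "new user message"];
const visible = messagesAfterBoundary(restored, 2);
const compactFirst = needsAutocompact(45_000);
```

Failing loudly on an out-of-range boundary is a design choice for the sketch; it surfaces a stale checkpoint at the first step instead of silently sending an empty prompt.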

Point C · LLM streaming response

The model streams text and tool_use blocks. This point doesn’t read per-call state (the AI SDK runs it internally), but the events it produces trigger Point D.

For thinking mode, the thinking block’s signature is part of this step’s output — it flows into next step’s messages and next prepareStep sees it. Don’t touch signature fields.

Point D · tool.execute (grep for “auth” files)

import { execFile } from 'node:child_process'
import { promisify } from 'node:util'

// Promisified form resolves to { stdout, stderr }
const execFileAsync = promisify(execFile)

tool({
  inputSchema: z.object({ pattern: z.string() }),
  execute: async (input, { abortSignal, toolCallId }) => {
    // Read per-call state: bail out early if the call was already aborted
    abortSignal?.throwIfAborted()

    // Read cross-session / per-call
    const permission = await checkPermission('grep', input, context)
    if (permission.behavior === 'deny') {
      return { error: 'permission_denied' }
    }

    // Actually run; the signal kills the child process on abort
    const result = await execFileAsync('rg', [input.pattern, '.'], { signal: abortSignal })
    const str = result.stdout

    // Possibly write per-call
    if (str.length > MAX_RESULT_CHARS) {
      const path = await offloadToSandbox(context.sandbox, str)
      return { truncated: true, preview: str.slice(0, 1000), offloadPath: path }
    }
    return { output: str }
  },
})

D’s R/W:

Read: abortSignal, context (for permission)
Write: Possibly writes to context.sandbox (offloaded file), possibly context.todos (if “auto-add todo”-style tools exist)

Point E · onToolCallFinish (the key cross-over point)

experimental_onToolCallFinish: async (event) => {
  if (!event.success) return
  const { toolCall, output, abortSignal } = event

  // Read: estimation
  const tokens = estimateTokens({ input: toolCall.input, output })
  if (tokens < PRECOMPACT_TRIGGER_TOKENS) return

  // Read: compactCache (idempotence)
  if (context.compactCache.has(toolCall.toolCallId)) return

  // Execute (one cheap-model LLM call)
  const compactResult = await raceAbort(
    compactorFor(toolCall.toolName)(toolCall.input, output, context),
    abortSignal,
  )
  if (!compactResult) return

  // ←←← **The key write**
  context.compactCache.set(toolCall.toolCallId, compactResult)
}

E’s R/W:

Read: toolCall, output, context.compactCache (idempotence check), abortSignal
Write: context.compactCache.set(toolCall.toolCallId, ...)

This write is the critical link between E and B — E caches the compressed result; next-step B reads the cache to substitute. The “Point B again (step 1)” section below is the consumer side.
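E's gatekeeping (size check, idempotence check, then write) can be isolated from the LLM call. The sketch below is hypothetical throughout: `precompact`, `fakeCompactor`, the threshold value, and the four-characters-per-token estimate are assumptions for illustration, and the real compactor would be an async cheap-model call.

```typescript
interface CompactResult { shortOutput: string }

const PRECOMPACT_TRIGGER_TOKENS = 2_000; // hypothetical threshold

// Rough stand-in estimator: about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

let compactorCalls = 0;
// Stand-in for the cheap-model call; a real one would be async.
function fakeCompactor(output: string): CompactResult {
  compactorCalls += 1;
  return { shortOutput: output.slice(0, 80) + " [compacted]" };
}

function precompact(
  cache: Map<string, CompactResult>,
  toolCallId: string,
  output: string,
): "skipped-small" | "skipped-cached" | "written" {
  if (estimateTokens(output) < PRECOMPACT_TRIGGER_TOKENS) return "skipped-small";
  if (cache.has(toolCallId)) return "skipped-cached"; // a replay must not pay twice
  cache.set(toolCallId, fakeCompactor(output));
  return "written";
}

const cache = new Map<string, CompactResult>();
const bigOutput = "x".repeat(20_000);
const first = precompact(cache, "call_1", bigOutput);
const replay = precompact(cache, "call_1", bigOutput);
const small = precompact(cache, "call_2", "3 matches");
```

Returning a label instead of `void` is only there to make the three paths visible; it also gives a convenient value to log when checking how often the cache is actually written.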

Point B again (step 1): consume what E wrote

Now step 1 starts (there are step 0’s tool results to handle):

prepareStep: async ({ messages: stepMessages, steps, model }) => {
  const isFirstStep = steps.length === 0  // false, now step 1

  // Read per-call state (lastStepTotalTokens was updated by onStepFinish)
  const currentTokens = context.lastStepTotalTokens

  // ↓↓↓ **Key read**: consume what E wrote
  const truncated = await truncateOldToolResults(stepMessages, {
    compactCache: context.compactCache,  // ← reads E's write
  })

  // Inside truncateOldToolResults:
  //   for (const part of message.parts) {
  //     if (isToolResultPart(part)) {
  //       const cached = compactCache.get(part.toolCallId)
  //       if (cached) {
  //         part.output = cached.shortOutput  // ← replace raw with compact
  //       }
  //     }
  //   }

  // Maybe another autocompact (if microcompact isn't enough)
  let compacted = truncated
  if (currentTokens > AUTOCOMPACT_THRESHOLD) {
    compacted = await autocompact(truncated, { signal: context.abortController.signal })
  }

  // Inject reminders
  let final = applyCacheControl(compacted, model)
  if (context.reminders.length > 0) {
    const text = context.reminders.join('\n')
    final = [...final, { role: 'user', content: text }]
    context.reminders = []  // ← clear after consumption (avoid repeat injection)
  }

  return { messages: final }
}

Here E and B are linked:

E writes: context.compactCache.set(toolCallId, compactResult) — runs cheap model to compress immediately after tool returns, caches it
B reads: next step’s prepareStep iterates tool results, on toolCallId hit the cache — replaces raw output with compact version
Net effect: the main agent’s LLM call sees already-compacted tool results, token cost drops sharply, and the compaction LLM call ran on Haiku off the main agent’s critical path

This producer-consumer pair is the core design of Zapvol’s tool-precompact.ts + tiers/tool.ts. Without it, the reason for precompacting ahead of time is easy to miss.

B’s R/W at step 1:

Read: stepMessages, context.lastStepTotalTokens, context.compactCache, context.reminders, context.abortController.signal
Write: context.reminders = [] (clear after consumption)
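B's two consuming reads at step 1 can be sketched without the SDK. The message and part shapes below are simplified assumptions (not the AI SDK's real types), and `prepareMessages` is a hypothetical stand-in for the truncate-then-inject portion of prepareStep. Unlike the in-place assignment shown in the comment above, this version builds new objects so the array passed in stays untouched.

```typescript
interface ToolResultPart { type: "tool-result"; toolCallId: string; output: string }
interface TextPart { type: "text"; text: string }
type Part = ToolResultPart | TextPart;
interface Message { role: "user" | "assistant" | "tool"; parts: Part[] }

interface StepContext {
  compactCache: Map<string, { shortOutput: string }>;
  reminders: string[];
}

function prepareMessages(stepMessages: Message[], context: StepContext): Message[] {
  // Consume E's write: swap raw tool output for the cached compact version.
  const substituted = stepMessages.map((message) => ({
    ...message,
    parts: message.parts.map((part) => {
      if (part.type !== "tool-result") return part;
      const cached = context.compactCache.get(part.toolCallId);
      return cached ? { ...part, output: cached.shortOutput } : part;
    }),
  }));

  if (context.reminders.length === 0) return substituted;

  // Drain reminders: read once, then clear so the next step does not repeat them.
  const text = context.reminders.join("\n");
  context.reminders = [];
  return [...substituted, { role: "user", parts: [{ type: "text", text }] }];
}

const stepContext: StepContext = {
  compactCache: new Map([["call_1", { shortOutput: "3 files mention auth" }]]),
  reminders: ["Run the tests before finishing."],
};
const raw: Message[] = [
  { role: "tool", parts: [{ type: "tool-result", toolCallId: "call_1", output: "raw grep output" }] },
];
const step1 = prepareMessages(raw, stepContext); // compacted + reminder appended
const step2 = prepareMessages(raw, stepContext); // compacted, reminder already drained
```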

Point F · stopWhen

stopWhen: [
  stepCountIs(50),
  hasToolCall('complete'),
  ({ steps }) => context.shouldStop,
]

Read: steps, context.shouldStop (external signal)
Write: None

Point G · onFinish

onFinish: async ({ response, usage, finishReason }) => {
  // Read per-call
  const newMessages = response.messages
  const terminalReason = deriveTerminalReason(finishReason, context)

  // Write cross-call (session layer)
  await db.transaction(async tx => {
    await tx.session.appendMessages(session.id, newMessages)
    await tx.session.updateCheckpoint(session.id, {
      lastStepTotalTokens: context.lastStepTotalTokens,  // ← freeze per-call → session
      compactedRounds: context.stepCompactor.getCheckpoint(),
    })
    await tx.session.recordUsage(session.id, usage)
    await tx.session.setLastFinishReason(session.id, terminalReason)
  })
}

G’s R/W:

Read: response.messages, usage, finishReason, context.lastStepTotalTokens, context.stepCompactor (final state)
Write: session.messages (append), session.compactionCheckpoint (full refresh), session.usage (sum), session.terminalReason

Key: G is the per-call → cross-call state solidification point. Without this step, next call-site can’t read new messages — conversation lost.
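Why the writes in G belong in one transaction can be shown with an in-memory stand-in. `SessionStore` below is hypothetical and synchronous for brevity: the callback works on a draft copy, and the draft replaces the stored row only if the callback returns normally. A real onFinish would await an actual database transaction.

```typescript
interface SessionRow {
  messages: string[];
  checkpoint: { lastStepTotalTokens: number; compactBoundary: number };
}

class SessionStore {
  private row: SessionRow = {
    messages: [],
    checkpoint: { lastStepTotalTokens: 0, compactBoundary: 0 },
  };

  transaction(work: (draft: SessionRow) => void): void {
    const draft: SessionRow = {
      messages: [...this.row.messages],
      checkpoint: { ...this.row.checkpoint },
    };
    work(draft);      // a throw here leaves this.row untouched
    this.row = draft; // commit
  }

  read(): SessionRow {
    return this.row;
  }
}

const store = new SessionStore();

// Successful finish: messages and checkpoint move together.
store.transaction((draft) => {
  draft.messages.push("user: fix the auth bug", "assistant: done");
  draft.checkpoint = { lastStepTotalTokens: 52_000, compactBoundary: 0 };
});

// Failed finish: the checkpoint write throws after messages were appended.
let failed = false;
try {
  store.transaction((draft) => {
    draft.messages.push("user: next question");
    throw new Error("checkpoint write failed");
  });
} catch {
  failed = true;
}
```

After the failed attempt the store still holds the two messages and the checkpoint from the first commit, so a later resume never sees messages that the checkpoint does not account for.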

Point H · Background tasks

onFinish: async (event) => {
  // First sync G's persistence
  await persistSession(event, session)

  // Then start async background (don't await)
  void backgroundMemoryExtraction(event.response.messages, context)
  void backgroundResumePreCompact(session.id)
}

H’s R/W:

Read: response.messages, context (one-shot snapshot)
Write: autoMemory (cross-session) — extracted new memory entries
Write: session.resumePreCompact (cross-call) — pre-computed summary for next resume

After H, the context object’s lifetime ends and per-call state is fully reclaimed. What remains in the session / memory layer is there for the next call.

State R/W matrix

Distilling the walkthrough into a matrix. One glance tells you each state’s producer and consumer.

State item	A	B	C	D	E	F	G	H
`session.messages`	R	—	—	—	—	—	W	R
`session.compactionCheckpoint`	R	—	—	—	—	—	W	—
`session.usage` (cumulative)	—	—	—	—	—	—	W	—
`context.lastStepTotalTokens`	W	R	—	—	—	—	R	—
`context.abortController.signal`	W	R	—	R	R	—	—	R
`context.compactCache`	W (init)	R	—	—	W	—	—	—
`context.reminders`	W (init)	RW	—	W	—	—	—	—
`context.todos`	R	R	—	RW	—	R	—	—
`session.resumePreCompact`	R	—	—	—	—	—	—	W
`autoMemory`	R	—	—	—	—	—	—	W
`CLAUDE.md`	R	—	—	—	—	—	—	—

(R = read, W = write, RW = both read and write. onStepFinish, which writes context.lastStepTotalTokens between steps, has no column of its own.)

Three key observations from reading this matrix:

Apart from A’s empty initialization, compactCache is written only in E and read only in B: a classic producer-consumer pair, and the cache exists solely for it
abortController.signal is read almost everywhere — concrete expression of “full-chain propagation”
session.messages’s W is in G, R is in next A — cross-call handoff depends entirely on G’s persistence
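The matrix can also be kept as data and scanned mechanically. The sketch below transcribes three rows from the table and adds one made-up row (`context.debugNotes`) to show what a scan for dead state would flag; `findDeadState` and `pointsWith` are hypothetical helpers.

```typescript
type Point = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H";
type Access = "R" | "W" | "RW";
type Row = Partial<Record<Point, Access>>;
type Matrix = Record<string, Row>;

const matrix: Matrix = {
  "session.messages": { A: "R", G: "W", H: "R" },
  "context.abortController.signal": { A: "W", B: "R", D: "R", E: "R", H: "R" },
  "context.compactCache": { A: "W", B: "R", E: "W" },
  // Hypothetical row, not in the table: written by a tool, read by nobody.
  "context.debugNotes": { D: "W" },
};

// Points that read ("R") or write ("W") a given state item; "RW" counts as both.
function pointsWith(row: Row, kind: "R" | "W"): Point[] {
  return (Object.keys(row) as Point[]).filter((p) => row[p]?.includes(kind));
}

// State that is written but never read, or read but never written, is suspect.
function findDeadState(m: Matrix): string[] {
  return Object.entries(m)
    .filter(([, row]) => pointsWith(row, "R").length === 0 || pointsWith(row, "W").length === 0)
    .map(([state]) => state);
}

const dead = findDeadState(matrix);
const cacheWriters = pointsWith(matrix["context.compactCache"], "W");
```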

Key producer-consumer pairs (deep-dive on 3)

Pair 1: `compactCache` (E writes → B reads)

Purpose: compress tool results before the main agent sees them in the next step using a cheap model (Haiku), so the main agent’s critical path doesn’t pay compaction LLM-call latency and cost.

Flow:

tool.execute returns output (D)
  ↓
onToolCallFinish fires (E)
  ↓
Check output size > threshold?
  ↓ yes
Call Haiku: compact(input, output) → { shortOutput, offloadPath }
  ↓
context.compactCache.set(toolCallId, result)
  ↓ (next step's prepareStep fires)
prepareStep sees tool_result list (B)
  ↓
for each tool_result:
  if compactCache.has(toolCallId):
    replace with compactCache.get(...).shortOutput
  ↓
return { messages: truncated } overrides this step's input
  ↓
main agent's LLM call sees the compact version

Easiest-to-mess-up points here:

Key mismatch: E writes key as toolCall.toolCallId; B reads as part.toolCallId — if types are inconsistent (string vs number), Map.get always misses.
Idempotence: E needs to check “has this toolCallId already been compressed” — otherwise checkpoint replay duplicates the LLM call.
Race abort: E runs Haiku mid-call, user hits ESC; this compression must immediately stop — otherwise abort happens but background keeps spending money.
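The key-mismatch pitfall is easy to reproduce in isolation. `Map` compares keys with SameValueZero, so a string key and a number that prints the same are different entries. The `cacheKey` helper is a hypothetical defensive option, not something the source prescribes.

```typescript
const compactCache = new Map<string | number, string>();

// E writes with a string id.
compactCache.set("42", "compact summary");

// B looks up with a number that prints the same: a silent miss.
const miss = compactCache.get(42);
const hit = compactCache.get("42");

// One defensive option: funnel every read and write through a single key function.
function cacheKey(toolCallId: string | number): string {
  return String(toolCallId);
}

const normalized = new Map<string, string>();
normalized.set(cacheKey("42"), "compact summary");
const hitAfterNormalizing = normalized.get(cacheKey(42));
```

Typing the cache as `Map<string, …>` from the start lets the compiler reject the numeric lookup, which is cheaper than discovering the miss from a hit-rate log.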

Pair 2: `lastStepTotalTokens` (A seeds → onStepFinish updates → B consumes)

Purpose: compaction threshold judgment (“should autocompact fire”) depends on accurate token estimation. If lastStepTotalTokens isn’t updated, threshold judgment uses stale values — autocompact either never fires or fires wrongly.

Flow:

A: context.lastStepTotalTokens = session.compactionCheckpoint.lastStepTotalTokens  // inherit from last round
  ↓
B (step 0): if (lastStepTotalTokens > AUTOCOMPACT_THRESHOLD) → compact
  ↓
LLM call completes
  ↓
onStepFinish: context.lastStepTotalTokens = step.usage.inputTokens + step.usage.outputTokens  // update
  ↓
B (step 1): judge again, use updated value
  ↓ ...
G: session.compactionCheckpoint.lastStepTotalTokens = context.lastStepTotalTokens  // freeze for next round

Easiest-to-mess-up points:

Forgot to write in onStepFinish: autocompact never fires, session runs until context overflow.
Wrong formula: step.usage has inputTokens / outputTokens / cachedInputTokens / cacheCreationInputTokens — which represents “next step’s prompt size”? Zapvol’s choice is inputTokens + outputTokens (comment explains why) — wrong choice leads to long-term drift.
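A small simulation makes the first pitfall concrete. It uses the inputTokens + outputTokens formula stated above; the threshold, the per-step usage numbers, and the `firstCompactStep` helper are invented for illustration.

```typescript
interface StepUsage { inputTokens: number; outputTokens: number }

const AUTOCOMPACT_THRESHOLD = 100_000; // hypothetical value for the sketch

// This step's output becomes part of the next step's input, so
// input + output approximates the next prompt size.
function estimateNextPromptTokens(usage: StepUsage): number {
  return usage.inputTokens + usage.outputTokens;
}

// Returns the first step index at which autocompact would fire, or -1 if never.
function firstCompactStep(
  seed: number,
  steps: StepUsage[],
  updateInOnStepFinish: boolean,
): number {
  let lastStepTotalTokens = seed;
  for (let i = 0; i < steps.length; i++) {
    if (lastStepTotalTokens > AUTOCOMPACT_THRESHOLD) return i; // prepareStep check
    if (updateInOnStepFinish) {
      lastStepTotalTokens = estimateNextPromptTokens(steps[i]); // onStepFinish
    }
  }
  return -1;
}

const usagePerStep: StepUsage[] = [
  { inputTokens: 45_000, outputTokens: 2_000 },
  { inputTokens: 80_000, outputTokens: 25_000 },
  { inputTokens: 110_000, outputTokens: 3_000 },
];
const withUpdate = firstCompactStep(45_000, usagePerStep, true);
const withoutUpdate = firstCompactStep(45_000, usagePerStep, false);
```

With the update in place the check fires at step index 2, right after the step whose usage crossed the threshold. Without it the value stays at the seed and the check never fires, no matter how large the prompt grows.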

Pair 3: `session.messages` (G writes → next A reads)

Purpose: cross-call conversation persistence. Refresh browser / resume session without losing dialogue.

Flow:

G (this onFinish):
  await db.session.appendMessages(session.id, response.messages)  // sync
  ↓ (session ends, context reclaimed)
  ↓ (user comes back later)
A (next call-site):
  const history = await db.session.loadMessages(session.id)  // read back
  const allMessages = [...history, newUserMessage]

Easiest-to-mess-up points:

G writes asynchronously without await: a user refresh can beat the DB write, and the conversation is lost. G must await the DB’s confirmation before returning.
compactBoundary not updated in sync: the next A loads messages without knowing which part was already compacted, so the next prepareStep compacts again. session.compactionCheckpoint and session.messages must be updated atomically, in the same transaction.
Thinking block signature broken during serialization: if the DB-stored JSON changes field order or loses float precision, the signature no longer matches on the next submission and the API rejects it.

8 critical constraints

8 rules of coding discipline on state flow (only #1 is a strict mathematical invariant; the others are operational requirements — breaking any = a class of bug):

#	Constraint	Breakage symptom
1	`session.compactBoundary <= session.messages.length`	`getMessagesAfterCompactBoundary` returns empty / crashes
2	`context.compactCache[toolCallId]` key equals message’s tool_use.id	cache always misses, no compression
3	After `abortController.signal.aborted === true`, all running tool executes must exit within seconds	user hits ESC but API keeps burning
4	`session.messages` and `session.compactionCheckpoint` must update in same transaction	resume sees checkpoint pointing to non-existent message indices
5	`context.lastStepTotalTokens` must update in `onStepFinish`	autocompact doesn’t fire, context explodes
6	`context.reminders` must clear after consumption	every step repeats same reminder, prompt pollution
7	thinking block signature must preserve byte-by-byte — serialization / deserialization mustn’t touch	next submit “thinking signature mismatch” API error
8	`per-call` state must solidify to `cross-call` layer before `onFinish` returns	data loss after session ends

Common state-flow bugs and their symptoms

Production bugs traced back to state violations:

Symptom	Root cause	Where to fix
“Microcompact has no effect, tool results still full-size to LLM”	Pair 1 key mismatch: E uses `toolCall.toolCallId`, B uses `part.tool_call_id` (naming inconsistency)	Unify key naming + log cache hit rate
“Autocompact never fires, usage grows to overflow”	Pair 2’s onStepFinish forgotten: `context.lastStepTotalTokens` is stuck at initial	Explicitly update in onStepFinish
“ESC takes tens of seconds to really stop”	Constraint 3 broken: some tool’s execute doesn’t honor abortSignal	Pass signal through every long-running `fetch`/`spawn`
“Conversation history drops last turn after refresh”	Pair 3’s G async write: `onFinish` didn’t await DB	Change `onFinish` to async + await
“Resume shows garbled earlier messages”	Constraint 4 broken: messages and checkpoint written separately, one failed	Move to DB transaction
“Same reminder appears 5 times in prompt”	Constraint 6 broken: not cleared after consumption	`context.reminders = []` after consumption
“API returns ‘thinking signature mismatch’”	Constraint 7 broken: JSON serialization reordered signature fields	Serialize thinking block verbatim, don’t touch
“Resume slow, 5 seconds every time”	H’s `resumePreCompact` not written: compaction starts from scratch at resume time	Async-write the summary in H for the next resume to read

AbortSignal propagation (called out)

AbortSignal is the easiest state-flow axis to half-implement. Unlike producer-consumer pairs, it’s a broadcast model:

After the user hits ESC, Point A’s abortController.abort() flips signal.aborted to true. Each observation point then races independently — not layer-by-layer forwarding:

B (top of next prepareStep): if (signal.aborted) throw — early exit
C (streaming): fetch({ signal }) — native interruption
D (tool.execute): AI SDK already puts the signal in the second arg; each tool checks itself
E (onToolCallFinish): raceAbort(compactLLM, signal) — explicit race
H (background tasks): independently check signal, don’t depend on stream-close notification

Finally AI SDK’s loop detects signal.aborted at its own point → stream ends → G’s finishReason === 'abort'.

Key: half-assed abort is worse than none — user thinks they cancelled, but API calls still burn. Any new long-running operation (fetch / execFile / spawn / DB call / LLM call) must accept the signal.
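The explicit race used at E can be sketched as a small helper. This is an assumption about what a `raceAbort` might look like, chosen to match how it is used earlier (a falsy result means "aborted, skip the write"); it is not taken from Zapvol's source.

```typescript
// Resolves with the work's value, or with undefined as soon as the signal aborts.
// The work itself is not cancelled here; pass the same signal into it as well.
function raceAbort<T>(work: Promise<T>, signal: AbortSignal): Promise<T | undefined> {
  if (signal.aborted) {
    work.catch(() => {}); // swallow late failures of the abandoned work
    return Promise.resolve(undefined);
  }
  return new Promise<T | undefined>((resolve, reject) => {
    const onAbort = () => resolve(undefined);
    signal.addEventListener("abort", onAbort, { once: true });
    work.then(
      (value) => {
        signal.removeEventListener("abort", onAbort);
        resolve(value);
      },
      (error) => {
        signal.removeEventListener("abort", onAbort);
        reject(error);
      },
    );
  });
}

const controller = new AbortController();
const neverFinishes = new Promise<string>(() => {});
const pending = raceAbort(neverFinishes, controller.signal);
controller.abort(); // user hits ESC; `pending` now resolves to undefined
```

The helper only stops the caller from waiting. To stop the spending as well, the underlying request still has to receive the same signal, which is the point of the broadcast model described above.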

3 critical self-audit questions

The 3 questions below cover the most common state-flow pitfalls. If you can’t answer one of them without re-reading the source, go back to the corresponding section:

compactCache lifetime: still present after one streamText returns? If the agent spans two streamText calls (same session), can the second call read what the first call’s E wrote?
Forgetting to update lastStepTotalTokens in onStepFinish: what happens to autocompact? At which step does the problem start? What’s the eventual symptom?
G (onFinish) appended messages but updateCheckpoint DB write failed: what state does the next resume read? What are the cascading effects?

Answers are in the producer-consumer / critical-constraints / bug table sections above — each question points to a specific constraint’s breakage path.

Takeaways for your own agent

Distilling state-flow perspective into directly-actionable items:

Draw your state buckets first: classify your agent state into per-step / per-call / cross-call / cross-session four buckets. If you can’t classify cleanly, your design has a problem
Find producer-consumer pairs: every state has at least one producer + one consumer (otherwise it’s dead state). Scan with a matrix
Per-call → cross-call solidification points must be explicit: Zapvol’s updateCheckpoint in onFinish is such a point, and it must be atomic within a transaction
AbortSignal threads end-to-end: each observation point races independently; not layer-by-layer forwarding. Half-assed is worse than none
compactCache key unified: E and B read/write the same cache; key type / naming must match; add logs for hit-rate verification
Clear after consumption: reminders / one-shot flags clear right after reading, or every step repeats injection
lastStepTotalTokens updated in onStepFinish: not updating = autocompact never fires
Thinking block signature byte-preserved: serialization / deserialization mustn’t touch the signature field
H’s background tasks independent of stream close: don’t await, but maintain lifecycle yourself (signal / cancellation)
Scan your codebase against the 8 critical constraints: each “no” is a potential bug class
