Applying to Your Agent (AI SDK)

Audience: agent system authors (AI SDK users). When you build an agent with the AI SDK, this chapter shows where each principle from the prior chapters lands among the 8 embedding points. Zapvol is the evaluated case, with a current-state assessment and an evolution roadmap.

Chapter framing

Prior chapters covered what Claude Code does. This chapter’s goal is different:

  • For external readers: guidance on landing CC’s principles in specific AI SDK hooks
  • For the Zapvol team: using ourselves as a concrete case — current state assessment + evolution direction

Both goals are served by the same structure, with three parts per embedding point:

  1. Best practice (what AI SDK + CC teach us)
  2. Zapvol current state (Pass / Partial / Missing, cited from source)
  3. Evolution direction (concrete deliverables + priority)

This is not a “look how great Zapvol is” showcase — it’s a roadmap that admits gaps and points forward.

Assumed background: AI SDK v6 (ai@^6) and familiarity with streamText / generateText / prepareStep / experimental_onToolCallFinish / stopWhen.


Layer 1: what to copy, what to discard

Not every CC pattern is worth copying. Getting the boundary wrong leads to “I also built a git worktree thing” — mechanism-level mimicry.

CC pattern | Transferability | Why
Worktree / CCR / Bash sandbox | Coding only | Mechanism depends on git / Anthropic SaaS / POSIX
CLAUDE.md directory walk | Partial | Abstracted as “layered prompt injection points”, it transfers
Git status / File tools | Coding only | A domain-specific tool selection
Async generator main loop | Universal | Baseline for a streaming UI
Terminal reason enum | Universal | Different terminations → different UX
System prompt static/dynamic boundary | Universal | Cache hit rate is a hard red line
Compaction cascade (cheap → expensive) | Universal principle | The number of tiers is your choice
MEMORY.md index pattern | Universal | The only path that scales to unbounded storage
Multi-axis memory taxonomy | Universal | Who writes × what is stored × how fast it ages
Subagent as context firewall | Universal | Any agent with ReAct loops needs this
Hook vs prompt layering | Universal | Process-level guarantees can’t rely on prompts
Tombstone streaming retraction | Universal | Any streaming UI
Circuit breaker | Universal | Any system with retries
Data-driven prompt engineering | Universal (meta-principle) | Copy the culture, not the numbers

Core: mechanisms aren’t universal, principles are. Worktree’s mechanism is useless for non-coding agents; but the role it plays (“disposable isolation workspace”) transfers to any agent needing isolated trial-and-error.


The 8 AI SDK embedding points

[Figure: AI SDK lifecycle · 8 embedding points for CC principles. Where each principle goes in the streamText() lifecycle; left: the AI SDK hook (A to H), right: the mapped CC principle and its attention point. See the chapter text for code examples.]

MVP priority: nail A / D / F / G first (about 80% of the value), add B in stage 2, and add C / E / H in stage 3. Don't max out for completeness. Reference implementation: packages/backend/src/agent/agent-round.ts (Zapvol's production pattern).

AI SDK v6’s streamText hides the loop internally, exposing only a few hooks. All CC patterns must fit into these points:

Point | Location | Role
A | Call-site assembly | system / messages / tools / providerOptions
B | prepareStep | Per-step input preprocessing (compaction / budget / cache control)
C | Streaming consumption | Consume stream events
D | tool.execute | Where tools actually run (permission / hook / sandbox)
E | experimental_onToolCallFinish | After a tool completes (microcompact entry)
F | stopWhen | Continue or stop
G | onFinish | Aggregate usage / persist messages
H | Post-call | Session snapshot / background tasks

Below: each point in the three-part structure.


Point A · Call-site assembly

Best practice

  1. system byte-stable: don’t concat Date.now() / userName / gitStatus into the system string — one char change invalidates the entire prompt cache
  2. Dynamic context in the messages layer: CC places currentDate in auto memory, not system
  3. Explicit providerOptions.cacheControl breakpoints: AI SDK won’t add them for you
  4. Tool description is prompt engineering: write each one teaching “what the agent should do next” (see CC’s Edit tool)
  5. A step ceiling that isn’t too low: cap the loop with stopWhen: stepCountIs(N) (the replacement for maxSteps since AI SDK 5); 30-50 is a reasonable range for everyday tasks
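A minimal sketch of rules 1 and 2, assuming a simplified message shape rather than the AI SDK's own message types: the static instructions stay in `system`, and per-turn facts travel as a leading user message. `assembleCallSite`, `DynamicContext`, and `STATIC_SYSTEM` are hypothetical names; in a real call you would pass the result to `streamText` and mark `providerOptions` cache breakpoints separately (rule 3).

```typescript
// Simplified stand-in for a chat message (not the AI SDK's message type)
type Msg = { role: "system" | "user" | "assistant"; content: string }

// Per-turn facts that must NOT be concatenated into `system`
type DynamicContext = { currentDate: string; userName: string; gitStatus?: string }

// Static instructions: a constant, so its bytes are identical on every call
const STATIC_SYSTEM = ["You are a coding agent.", "Prefer small, reviewable edits."].join("\n")

function assembleCallSite(ctx: DynamicContext, history: Msg[]): { system: string; messages: Msg[] } {
  // Dynamic facts go into a user message at the head of the conversation
  const contextLines = [`currentDate: ${ctx.currentDate}`, `user: ${ctx.userName}`]
  if (ctx.gitStatus) contextLines.push(`gitStatus: ${ctx.gitStatus}`)
  const contextMessage: Msg = {
    role: "user",
    content: `<context>\n${contextLines.join("\n")}\n</context>`,
  }
  return { system: STATIC_SYSTEM, messages: [contextMessage, ...history] }
}
```

`system` stays byte-identical across calls. The context message is stable within a session only if its values are captured once at session start; if it changed every turn, the messages after it would miss the cache.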

Zapvol current state

Partial · one potential cache risk needs audit

  • Pass · Tool descriptions are generally written tutorial-style (see filesystem.compact.ts, browser.tool.ts), copying CC’s approach
  • Pass · Explicit providerOptions cache control: applyCacheControl(compacted, model, { extraBreakpointAt }) called every prepareStep
  • Partial · appendSystemContext(systemPrompt, systemContext) (agent-round.ts) appends the dynamic systemContext to the tail of system. If systemContext contains any field that changes per turn (currentDate / gitStatus / timestamps), the whole system string hashes differently every turn and every prior cache breakpoint is invalidated. Worth auditing: not listed in .claude/design/compaction-redesign.md, but significant
  • Partial · BUDGET_RATIO = 0.2 in code (compaction/config.ts:15), while the doc states 0.8 and CC uses ~93%; under review (redesign §2). Not specific to Point A, but it affects the threshold discussions

Evolution direction

P0 · Audit appendSystemContext content composition

Add a logging line to verify cache stability:

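import { createHash } from "node:crypto"

// In prepareStep, before applyCacheControl
log.info("cache.system_hash", {
  hash: createHash("sha256").update(fullSystemPrompt).digest("hex").slice(0, 16),
  step: steps.length,  // steps completed so far (prepareStep also receives stepNumber)
})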

Run a 10-step task and inspect the hash:

  • Stable → Pass: the current design is fine
  • Changes each step → Missing: refactor by moving the dynamic parts (currentDate / ctx info) out of systemContext into a user message at the head of messages

P1 · Layer systemContext contents by stability

Follow CC’s separation of getSystemContext() and getUserContext() (see System Prompt Assembly): memoize each layer so it is computed once at session start.
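A sketch of per-session memoization, assuming two hypothetical loaders, one for the system layer and one for the user layer. Each layer is computed on first use and then returned unchanged, so its bytes cannot drift between steps. `memoizeOnce` and `createSessionContext` are illustrative names, not CC or Zapvol APIs.

```typescript
// Wrap a loader so it runs at most once; later calls return the cached promise.
// Note: a rejected load stays cached; reset `cached` on failure if you need retries.
function memoizeOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => {
    if (!cached) cached = load()
    return cached
  }
}

type SessionContext = {
  getSystemContext: () => Promise<string> // e.g. environment facts captured at session start
  getUserContext: () => Promise<string>   // e.g. user/project memory, current date
}

function createSessionContext(
  loadSystem: () => Promise<string>,
  loadUser: () => Promise<string>,
): SessionContext {
  // Each layer is memoized independently, mirroring the system/user split
  return {
    getSystemContext: memoizeOnce(loadSystem),
    getUserContext: memoizeOnce(loadUser),
  }
}
```

Create one SessionContext per session; a new session gets fresh values, while every step inside it sees identical strings.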


Point B · prepareStep (where most compaction logic lives)

Best practice

  1. Split by step: step 0 can do the heavy lifting (boundary filter + autocompact); later steps usually need only microcompact, and running autocompact on every step blows up cost
  2. Partial return: return only { messages }, not { messages, system, tools, toolChoice }; leave unchanged fields out to avoid accidental cache breaks
  3. Pass AbortSignal to the compaction LLM manually: prepareStep’s arguments don’t include abortSignal, so read it from your own context
  4. Blocking check at the top: at the hard ceiling, return a synthetic error for a graceful exit instead of letting the API throw 413
  5. Hysteresis: don’t trigger activateMoreCrCandidates when over budget by ≤5%; this preserves cache stability
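Rules 1, 4, and 5 can be condensed into one decision function that a prepareStep handler might call before touching messages. This is a sketch under assumptions: token counts come from your own estimator, the thresholds are illustrative, and `decideCompaction`, `Budget`, and the action names are hypothetical. It is neither an AI SDK API nor Zapvol's implementation.

```typescript
type CompactionAction = "block" | "autocompact" | "microcompact" | "none"

type Budget = {
  softLimit: number   // compaction target
  hardCeiling: number // at or beyond this, exit gracefully instead of calling the API
  hysteresis: number  // tolerated overshoot as a fraction of softLimit, e.g. 0.05
}

function decideCompaction(stepNumber: number, tokens: number, b: Budget): CompactionAction {
  // Rule 4: blocking check first, so an oversized request never reaches the API
  if (tokens >= b.hardCeiling) return "block"
  // Rule 5: small overshoots (and anything under budget) are tolerated to keep the cached prefix intact
  const overshoot = tokens - b.softLimit
  if (overshoot <= b.softLimit * b.hysteresis) return "none"
  // Rule 1: the expensive LLM summary runs only on the first step; later steps use cheap microcompact
  return stepNumber === 0 ? "autocompact" : "microcompact"
}
```

A later step that is still over budget after microcompact could escalate to autocompact, as in the evolution sketch for this point.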

Zapvol current state

Partial · the functionality exists, but several fine-grained controls are missing

  • Pass · stepCompactor.apply uniformly called (agent-round.ts:183) — every step goes through the full compaction pipeline
  • Pass · abortSignal propagation: context.abortController.signal read from context, passed into autocompact / summarizer LLM calls
  • Pass · Cache breakpoints explicitly marked: applyCacheControl + extraBreakpointAt = compactedPrefixEnd
  • Missing · No step 0 vs step 1+ differentiation: stepCompactor.apply runs the same logic every step — may over-compact on later steps
  • Missing · No hysteresis: over-budget by 1 token triggers activateMoreCrCandidates (redesign §P2 #6 lists as todo)
  • Missing · No blocking synthesis fallback: the design relies on reacting to a 413, and that reactive handler is itself missing (see below)
  • Missing · No 413 reactive handler: redesign §P1 #4 explicitly lists as gap

Evolution direction

P0 · Add step 0 vs step 1+ differentiation

Currently stepCompactor treats first and subsequent steps identically. Suggested:

// packages/backend/src/agent/compaction/step-compactor.ts
export async function apply(
  messages: UIMessage[],
  options: StepCompactorOptions,
): Promise<CompactResult> {
  const isFirstStep = options.stepNumber === 0  // must be passed in from prepareStep's stepNumber

  if (isFirstStep) {
    // First step: boundary already handled by prepareInitialMessages; only check whether autocompact is needed
    return await maybeAutocompact(messages, options)
  }

  // Later steps: cheap microcompact first (reuse precompact cache, truncate old tool results); no LLM summarize
  const truncated = await truncateOldToolResults(messages, options)
  if (getLastStepTotalTokens(options) > options.budget) {  // budget assumed to live on options
    return await maybeAutocompact(truncated, options)  // still over → autocompact
  }
  return { messages: truncated, compactedPrefixEnd: ... }
}

Gain: save one LLM call on later steps when microcompact suffices.

P1 · Hysteresis + blocking synthesis fallback

Per redesign §P2 #6: no activateMoreCrCandidates when over budget by ≤5%.

Per CC’s blocking check (see the compaction chapter), at the hard ceiling:

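// Inside an async-generator agent loop (as in CC's query()); the helper name is illustrative
if (currentTokens >= HARD_CEILING) {
  yield createSyntheticErrorMessage("Context limit reached. Start a new session or compact first.")
  return { reason: "blocking_limit" }
}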

This synthetic-message exit is cleaner than letting the API throw a 413.

P1 · 413 reactive handler

Already listed as redesign §P1 #4: on a 413, force activateMoreCrCandidates and retry once.
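A minimal sketch of the reactive path. It assumes the thrown error carries the HTTP status in `statusCode` (as the AI SDK's APICallError does; check your provider's error type) and that `forceCompact` is a hypothetical hook standing in for activateMoreCrCandidates. The retry happens at most once, so a request that is still too large surfaces the error instead of looping.

```typescript
// True when the error looks like "request too large" (HTTP 413)
function isPayloadTooLarge(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { statusCode?: unknown }).statusCode === 413
}

async function callWith413Retry<M, T>(
  messages: M[],
  call: (messages: M[]) => Promise<T>,
  forceCompact: (messages: M[]) => Promise<M[]>, // hypothetical compaction hook
): Promise<T> {
  try {
    return await call(messages)
  } catch (err) {
    // Anything other than a 413 is not ours to handle
    if (!isPayloadTooLarge(err)) throw err
    // Compact harder, then retry exactly once; a second 413 propagates to the caller
    const compacted = await forceCompact(messages)
    return await call(compacted)
  }
}
```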


Point C · Streaming consumption

Best practice

  1. Tombstone retraction: already-streamed chunks voided due to streaming fallback need explicit void-markers — UI can’t retract characters from the client
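  2. Thinking block signature preservation: don’t touch signature fields on cross-turn reuse; serialization / deserialization must round-trip the thinking text and signature byte-for-byte
  3. Usage tracked by component: record input, output, cache-read, and cache-write tokens separately. The standard usage object exposes cachedInputTokens; cache-write counts may only appear in provider metadata (Anthropic’s cacheCreationInputTokens), so check where your ai / provider version reports them. Cache hit rate is a core observability metric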

Zapvol current state

Partial · streaming retraction missing; thinking signature partially handled

  • Pass · UI message stream uses AI SDK v6’s ui-message-stream protocol (agent-ui-stream.ts)
  • Pass · Thinking block filtering: round-precompact.ts:249 explicitly strips reasoning/thinking blocks before sending to the compaction model — thinking carries the original model’s signature which would fail when forwarded to a different model. This is done better than most agents
  • Partial · Cross-turn thinking reuse: no explicit protection of the signature field during serialization. If messages persisted to the DB come back with the thinking text or signature altered (dropped field, re-encoding, whitespace normalization), the next submission can be rejected by the API
  • Missing · No tombstone / streaming retraction: UI side lacks explicit “streamed but void” markers
  • Partial · Usage recording: onStepFinish accumulates but stores only totalTokens (agent-round.ts:247) — loses cache read / cache creation, two key components

Evolution direction

P1 · Usage component-wise recording

Refactor onStepFinish:

onStepFinish: (step) => {
  context.lastStepTotalTokens = step.usage.totalTokens  // current usage
  // New:
  session.recordStepUsage({
    input: step.usage.inputTokens,
    output: step.usage.outputTokens,
    cacheRead: step.usage.cachedInputTokens,  // ← key
    // ← key; Anthropic reports cache writes in provider metadata (verify the location for your ai version)
    cacheCreate: step.providerMetadata?.anthropic?.cacheCreationInputTokens as number | undefined,
  })
}

Without component-level tracking, cache hit rate is unobservable — any cache optimization work can’t be verified.
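Once components are recorded, cache hit rate is simple arithmetic. A sketch, assuming each recorded step has the four fields written by recordStepUsage above, with missing values treated as zero. The hit-rate definition here (cache reads over all input-side tokens) is one common choice and assumes `input` excludes cached tokens. Providers differ on that, so confirm how yours reports input tokens before comparing numbers. `summarizeUsage` is a hypothetical name.

```typescript
type StepUsage = {
  input?: number       // uncached input tokens (assumed to exclude cache reads/writes)
  output?: number
  cacheRead?: number
  cacheCreate?: number
}

type UsageSummary = { input: number; output: number; cacheRead: number; cacheCreate: number; hitRate: number }

function summarizeUsage(steps: StepUsage[]): UsageSummary {
  // Aggregate every step, not just the last one
  const sum = steps.reduce(
    (acc, s) => ({
      input: acc.input + (s.input ?? 0),
      output: acc.output + (s.output ?? 0),
      cacheRead: acc.cacheRead + (s.cacheRead ?? 0),
      cacheCreate: acc.cacheCreate + (s.cacheCreate ?? 0),
    }),
    { input: 0, output: 0, cacheRead: 0, cacheCreate: 0 },
  )
  const promptSide = sum.input + sum.cacheRead + sum.cacheCreate
  // Share of input-side tokens served from cache (0 when nothing was sent)
  const hitRate = promptSide === 0 ? 0 : sum.cacheRead / promptSide
  return { ...sum, hitRate }
}
```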

P2 · Streaming retraction primitive

If production hits “streaming fallback leaves residual UI text” bugs, add a tombstone data part:

// agent-ui-stream.ts add a data part type
context.writeTransient(DataPartEvent.TOMBSTONE, { messageId, reason: "streaming_fallback" })

Frontend removes the rendered region on receipt. Not needed if not currently an issue.
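On the frontend, handling a tombstone can be a pure state update. A sketch, assuming the rendered state is a flat list of parts keyed by messageId and that the tombstone payload matches the { messageId, reason } shape above; `RenderedPart`, `applyTombstone`, and `acceptChunk` are hypothetical, not AI SDK UI types.

```typescript
type RenderedPart = { messageId: string; text: string }
type Tombstone = { messageId: string; reason: string }

// Drop every part that belongs to the voided message; other messages are untouched
function applyTombstone(parts: RenderedPart[], t: Tombstone): RenderedPart[] {
  return parts.filter((p) => p.messageId !== t.messageId)
}

// Track voided ids so late-arriving chunks for the same message are ignored as well
function acceptChunk(parts: RenderedPart[], voided: Set<string>, chunk: RenderedPart): RenderedPart[] {
  return voided.has(chunk.messageId) ? parts : [...parts, chunk]
}
```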

P2 · Thinking signature persistence protection

When Zapvol eventually supports resume with extended thinking enabled, message serialization must preserve the reasoning text and the providerOptions.anthropic.signature field verbatim (no trimming or re-encoding, no base64 padding changes). Not blocking now; handle it when adding extended thinking.
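A cheap guard is a round-trip test on the persistence layer: serialize a message with a reasoning part, read it back, and compare the thinking text and signature byte-for-byte. The part shape below is a simplified assumption modeled on reasoning parts that carry providerOptions.anthropic.signature; adapt it to the types your ai version uses. `checkReasoningRoundTrip` is a hypothetical name.

```typescript
type ReasoningPart = {
  type: "reasoning"
  text: string
  providerOptions?: { anthropic?: { signature?: string } }
}

// Returns a list of problems; an empty list means the part survived persistence intact
function checkReasoningRoundTrip(
  before: ReasoningPart,
  persist: (p: ReasoningPart) => string,
  load: (raw: string) => ReasoningPart,
): string[] {
  const after = load(persist(before))
  const problems: string[] = []
  if (after.text !== before.text) problems.push("thinking text changed")
  const sigBefore = before.providerOptions?.anthropic?.signature
  const sigAfter = after.providerOptions?.anthropic?.signature
  if (sigBefore !== sigAfter) problems.push("signature changed or dropped")
  return problems
}
```

Run it in CI against the real persist/load functions of the session store.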


Point D · tool.execute (permissions + hooks physical location)

Best practice

  1. Permission check at the top of execute: don’t filter the tool list in prepareStep (the model never learns it was denied; it only sees something like a schema mismatch)
  2. PreToolUse hook equivalent: wrapTool higher-order function, supporting updatedInput (modify args, not just allow/deny)
  3. AbortSignal to leaves: fetch / execFile / DB query all need signal — half-done abort is worse than none
  4. Output truncation + offload: single tool output can’t consume the entire context
  5. Error text is prompt: teach the model what to do next
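Rules 1, 3, 4, and 5 can be combined in one guarded execute. A sketch under assumptions: `checkPermission`, `run`, `offload`, and the result shape are hypothetical stand-ins, not AI SDK or Zapvol APIs. The point is the order of operations. Deny inside execute with a structured, instructive result, forward the abort signal, then cap the output and tell the model where the full text went.

```typescript
type ToolResult = { ok: true; output: string } | { ok: false; error: string }

type GuardedToolDeps = {
  checkPermission: (input: unknown) => { allowed: boolean; reason?: string }
  run: (input: unknown, signal: AbortSignal) => Promise<string>
  offload: (fullText: string) => string // stores the full text, returns a path the model can read
  maxResultSizeChars: number
}

async function guardedExecute(input: unknown, signal: AbortSignal, deps: GuardedToolDeps): Promise<ToolResult> {
  // 1. Permission check inside execute: the model gets a denial it can reason about
  const perm = deps.checkPermission(input)
  if (!perm.allowed) {
    return { ok: false, error: `Permission denied: ${perm.reason ?? "not allowed"}. Do not retry this call; ask the user or choose another tool.` }
  }
  try {
    // 3. Forward the signal to the leaf operation
    const full = await deps.run(input, signal)
    if (full.length <= deps.maxResultSizeChars) return { ok: true, output: full }
    // 4. Truncate and offload, and tell the model how to get the rest
    const ref = deps.offload(full)
    const head = full.slice(0, deps.maxResultSizeChars)
    const omitted = full.length - deps.maxResultSizeChars
    return { ok: true, output: `${head}\n[truncated ${omitted} chars; full output saved to ${ref}, read it if needed]` }
  } catch (err) {
    // 5. Error text is a prompt: say what happened and what to do next
    if (signal.aborted) return { ok: false, error: "Cancelled by the user. Stop and wait for instructions." }
    return { ok: false, error: `Tool failed: ${(err as Error).message}. Check the arguments and retry once, or report the failure.` }
  }
}
```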

Zapvol current state

Pass · nearly complete — D is Zapvol’s strongest point

  • Pass · Complete permission system: utils/permissions/ (mode + rule + classifier + hook levels)
  • Pass · abortSignal full chain: every tool execute accepts signal, passed into sandbox command execution
  • Pass · maxResultSizeChars concept exists (TOOL_TRUNCATE_CHARS = 1000)
  • Pass · ServerToolConfig wrapper: each tool has compact / requiredPermission / renderMessage extension fields, compiles to AI SDK tool (exactly the pattern recommended in “Architectural Decisions” below)
  • Pass · Error text: most tools have tutorial-style errors (see browser.tool.ts error codes)
  • Partial · PreToolUse hook equivalent: permission checks exist but no “hook modifies arguments” — updatedInput capability currently absent

Evolution direction

P2 · Add tool wrapper pre/post/error hooks

Currently permissions are wired tool by tool. If future needs arise (e.g., auto-running lint or auto-injecting ctx), a wrapper like this would centralize them:

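function wrapTool<I, O>(cfg: ServerToolConfig<I, O>, hooks: ToolHooks<I, O>): AiSdkTool {
  return toAiSdkTool({
    ...cfg,
    execute: async (input: I, ctx: Ctx) => {
      // PreToolUse: may block, or rewrite the arguments (updatedInput)
      const pre = await hooks.preToolUse?.(input, ctx)
      if (pre?.decision === "block") return { error: pre.reason }
      const actualInput = pre?.updatedInput ?? input
      try {
        // Run the original execute with the possibly rewritten input
        const output = await cfg.execute(actualInput, ctx)
        // PostToolUse (hypothetical hook name): may observe or replace the output
        return (await hooks.postToolUse?.(actualInput, output, ctx)) ?? output
      } catch (err) {
        // Error hook (hypothetical hook name): turn the failure into model-readable guidance
        if (hooks.onToolError) return { error: await hooks.onToolError(actualInput, err, ctx) }
        throw err
      }
    },
  })
}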

Not in redesign, but CC’s PreToolUse hook with updatedInput is valuable for extensibility. Low priority (no blocking need today), but worth leaving space for architecturally.


Point E · experimental_onToolCallFinish (microcompact entry)

Best practice

  1. Race abort: precompact is a separate LLM call; user ESC must stop it immediately
  2. Idempotent: same toolCallId replay short-circuits on cache
  3. Cheap model: main agent on Opus/Sonnet, precompact on Haiku, at a fraction of the cost
  4. Offload + cache: compressed result + original text offloaded, main agent can read_file if needed
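Rules 1 and 2 combined, as a sketch: a generic raceAbort helper plus an idempotency cache keyed by toolCallId. Zapvol has its own raceAbort and readCompactCache; the versions here are hypothetical re-implementations that use an in-memory Map. A production version would persist the cache (Zapvol writes it to disk).

```typescript
// Reject as soon as the signal aborts, even if the underlying work is still running
function raceAbort<T>(work: Promise<T>, signal: AbortSignal): Promise<T> {
  if (signal.aborted) return Promise.reject(new Error("aborted"))
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(new Error("aborted"))
    signal.addEventListener("abort", onAbort, { once: true })
    work.then(
      (v) => { signal.removeEventListener("abort", onAbort); resolve(v) },
      (e) => { signal.removeEventListener("abort", onAbort); reject(e) },
    )
  })
}

const compactCache = new Map<string, string>() // toolCallId -> compressed output

async function precompactOnce(
  toolCallId: string,
  output: string,
  compactor: (text: string, signal: AbortSignal) => Promise<string>, // e.g. a cheap-model call
  signal: AbortSignal,
): Promise<string> {
  // Idempotent: a replayed toolCallId short-circuits on the cache
  const hit = compactCache.get(toolCallId)
  if (hit !== undefined) return hit
  // The signal goes to the compactor too, so the LLM call itself can stop early
  const compressed = await raceAbort(compactor(output, signal), signal)
  compactCache.set(toolCallId, compressed)
  return compressed
}
```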

Zapvol current state

Pass · reference-quality — Zapvol’s tool-precompact.ts nearly fully aligns with CC’s Tier 1 microcompact

  • Pass · Race abort: explicit raceAbort(compactor(...), abortSignal)
  • Pass · Idempotent: readCompactCache(toolCallId) check first
  • Pass · Offload + cache: offloadToolData to disk + writeCompactCache stores result, file-based cross-step persistence
  • Pass · Per-tool compactor: ServerToolConfig.compact() lets each tool define its own compression strategy (more refined than CC microcompact’s uniform placeholder)
  • Partial · PRECOMPACT_TRIGGER_TOKENS = 2500: redesign §P1 #3 suggests lowering to 500-1000 — more small tools compressed at finish time

Evolution direction

P1 · Lower PRECOMPACT_TRIGGER_TOKENS

Per redesign §P1 #3: 2500 → 500-1000. Gain: more tools compressed at the cache-safe moment, reducing mid-round prepareStep compaction spikes.

For external readers: look at tool-precompact.ts design — it’s a production-grade reference for microcompact, worth copying.


Point F · stopWhen

Best practice

  1. Combine multiple conditions: stepCountIs(N) + hasToolCall('complete') + custom
  2. Provide an explicit complete tool: let the model “explicitly declare completion” to avoid useless iterations
  3. Maintain business-layer TerminalReason: AI SDK’s finishReason is too coarse
  4. Circuit breaker: stop after N consecutive failures
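The AI SDK accepts an array for stopWhen and stops when any condition in it is met; a custom condition is a function of the steps so far. The sketch below implements rule 4 in that call shape. To stay self-contained it uses a minimal, assumed StepLike type instead of the SDK's StepResult, and `consecutiveFailedSteps` is a hypothetical name; map the "failed" check onto whatever error fields your ai version exposes.

```typescript
// Minimal stand-in for a step: the tool results it produced and whether each failed
type StepLike = { toolResults: Array<{ toolName: string; isError: boolean }> }

// Same call shape as an AI SDK custom stop condition: ({ steps }) => boolean
type StopConditionLike = (options: { steps: StepLike[] }) => boolean

// Stop after `limit` consecutive trailing steps in which every tool call failed
function consecutiveFailedSteps(limit: number): StopConditionLike {
  return ({ steps }) => {
    let streak = 0
    for (let i = steps.length - 1; i >= 0; i--) {
      const results = steps[i].toolResults
      const failed = results.length > 0 && results.every((r) => r.isError)
      if (!failed) break // any non-failing step resets the streak
      streak++
    }
    return streak >= limit
  }
}
```

Adapted to the real step type, it would sit beside the built-ins: stopWhen: [stepCountIs(maxSteps), hasToolCall("complete"), consecutiveFailedSteps(3)].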

Zapvol current state

Pass · finer-grained than CC: stopOnComplete’s todos blocking is a highlight

  • Pass · stopOnComplete() (tools/stop-conditions.ts) checks not only for a complete tool call but also that all todos are done, blocking completion while any todo is pending. More refined than CC
  • Pass · stepCountIs(maxSteps) hard ceiling
  • Pass · complete tool: Zapvol defines it, used in browser-subagent etc.
  • Partial · AI SDK finishReason granularity: Zapvol records “stop / length / tool-calls / error / abort” — no business-level reasons like blocking_limit / permission_denied_fatal
  • Missing · Circuit breaker missing: redesign §P2 #7 explicitly lists as gap — no fusing mechanism for repeated autocompact / precompact failures

Evolution direction

P1 · Define TerminalReason enum + derive in onFinish

export type TerminalReason =
  | "done"
  | "completed"              // complete tool called
  | "aborted"                // ctx.abortController.signal.aborted
  | "max_steps"              // stepCountIs triggered
  | "blocking_limit"         // hard ceiling exit (needs Point B's synthesis fallback)
  | "todos_incomplete"       // complete called but todos not done (stopOnComplete implements but doesn't surface)
  | "error"

// agent-round.ts onFinish — derive
const terminalReason = deriveTerminalReason(result, context)
await session.setTerminalReason(terminalReason)

Gain: UI can show different messages per reason; telemetry can aggregate by reason.
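A possible deriveTerminalReason, as a sketch. The FinishFacts input shape is an assumption (a flattened view of the finish reason, step count, and flags your code already tracks). Mapping a plain stop without a complete call to "done" is a guess at the intended meaning of that variant.

```typescript
type TerminalReason =
  | "done" | "completed" | "aborted" | "max_steps" | "blocking_limit" | "todos_incomplete" | "error"

// Flattened facts available in onFinish (hypothetical shape)
type FinishFacts = {
  finishReason: string      // AI SDK finishReason of the last step, e.g. "stop" | "length" | "error"
  aborted: boolean          // ctx.abortController.signal.aborted
  stepCount: number
  maxSteps: number
  completeCalled: boolean
  todosPending: number
  hitBlockingLimit: boolean // set by the Point B fallback
}

function deriveTerminalReason(f: FinishFacts): TerminalReason {
  // Order matters: user intent and hard limits outrank the model's own finish reason
  if (f.aborted) return "aborted"
  if (f.hitBlockingLimit) return "blocking_limit"
  if (f.finishReason === "error") return "error"
  if (f.completeCalled) return f.todosPending > 0 ? "todos_incomplete" : "completed"
  if (f.stepCount >= f.maxSteps) return "max_steps"
  return "done"
}
```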

P1 · Circuit breaker for autocompact + precompact

Already listed as redesign §P2 #7: once consecutiveFailures >= 3, stop trying. Without it, an unrecoverable session keeps wasting API calls (CC’s lesson: 250K wasted API calls per day; see the compaction chapter).
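A minimal circuit breaker, assuming one instance per session and the threshold of 3 from the redesign item. The class name and API are illustrative. Once open, it skips the operation entirely; a success resets the count.

```typescript
class CircuitBreaker {
  private consecutiveFailures = 0
  private readonly threshold: number

  constructor(threshold = 3) {
    this.threshold = threshold
  }

  get isOpen(): boolean {
    return this.consecutiveFailures >= this.threshold
  }

  // Runs `op` unless the breaker is open; resolves to undefined when skipped or failed
  async run<T>(op: () => Promise<T>): Promise<T | undefined> {
    if (this.isOpen) return undefined // stop spending API calls on a session that keeps failing
    try {
      const result = await op()
      this.consecutiveFailures = 0 // any success closes the breaker again
      return result
    } catch {
      this.consecutiveFailures++
      return undefined
    }
  }
}
```

Keeping separate instances for autocompact and precompact lets one failing path trip without disabling the other.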


Point G · onFinish

Best practice

  1. result.response.messages must persist: no persistence = no resume
  2. Transactional atomicity: messages and checkpoint must update in the same transaction
  3. Multi-step usage aggregation: don’t only record the last step
  4. finishReason classification: derive business-layer TerminalReason

Zapvol current state

Pass · persistence is rigorous

  • Pass · captureCompactionCheckpoint: onFinish writes compactionCheckpoint to DB (agent-round.ts:266+)
  • Pass · Round summarizer triggers: onFinish async-generates RoundSummary saved to DB (for next round’s plan-phase)
  • Pass · Multi-step usage aggregation: stepUsages accumulated (agent-round.ts:241+)
  • Partial · Not confirmed whether messages + checkpoint are written atomically; the code appears to do separate writes (needs audit)
  • Partial · TerminalReason derivation: records AI SDK’s raw finishReason, missing business-layer mapping (see F’s evolution)

Evolution direction

P1 · Audit messages + checkpoint transactional atomicity

Confirm whether the following two writes are in the same DB transaction:

await db.session.appendMessages(session.id, newMessages)
await db.session.updateCheckpoint(session.id, checkpoint)

If not — partial success on failure leads to inconsistent state on next resume (messages ahead of checkpoint → re-compact waste / messages behind checkpoint → index crash).
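The fix is to run both writes inside one transaction so they commit or roll back together. The sketch below uses a hypothetical in-memory store with a `transaction` helper to show the shape. With a real database, use its own transaction API (for example your ORM's transaction callback) the same way.

```typescript
type SessionRow = { messages: string[]; checkpoint: number }

class InMemorySessionStore {
  private rows = new Map<string, SessionRow>()

  get(id: string): SessionRow | undefined {
    return this.rows.get(id)
  }

  // Run `fn` against a working copy; commit only if it completes without throwing
  async transaction(id: string, fn: (row: SessionRow) => Promise<void>): Promise<void> {
    const current = this.rows.get(id) ?? { messages: [], checkpoint: 0 }
    const draft: SessionRow = { messages: [...current.messages], checkpoint: current.checkpoint }
    await fn(draft) // if this throws, the stored row is untouched
    this.rows.set(id, draft)
  }
}

async function persistRound(
  store: InMemorySessionStore,
  sessionId: string,
  newMessages: string[],
  checkpoint: number,
): Promise<void> {
  // Both writes live in one transaction, so messages and checkpoint can't diverge
  await store.transaction(sessionId, async (row) => {
    row.messages.push(...newMessages)
    row.checkpoint = checkpoint
  })
}
```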

P2 · Unified Terminal reason storage

See F’s evolution direction.


Point H · Post-call (background tasks)

Best practice

  1. Memory extraction async: extract long-term memory entries without await — don’t delay the user
  2. Resume pre-compact: on session exit, background-compute resume summary — next resume is instant
  3. PostCompact hook: externally extensible point
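A small helper keeps the "don't await" rule honest. The task starts immediately and the caller returns. A rejection is reported instead of surfacing as an unhandled promise rejection. `runInBackground` is a hypothetical name, and the `onError` callback stands in for your logger.

```typescript
// Start `task` without awaiting it; failures are reported, never thrown to the caller
function runInBackground(
  name: string,
  task: () => Promise<unknown>,
  onError: (name: string, err: unknown) => void,
): void {
  let p: Promise<unknown>
  try {
    p = task()
  } catch (err) {
    p = Promise.reject(err) // a synchronous throw is treated like a rejection
  }
  void p.catch((err) => onError(name, err))
}
```

In onFinish this would wrap memory extraction or the resume pre-compute, so stream close is never delayed by them.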

Zapvol current state

Partial · memory extraction in place; resume pre-compact missing

  • Pass · Memory extraction: memory/memory-extraction.ts complete — onFinish fire-and-forget, uses Haiku to extract memory entries, mutually exclusive with manual save_memory
  • Pass · Round summarizer: onFinish async-generates RoundSummary (compacted single round), saved to DB for next-round reuse — this is a lightweight resume pre-compact variant
  • Missing · Full resume pre-compact missing: redesign §3.3 explicitly says “we need this layer” in the CC alignment matrix. Zapvol currently has per-round summaries — missing a session-level integrated summary
  • Missing · PostCompact hook missing: no “compaction complete” extension point

Evolution direction

P1 · Add resume pre-compact

CC’s pattern: on session exit, start a background job compressing history into a global summary, next resume reads directly.

Zapvol already has round-level RoundSummary — infrastructure is there. Missing session-level integrated summary:

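// agent-round.ts onFinish tail
if (terminalReason === "done" || terminalReason === "completed") {
  // Fire-and-forget: don't await, but don't let a failure become an unhandled rejection
  void generateSessionResumeSummary(session.id, context).catch((err) =>
    log.info("resume_summary.failed", { error: String(err) }),
  )
}

// Next call-site assembly (Point A) on resume
const summary = await session.loadResumeSummary()
if (summary) {
  // use summary + raw messages from the last N rounds
} else {
  // fall back to the current logic: load all messages
}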
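The loading branch can be a pure function. A sketch, assuming rounds are stored as arrays of messages, the summary is plain text, and the summary covers at least the rounds that are dropped. The message shape, `buildResumeMessages`, the `keepLastRounds` parameter, and the summary wording are illustrative.

```typescript
type Message = { role: "user" | "assistant"; content: string }

function buildResumeMessages(rounds: Message[][], summary: string | null, keepLastRounds = 2): Message[] {
  // No summary yet: fall back to replaying every round
  if (!summary) return ([] as Message[]).concat(...rounds)
  const recent = ([] as Message[]).concat(...rounds.slice(-keepLastRounds))
  // Older rounds are represented only by the summary; recent rounds stay verbatim
  const summaryMessage: Message = {
    role: "user",
    content: `Summary of the earlier conversation:\n${summary}`,
  }
  return [summaryMessage, ...recent]
}
```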

P2 · PostCompact hook surface

For Zapvol this is future extensibility prep — if eventually building a marketplace / plugin system to let third parties run custom logic on compaction complete (e.g., alerting, workflow triggers). No current need.


Architectural decisions (not traps, but choices)

The two decisions below aren’t tactical questions inside Point A/B/…/H — they’re architectural choices. Zapvol got both right; documenting for external readers.

Decision 1: CLAUDE.md’s abstraction is “layered prompt injection points”

CC’s CLAUDE.md depends on filesystem walk. Generic agents don’t have that, but the role transfers:

Layered prompt injection points = {
  Managed:  Enterprise / policy mandatory rules (highest authority)
  Project:  Project / account-level rules (team-shared)
  Local:    User's private overrides on this project/account (not shared)
  User:     User's global preferences
  Session:  Dynamic injection this conversation
  Auto:     Agent's self-learned memory
}

Storage medium (DB / YAML / settings.json / markdown) doesn’t matter. Layered authority merge logic is the point.
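A sketch of the merge, assuming each layer contributes keyed rules and that a precedence list (highest authority first) settles conflicts. Only "Managed is highest" comes from the text above; the rest of the PRECEDENCE ordering is illustrative, so choose your own. `mergeLayers` and the layer names are hypothetical.

```typescript
type Layer = "managed" | "project" | "local" | "user" | "session" | "auto"
type LayerRules = Partial<Record<Layer, Record<string, string>>>
type Resolved = { value: string; from: Layer }

// For each rule key, keep the value from the highest-authority layer that defines it
function mergeLayers(rules: LayerRules, precedence: Layer[]): Map<string, Resolved> {
  const merged = new Map<string, Resolved>()
  for (const layer of precedence) {
    const layerRules = rules[layer] ?? {}
    for (const key of Object.keys(layerRules)) {
      // First writer wins because layers are visited from highest authority down
      if (!merged.has(key)) merged.set(key, { value: layerRules[key], from: layer })
    }
  }
  return merged
}

// Illustrative ordering: managed first; local overrides project; auto-learned memory lowest
const PRECEDENCE: Layer[] = ["managed", "local", "project", "user", "session", "auto"]
```

Recording `from` alongside each value lets the agent (and audit logs) explain which layer a rule came from.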

Zapvol current state: its ZapvolMemory system supports three layers, Project / User / Auto (memory-service.ts). It lacks a Managed layer, which multi-tenant enterprise scenarios need (“for tenant X, no agent may write to production DB”). P2 evolution: add a Managed scope to the existing memory-service.

Decision 2: Tools should wrap in your own ServerToolConfig

AI SDK’s tool only has description + inputSchema + execute. CC’s Tool has a dozen fields. Insert a layer of your own config, compile to AI SDK tool:

import type { ZodSchema } from "zod"
import type { ReactNode } from "react"
import type { ToolUIPart } from "ai"
// Ctx, CompactHint, CompactResult, PermissionDescriptor: your own app-level types

type ServerToolConfig<I, O> = {
  name: string
  description: string
  inputSchema: ZodSchema<I>
  execute: (input: I, ctx: Ctx) => Promise<O>

  // Fields AI SDK doesn't know but your internals need
  compact?: (input: I, output: O, hint: CompactHint, ctx: Ctx) => Promise<CompactResult>
  requiredPermission?: PermissionDescriptor
  maxResultSizeChars?: number
  clientRenderer?: (part: ToolUIPart) => ReactNode
}

Zapvol current state: already exactly this pattern — ServerToolConfig interface complete (used by all tools under agent/tools/). Correct choice.
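The compile step itself can be thin. A sketch with simplified local types: `ServerToolConfigLite`, `SdkToolLike`, and `compileTools` are hypothetical names, and compact / clientRenderer are omitted for brevity. The fields the AI SDK understands go into the tools object; internal-only fields go into a side table your runtime consults for permissions, truncation, and compaction. In real code, each tools-object entry would be built with the AI SDK's tool() helper and inputSchema would be a Zod schema.

```typescript
type Ctx = { sessionId: string }

type ServerToolConfigLite<I, O> = {
  name: string
  description: string
  inputSchema: unknown // a Zod schema in real code
  execute: (input: I, ctx: Ctx) => Promise<O>
  requiredPermission?: string
  maxResultSizeChars?: number
}

type SdkToolLike = { description: string; inputSchema: unknown; execute: (input: any) => Promise<unknown> }
type ToolMeta = { requiredPermission?: string; maxResultSizeChars?: number }

function compileTools(configs: ServerToolConfigLite<any, any>[], ctx: Ctx) {
  const tools: Record<string, SdkToolLike> = {}
  const meta = new Map<string, ToolMeta>()
  for (const cfg of configs) {
    // The SDK only sees description / inputSchema / execute
    tools[cfg.name] = {
      description: cfg.description,
      inputSchema: cfg.inputSchema,
      execute: (input) => cfg.execute(input, ctx), // app context bound at compile time (one option among several)
    }
    // Internal-only fields stay in a side table keyed by tool name
    meta.set(cfg.name, { requiredPermission: cfg.requiredPermission, maxResultSizeChars: cfg.maxResultSizeChars })
  }
  return { tools, meta }
}
```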


Complexity by scale

Zapvol is already past MVP, not starting from scratch. Below is a guide to adding features by maturity: useful for external readers, and for Zapvol a pointer to which feature to add next.

Already done (baseline)

Zapvol current coverage:

  • Pass Point A basics + cache control
  • Pass Point B’s stepCompactor.apply uniform call
  • Pass Point D complete (permission + maxResultSize + ServerToolConfig)
  • Pass Point E complete (precompact reference-quality)
  • Pass Point F basics + stopOnComplete todos-blocking (finer than CC)
  • Pass Point G basics + round summarizer
  • Pass Point H’s memory extraction

Stage 2 (1-2 weeks, significant improvements)

  • Point A’s system prompt cache audit (P0)
  • Point B’s step 0 / step 1+ differentiation (P0)
  • Point C’s usage component-wise recording (P1)
  • Point E’s PRECOMPACT_TRIGGER_TOKENS lowering (P1, redesign §P1 #3)

Stage 3 (after infrastructure is solid)

  • Point B’s hysteresis + blocking synthesis fallback (P1)
  • Point F’s TerminalReason enum + circuit breaker (P1, redesign §P2 #7)
  • Point H’s session-level resume pre-compact (P1)
  • Point G’s messages + checkpoint transactional atomicity audit (P1)

Don’t build early (unless real pressure)

  • Point A’s enterprise Managed scope (P2, multi-tenant only)
  • Point C’s tombstone retraction (P2, UI needs not observed)
  • Point D’s pre/post tool hook extension (P2, no blocking need)
  • Full PostCompact hook surface (P2, for external extension systems)

Zapvol evolution roadmap (consolidated)

Aggregating evolution items scattered across points, by priority:

P0 (recommend within 2 weeks)

# | Point | Change | Expected gain
1 | A | appendSystemContext cache audit + move dynamic content to messages | Preserve prompt cache hit rate
2 | B | Add “step 0 vs step 1+” differentiation to stepCompactor | Save one autocompact LLM call on later steps

P1 (infrastructure, 3-4 weeks)

# | Point | Change | Related redesign item
3 | C | Usage recorded by component (input/output/cacheRead/cacheCreate) | none
4 | E | Lower PRECOMPACT_TRIGGER_TOKENS 2500 → 500-1000 | §P1 #3
5 | F | Define TerminalReason enum + derive in onFinish | none
6 | F | Circuit breaker for autocompact / precompact | §P2 #7
7 | G | Audit messages + checkpoint transactional atomicity | none
8 | H | Session-level resume pre-compact | §3.3 (CC has it, we don’t)
9 | B | Hysteresis (over budget ≤5% doesn’t trigger) | §P2 #6
10 | B | Blocking synthesis fallback + 413 reactive handler | §P1 #4

P2 (opportunistic, wait for pressure)

# | Point | Change | Trigger condition
11 | A | Managed scope in memory-service | Multi-tenant enterprise customers
12 | C | Tombstone / streaming retraction | UI feedback on streaming fallback residue
13 | D | Tool wrapper pre/post/error hooks | Need for project-level custom interception
14 | H | PostCompact hook surface | External extension / plugin system

10-item self-audit checklist

Scan your agent codebase (Zapvol or external):

# | Check | Zapvol state
1 | system byte-stable? No Date.now() / user context concatenated? | Partial (needs audit)
2 | messages have explicit providerOptions.cacheControl breakpoints? | Pass
3 | prepareStep differentiates first step vs later? | Missing (needs change)
4 | Every tool execute has a permission check? | Pass
5 | Tool descriptions tutorial-style? | Pass
6 | Tool execute passes abortSignal to every internal long-running op? | Pass
7 | Tool output has size cap + truncation + offload? | Pass
8 | stopWhen combines multiple conditions (stepCountIs + hasToolCall('complete') + custom)? | Pass
9 | onFinish persists response.messages? | Pass
10 | onFinish has async background tasks (no await) for memory extraction / resume pre-compute? | Partial (memory yes, resume no)

Zapvol score: 7 Pass / 2 Partial / 1 Missing. Infrastructure is solid; the key improvement points are concentrated in A / B / H.


Next: connect the dots with state flow

This chapter cuts by hook — each hook discussed independently (“do this here”). But in real code, the 8 hooks share state spanning steps, rounds, sessions: E writes compactCache → next B reads; onStepFinish writes lastStepTotalTokens → B reads; G persists messages → next A reads.

The next chapter, Lifecycle State Flow, cuts by state flow — tracing how each state evolves across the 8 points in one conversation. Reading both connects the dots into a coherent pipeline.


Further reading

  • Design Lessons — this chapter’s abstract counterpart (principles layer)
  • Lifecycle State Flow — complementary state-flow view
  • Agent Execution Loop — CC’s own query() implementation
  • Compaction — full 5-tier compaction walkthrough
  • Zapvol reference: packages/backend/src/agent/agent-round.ts, packages/backend/src/agent/compaction/, packages/backend/src/agent/memory/memory-extraction.ts
  • Zapvol evolution draft: .claude/design/compaction-redesign.md