Controls and ROI
Four-layer cost control mechanism (tier caps, runtime budgets, user visibility, global kill switch) with monitoring criteria; methodology for converting human time into a cost-benefit baseline.
Enterprise procurement focuses on two coupled questions: “is expenditure controllable” determines whether to purchase; “is the investment justified” determines whether to renew. The former maps to the control surface, the latter to the value surface. This section addresses each in turn.
Part 1: Cost control mechanisms
Cost overruns are typically prevented through layered control mechanisms — the four-layer arrangement below, ordered from coarsest to finest.
Layer 1: Tier hard budget caps
Each user is bound to a tier, which determines three configurations:
| Configuration | Lite | Pro | Ultra |
|---|---|---|---|
| Default model | Haiku 4.5 | Sonnet 4.6 | Opus 4.7 |
| Per-task token cap | 200K | 800K | 2M |
| Per-day token cap | 2M | 20M | 80M |
| Available tools | core tools | + BUA + MCP | + advanced subagents |
(Authoritative values reside in apps/server/scripts/db-seed.ts; admins adjust under Admin → Tiers. The table presents reference defaults, not contractual guarantees.)
Rationale for capping by token rather than dollar: dollar caps drift with model pricing — capacity tightens silently when models become more expensive and loosens when they become cheaper. Token quantities are stable units at the product-semantic layer, requiring no quota recalculation across model upgrades or downgrades.
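As a minimal sketch of how a token-denominated tier cap could be represented and checked, the reference defaults above can be encoded as follows. The TierConfig shape and the remainingTokens helper are hypothetical; the real schema lives in apps/server/scripts/db-seed.ts and is not reproduced here. No price appears anywhere in the check, which is the point of capping by tokens.

```typescript
// Hypothetical TierConfig shape; field names are illustrative, not the seed schema.
type TierName = "lite" | "pro" | "ultra";

interface TierConfig {
  defaultModel: string;
  perTaskTokenCap: number;
  perDayTokenCap: number;
}

// Reference defaults from the table above; admins may change the real values.
const REFERENCE_TIERS: Record<TierName, TierConfig> = {
  lite: { defaultModel: "Haiku 4.5", perTaskTokenCap: 200_000, perDayTokenCap: 2_000_000 },
  pro: { defaultModel: "Sonnet 4.6", perTaskTokenCap: 800_000, perDayTokenCap: 20_000_000 },
  ultra: { defaultModel: "Opus 4.7", perTaskTokenCap: 2_000_000, perDayTokenCap: 80_000_000 },
};

// Tokens still available to the current task: the tighter of the per-task and
// per-day headroom, never negative. Model price plays no part, so switching the
// default model does not change how much work a tier permits.
function remainingTokens(cfg: TierConfig, usedThisTask: number, usedToday: number): number {
  const taskHeadroom = cfg.perTaskTokenCap - usedThisTask;
  const dayHeadroom = cfg.perDayTokenCap - usedToday;
  return Math.max(0, Math.min(taskHeadroom, dayHeadroom));
}
```

For example, a Lite user who has spent 150K tokens on the current task and 500K today has 50K tokens left for that task; here the per-task cap is the binding constraint.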
Layer 2: Runtime streaming budget
During task execution, every step pre-checks accumulated tokens against the tier cap. Two termination paths are configured:
- Soft termination (standard path): when the cap is reached, the current step completes, partial progress is returned to the user, and the task status is marked terminated_budget
- Hard termination (exception path): a detected runaway loop (the same tool invoked with the same arguments N consecutive times) is stopped immediately
The implementation resides in the task-orchestrator; see task-orchestration in the architecture section.
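The two termination paths can be sketched as a per-task guard. This is an illustrative model under stated assumptions, not the task-orchestrator's code: BudgetGuard, StepOutcome, and the maxRepeats parameter (the N above) are hypothetical names, and argument equality is approximated by comparing JSON serializations, so key order matters.

```typescript
// Hypothetical outcome labels; terminated_budget mirrors the status named above.
type StepOutcome = "continue" | "terminated_budget" | "terminated_runaway";

interface ToolCall {
  tool: string;
  args: unknown;
}

class BudgetGuard {
  private used = 0;
  private lastKey: string | null = null;
  private repeatCount = 0;
  private readonly taskCap: number;
  private readonly maxRepeats: number; // N consecutive identical calls

  constructor(taskCap: number, maxRepeats: number) {
    this.taskCap = taskCap;
    this.maxRepeats = maxRepeats;
  }

  // Pre-check before a step starts: no new step once the cap is reached.
  preCheck(): StepOutcome {
    return this.used >= this.taskCap ? "terminated_budget" : "continue";
  }

  // Called after a step finishes. The step has already completed, so a budget
  // breach detected here is the soft path: partial progress is kept.
  record(tokens: number, call: ToolCall): StepOutcome {
    this.used += tokens;
    // Same tool with the same arguments (compared via JSON) extends the streak.
    const key = `${call.tool}:${JSON.stringify(call.args)}`;
    this.repeatCount = key === this.lastKey ? this.repeatCount + 1 : 1;
    this.lastKey = key;
    if (this.repeatCount >= this.maxRepeats) return "terminated_runaway"; // hard path
    return this.used >= this.taskCap ? "terminated_budget" : "continue";
  }
}
```

Checking the runaway condition before the budget condition means a loop is reported as a loop even when it also exhausts the budget, which keeps the two paths distinguishable in logs.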
Layer 3: User-layer usage visibility
The user may view in the UI at any time:
- Tokens consumed and remaining for the current task
- Estimated dollar amount at the current model’s unit price
- Daily and monthly cumulative totals
Usage visibility can be more effective than backend throttling: when cost is observable, users cut ineffective operations of their own accord. This layer is not a hard budget constraint but a behavioral feedback mechanism; its main effect is fewer unnecessary retries and exploratory invocations.
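The figures shown to the user can be derived from raw token counts. A minimal sketch follows; the per-million-token prices are parameters supplied by the caller (no actual model rates are assumed), cached-input discounts are ignored for simplicity, and UsageRecord with its ISO-date day field is a hypothetical storage shape.

```typescript
interface ModelPrice {
  inputPerMTok: number; // USD per million input tokens (caller-supplied)
  outputPerMTok: number; // USD per million output tokens (caller-supplied)
}

interface TaskUsage {
  inputTokens: number;
  outputTokens: number;
}

// Consumed and remaining tokens for the current task, plus an estimated dollar
// amount at the current model's unit price (cache discounts not modelled).
function taskUsageView(u: TaskUsage, taskCap: number, price: ModelPrice) {
  const consumed = u.inputTokens + u.outputTokens;
  const estimatedUsd =
    (u.inputTokens * price.inputPerMTok + u.outputTokens * price.outputPerMTok) / 1_000_000;
  return { consumed, remaining: Math.max(0, taskCap - consumed), estimatedUsd };
}

interface UsageRecord {
  day: string; // ISO date, e.g. "2025-01-31"
  tokens: number;
}

// Daily and monthly cumulative totals for the month containing `day`.
function cumulativeTotals(records: UsageRecord[], day: string) {
  const month = day.slice(0, 7); // "YYYY-MM"
  let daily = 0;
  let monthly = 0;
  for (const r of records) {
    if (r.day === day) daily += r.tokens;
    if (r.day.startsWith(month)) monthly += r.tokens;
  }
  return { daily, monthly };
}
```

Keeping the meter in tokens and showing dollars only as a derived estimate keeps it consistent with the token-based caps in Layer 1.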
Layer 4: Global kill switch
Administrators may terminate all running tasks and all BUA sessions through a single action (see the BUA section for details). This mechanism is an exception backstop, not a routine tool — under steady-state operation, tier and task budgets are sufficient; the kill switch exists to guarantee termination capability under unexpected conditions.
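A single-action kill switch reduces to a registry of everything that is running plus one method that stops all of it. The sketch below assumes each task or BUA session registers a stop callback; KillSwitch, Session, and the engaged flag (which refuses new registrations until an admin releases it) are hypothetical, not the product's actual API.

```typescript
interface Session {
  id: string;
  kind: "task" | "bua";
  stop: (reason: string) => void; // supplied by the task runner or BUA host
}

class KillSwitch {
  private readonly sessions = new Map<string, Session>();
  private engaged = false;

  // Returns false (and stops the session at once) while the switch is engaged.
  register(s: Session): boolean {
    if (this.engaged) {
      s.stop("kill switch engaged");
      return false;
    }
    this.sessions.set(s.id, s);
    return true;
  }

  unregister(id: string): void {
    this.sessions.delete(id);
  }

  // One admin action: stop every registered task and BUA session.
  killAll(reason: string): number {
    this.engaged = true;
    const count = this.sessions.size;
    this.sessions.forEach((s) => s.stop(reason));
    this.sessions.clear();
    return count;
  }

  release(): void {
    this.engaged = false;
  }
}
```

Keeping the switch engaged until an explicit release prevents queued work from restarting immediately after the sweep.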
Monitoring metrics
The agent_token_usage Grafana panel (operations/dashboards) aggregates token consumption by tier, user, and task type. Three metrics warrant long-term tracking:
- Daily input token curve: establish a baseline aligned with the product's active periods and monitor deltas rather than absolute values. Spikes during non-active periods typically indicate runaway batch tasks
- Cache hit rate: sustained values below 50% suggest that dynamic content has entered the prompt. A prompt prefix that changes on every request never hits the cache, so that portion of the bill is charged at the full input-token price
- Output/input ratio: a sudden increase indicates the model has shifted toward long-form reasoning instead of tool invocation, typically caused by poorly designed descriptions for recently added tools
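The three signals can be computed from per-period token snapshots, such as rows exported from the agent_token_usage panel. This is a sketch under assumptions: the TokenSnapshot fields are hypothetical, cache hit rate is taken as cached input tokens divided by all input tokens, "sustained" is read as every snapshot in a window, and the spike and ratio-jump factors (2× and 1.5×) are placeholder thresholds to be tuned against your own baseline.

```typescript
interface TokenSnapshot {
  inputTokens: number; // all input tokens, cached and uncached
  cachedInputTokens: number; // portion of inputTokens read from the prompt cache
  outputTokens: number;
}

function cacheHitRate(s: TokenSnapshot): number {
  return s.inputTokens > 0 ? s.cachedInputTokens / s.inputTokens : 0;
}

// "Sustained" = every snapshot in the window is below the threshold (50% per the text).
function cacheHitSustainedLow(snapshots: TokenSnapshot[], threshold = 0.5): boolean {
  return snapshots.length > 0 && snapshots.every((s) => cacheHitRate(s) < threshold);
}

// Compare against the baseline for the same period (e.g. the same hour of day),
// so an off-hours spike is judged against off-hours traffic, not the daily peak.
function inputSpike(current: TokenSnapshot, baselineSamePeriod: TokenSnapshot, factor = 2): boolean {
  return current.inputTokens > baselineSamePeriod.inputTokens * factor;
}

// Output/input ratio jump relative to baseline; max(1, input) avoids division by zero.
function outputInputRatioJump(current: TokenSnapshot, baseline: TokenSnapshot, factor = 1.5): boolean {
  const ratio = (s: TokenSnapshot) => s.outputTokens / Math.max(1, s.inputTokens);
  return ratio(current) > ratio(baseline) * factor;
}
```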
Part 2: ROI estimation
Regarding the term “ROI”: this section uses “ROI” in its broad business sense, the cost-benefit estimate of whether the investment is justified and how soon it pays back, covering per-task savings, monthly savings, payback period, and leverage ratio. The strict financial definition, ROI = (gain − cost) / cost × 100%, is only one percentage indicator; the scope of this section goes beyond that formula. “ROI” is retained as the section heading because it is the term non-engineering stakeholders search for most often.
Baseline: human-versus-agent comparison
The most common cost-benefit model uses a human-labor baseline. The following table can be transferred directly to a spreadsheet:
| Field | Definition | Example (inbox triage task) |
|---|---|---|
| Task frequency | Executions per user per month | 20 |
| Human task duration | Time required for human execution | 15 minutes |
| Agent cost per task | See cost-model | $0.19 |
| Human hourly cost (loaded) | Salary plus benefits | $40 / hour |
| Human cost per task | Hourly rate × duration | $40 × 0.25 = $10 |
| Saving per task | Human cost − agent cost | $9.81 |
| Saving per user per month | Frequency × per-task saving | $196 |
All fields are explicit in the table; procurement teams may substitute their own data to recompute the payback period (subscription ÷ per-user monthly saving). The advantage of this approach is that the conclusions can be verified independently rather than depending on the vendor's narrative.
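Because every field in the table is explicit, the arithmetic can be scripted as well as entered into a spreadsheet. The sketch below reproduces the table's formulas; BaselineInputs, computeBaseline, and paybackMonths are hypothetical names, and the subscription figure for payback must come from the buyer's own contract.

```typescript
interface BaselineInputs {
  tasksPerUserPerMonth: number;
  humanMinutesPerTask: number;
  agentCostPerTask: number; // USD, from the cost model
  humanHourlyCostLoaded: number; // USD per hour, salary plus benefits
}

function computeBaseline(i: BaselineInputs) {
  const humanCostPerTask = i.humanHourlyCostLoaded * (i.humanMinutesPerTask / 60);
  const savingPerTask = humanCostPerTask - i.agentCostPerTask;
  const savingPerUserPerMonth = i.tasksPerUserPerMonth * savingPerTask;
  return { humanCostPerTask, savingPerTask, savingPerUserPerMonth };
}

// Payback period = subscription ÷ per-user monthly saving (never pays back if saving <= 0).
function paybackMonths(subscriptionPerUser: number, savingPerUserPerMonth: number): number {
  return savingPerUserPerMonth > 0 ? subscriptionPerUser / savingPerUserPerMonth : Infinity;
}

// Example row from the table (inbox triage).
const inboxTriage = computeBaseline({
  tasksPerUserPerMonth: 20,
  humanMinutesPerTask: 15,
  agentCostPerTask: 0.19,
  humanHourlyCostLoaded: 40,
});
console.log(inboxTriage.savingPerUserPerMonth.toFixed(0)); // → 196
```

With the example row this yields a human cost of $10 per task, a saving of $9.81 per task, and about $196 per user per month, matching the table.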
Incorporating failure cost
Agents fail with non-zero probability. A complete cost-benefit model must incorporate failure cost:
- Failure rate: a reasonable assumption range is 5–15% (varies by task type; high-risk tasks with HITL are typically below 2%)
- Failure cost = agent tokens already consumed (sunk cost) + human redo time
- Corrected per-task saving = per-task saving × (1 − failure rate) − failure cost × failure rate
Substituting a 5% failure rate and a per-incident failure cost of $0.19 (sunk) + $10 (redo) = $10.19:
Corrected = $9.81 × 0.95 − $10.19 × 0.05
= $9.32 − $0.51
≈ $8.81 per task
The leverage ratio (saving ÷ agent cost) falls from about 52× to about 46×, a reduction of roughly 10%, and remains strongly favorable. Estimates that include a failure-rate correction are more credible than idealized ones: without an explicit correction, procurement teams apply their own discount, which in practice tends to exceed the actual failure rate.
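The failure-rate correction and the leverage ratios quoted above can be reproduced directly. The sketch implements the section's formula as written; correctedSaving and leverage are hypothetical helper names, and the inputs are the example values from the inbox triage row.

```typescript
// Corrected per-task saving = saving × (1 − failure rate) − failure cost × failure rate
function correctedSaving(savingPerTask: number, failureRate: number, failureCost: number): number {
  return savingPerTask * (1 - failureRate) - failureCost * failureRate;
}

// Leverage ratio = saving ÷ agent cost per task.
function leverage(saving: number, agentCostPerTask: number): number {
  return saving / agentCostPerTask;
}

const agentCost = 0.19; // USD per task
const humanCost = 10; // USD per task ($40/hour × 0.25 hour)
const failRate = 0.05;

const idealSaving = humanCost - agentCost; // 9.81
const failureCost = agentCost + humanCost; // sunk tokens + human redo = 10.19
const realistic = correctedSaving(idealSaving, failRate, failureCost);

console.log(
  realistic.toFixed(2),
  leverage(idealSaving, agentCost).toFixed(1),
  leverage(realistic, agentCost).toFixed(1),
); // → 8.81 51.6 46.4
```

51.6 and 46.4 round to the ~52× and ~46× in the text, and 8.81 ÷ 9.81 ≈ 0.90 is the roughly 10% reduction.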
Task types unsuitable for agent execution
A complete cost-benefit report must explicitly enumerate task types that are not recommended for agent delegation, to prevent stakeholders from treating the agent as a universal solution:
- Single-step, non-repetitive tasks: title changes, status lookups, and similar. Agent startup cost (cache write, tool descriptions, first-step inference) runs to roughly 10K tokens or more, which exceeds the saving
- Irreversible external actions: sending client emails, executing payments, publishing public content. Such decisions should not be made autonomously by the agent; a HITL workflow is required at minimum
- Sensitive data not permitted to pass through models: medical diagnostic data, unmasked banking information, and similar. Such workflows should use traditional channels or specialized tools rather than the agent
Omitting this list invites post-deployment misuse whose root cause is a task-type mismatch, not a deficiency in agent capability.
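The exclusion list can also serve as a pre-delegation screen. A minimal sketch, assuming a hypothetical TaskProfile whose boolean fields map one-to-one onto the three categories above; real screening would need richer signals than booleans.

```typescript
// Hypothetical task profile; each field mirrors one exclusion category.
interface TaskProfile {
  repetitive: boolean; // recurs often enough to amortize agent startup cost
  multiStep: boolean;
  irreversibleExternalAction: boolean; // e.g. client email, payment, public post
  sensitiveData: boolean; // data not permitted to pass through models
}

type Verdict = "agent" | "agent_with_hitl" | "not_recommended";

function screenTask(t: TaskProfile): Verdict {
  // Sensitive data: traditional channels or specialized tools, never the agent.
  if (t.sensitiveData) return "not_recommended";
  // Single-step and non-repetitive: startup tokens exceed the saving.
  if (!t.multiStep && !t.repetitive) return "not_recommended";
  // Irreversible external action: never autonomous; HITL at minimum.
  if (t.irreversibleExternalAction) return "agent_with_hitl";
  return "agent";
}
```

Sensitive data is checked first because no workflow design, HITL included, makes it acceptable to route that data through a model.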
Section summary
Returning to the three governing principles from the overview:
- Input is inexpensive, cached input is cheaper, output is expensive, history is the most expensive
- Tokens not consumed are worth more than tokens saved
- Failure cost ≥ success cost
Combined with the cost-benefit baseline table and the unsuitable-task list in this section, these constitute the complete material set required by non-engineering teams for procurement decisions and internal reporting.