Controls and ROI

Enterprise procurement turns on two coupled questions. Whether expenditure is controllable decides the purchase; whether the investment is justified decides the renewal. The first maps to the control surface, the second to the value surface, and this section addresses each in turn.

Part 1: Cost control mechanisms

Cost overruns are typically prevented through layered controls. The four layers below run from routine limits that apply to every task to an emergency backstop.

Layer 1: Tier hard budget caps

Each user is bound to a tier, which determines four configurations:

Configuration	Lite	Pro	Ultra
Default model	Haiku 4.5	Sonnet 4.6	Opus 4.7
Per-task token cap	200K	800K	2M
Per-day token cap	2M	20M	80M
Available tools	core tools	+ BUA + MCP	+ advanced subagents

(Authoritative values reside in apps/server/scripts/db-seed.ts; admins adjust under Admin → Tiers. The table presents reference defaults, not contractual guarantees.)

Rationale for capping by token rather than dollar: dollar caps drift with model pricing — capacity tightens silently when models become more expensive and loosens when they become cheaper. Token quantities are stable units at the product-semantic layer, requiring no quota recalculation across model upgrades or downgrades.
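The drift argument can be made concrete with a short sketch. The tier values are the reference defaults from the table above; the type names, the `tokensUnderDollarCap` helper, and both prices are hypothetical and chosen only to illustrate the effect, not taken from the seed script or any price list.

```typescript
// Reference defaults copied from the tier table; the authoritative values
// live in the seed script and can be changed by admins.
type TierName = "Lite" | "Pro" | "Ultra";

interface TierLimits {
  defaultModel: string;
  perTaskTokenCap: number;
  perDayTokenCap: number;
}

const TIERS: Record<TierName, TierLimits> = {
  Lite: { defaultModel: "Haiku 4.5", perTaskTokenCap: 200_000, perDayTokenCap: 2_000_000 },
  Pro: { defaultModel: "Sonnet 4.6", perTaskTokenCap: 800_000, perDayTokenCap: 20_000_000 },
  Ultra: { defaultModel: "Opus 4.7", perTaskTokenCap: 2_000_000, perDayTokenCap: 80_000_000 },
};

// Tokens that a fixed dollar cap buys at a given blended price
// (dollars per million tokens). Hypothetical helper for illustration.
function tokensUnderDollarCap(dollarCap: number, pricePerMillionTokens: number): number {
  return Math.floor((dollarCap / pricePerMillionTokens) * 1_000_000);
}

// Hypothetical prices: the same $10 cap halves in capacity when the price doubles.
console.log(tokensUnderDollarCap(10, 5));  // 2000000
console.log(tokensUnderDollarCap(10, 10)); // 1000000

// A token cap is unaffected by the price change.
console.log(TIERS.Pro.perTaskTokenCap);    // 800000
```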

Layer 2: Runtime streaming budget

During task execution, every step pre-checks accumulated tokens against the tier cap. Two responses are configured for cap breach:

Soft termination: the current step completes, partial progress returns to the user, status marked terminated_budget — standard path
Hard termination: a detected runaway loop (same tool, same arguments, N consecutive invocations) is stopped immediately — exception path

The implementation resides in the task-orchestrator; see task-orchestration in the architecture section.
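The two termination paths can be sketched as a pre-check that runs before each step. This is a minimal illustration under stated assumptions, not the task-orchestrator's code: `ToolCall`, `isRunawayLoop`, `preCheck`, and the `terminated_runaway` status are hypothetical names, and only `terminated_budget` comes from the description above.

```typescript
type StepDecision = "continue" | "terminated_budget" | "terminated_runaway";

interface ToolCall {
  tool: string;
  args: string; // serialized arguments, for example a JSON string
}

// True when the last n calls all used the same tool with the same arguments.
function isRunawayLoop(history: ToolCall[], n: number): boolean {
  if (n < 2 || history.length < n) return false;
  const recent = history.slice(-n);
  const first = recent[0];
  if (first === undefined) return false;
  return recent.every((c) => c.tool === first.tool && c.args === first.args);
}

// Runs before each step. The previous step has already completed, so a
// budget stop here still returns partial progress to the user.
function preCheck(
  tokensUsed: number,
  perTaskCap: number,
  history: ToolCall[],
  n: number,
): StepDecision {
  if (isRunawayLoop(history, n)) return "terminated_runaway"; // exception path
  if (tokensUsed >= perTaskCap) return "terminated_budget";   // standard path
  return "continue";
}
```

Checking the loop condition first means a runaway task is stopped even when it is still well under its token cap.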

Layer 3: User-layer usage visibility

Users can view the following in the UI at any time:

Tokens consumed and remaining for the current task
Estimated dollar amount at the current model’s unit price
Daily and monthly cumulative totals

Usage visibility is often more effective than backend throttling, because observable cost leads users to cut ineffective operations of their own accord. This layer is a behavioral feedback mechanism rather than a hard budget constraint; its main effect is to reduce unnecessary retries and exploratory invocations.
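The per-task figures shown to the user reduce to simple arithmetic. The sketch below assumes a single blended unit price per million tokens; real pricing usually separates input, cached input, and output, and the function name and fields here are hypothetical.

```typescript
interface UsageSummary {
  tokensUsed: number;
  tokensRemaining: number;
  estimatedDollars: number;
}

// Hypothetical helper: derives the values shown for the current task.
function summarizeTaskUsage(
  tokensUsed: number,
  perTaskCap: number,
  pricePerMillionTokens: number, // assumed blended price, in dollars
): UsageSummary {
  return {
    tokensUsed,
    tokensRemaining: Math.max(0, perTaskCap - tokensUsed),
    estimatedDollars: (tokensUsed / 1_000_000) * pricePerMillionTokens,
  };
}

// 250K tokens used against the Pro per-task cap, at a hypothetical $4 per million.
console.log(summarizeTaskUsage(250_000, 800_000, 4));
```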

Layer 4: Global kill switch

Administrators may terminate all running tasks and all BUA sessions through a single action (see the BUA section for details). This mechanism is an exception backstop, not a routine tool — under steady-state operation, tier and task budgets are sufficient; the kill switch exists to guarantee termination capability under unexpected conditions.

Monitoring metrics

The agent_token_usage Grafana panel (operations/dashboards) aggregates token consumption by tier, user, and task type. Three metrics warrant long-term tracking:

Daily input token curve: establish a baseline aligned with the product’s active periods and monitor deltas rather than absolute values. Spikes outside active periods typically indicate runaway batch tasks
Cache hit rate: sustained values below 50% suggest that dynamic content has been introduced into the prompt. A dynamic prompt misses the cache on every request, so that portion of the bill is charged at the full input price
Output / input ratio: a sudden increase suggests the model has shifted toward long-form reasoning rather than tool invocation, typically because of design issues in recently added tool descriptions
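The three metrics can be expressed as simple checks against a baseline. In this sketch the 50% cache threshold comes from the text above; the field names, the alert labels, and the two multipliers (`spikeFactor`, `ratioFactor`) are assumptions for illustration, and a production alert would look at sustained values rather than a single day.

```typescript
interface DailyUsage {
  inputTokens: number;       // all input tokens, cached or not
  cachedInputTokens: number; // subset of inputTokens served from cache
  outputTokens: number;
}

function cacheHitRate(u: DailyUsage): number {
  return u.inputTokens === 0 ? 0 : u.cachedInputTokens / u.inputTokens;
}

function outputInputRatio(u: DailyUsage): number {
  return u.inputTokens === 0 ? 0 : u.outputTokens / u.inputTokens;
}

// Compares one day against a baseline day and returns the alerts raised.
function usageAlerts(
  today: DailyUsage,
  baseline: DailyUsage,
  spikeFactor = 1.5, // assumed: input more than 1.5x baseline counts as a spike
  ratioFactor = 2,   // assumed: output/input ratio doubling counts as a jump
): string[] {
  const alerts: string[] = [];
  if (today.inputTokens > baseline.inputTokens * spikeFactor) alerts.push("input_spike");
  if (cacheHitRate(today) < 0.5) alerts.push("low_cache_hit_rate");
  if (outputInputRatio(today) > outputInputRatio(baseline) * ratioFactor) {
    alerts.push("output_ratio_jump");
  }
  return alerts;
}
```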

Part 2: ROI estimation

Regarding the term “ROI”: this section uses “ROI” in its broad business-context meaning — the cost-benefit estimation of “whether the investment is justified and how soon it pays back” — encompassing per-task savings, monthly savings, payback period, and leverage ratio. The strict financial definition ROI = (gain − cost) / cost × 100% represents only one percentage indicator; the estimation scope of this section exceeds that formula. “ROI” is retained as the section heading because it is the most common search term used by non-engineering stakeholders.

Baseline: human-versus-agent comparison

The most common cost-benefit model uses a human-labor baseline. The following table can be transferred directly to a spreadsheet:

Field	Definition	Example (inbox triage task)
Task frequency	Executions per user per month	20
Human task duration	Time required for human execution	15 minutes
Agent cost per task	See cost-model	$0.19
Human hourly cost (loaded)	Salary plus benefits	$40 / hour
Human cost per task	Hourly rate × duration	$40 × 0.25 = $10
Saving per task	Human cost − agent cost	$10 − $0.19 = $9.81
Saving per user per month	Frequency × per-task saving	20 × $9.81 ≈ $196

All fields are explicit in the table; procurement teams may substitute their own data to recompute payback period (subscription ÷ per-user monthly saving). The advantage of this approach is that conclusions are independently verifiable rather than dependent on vendor narrative.
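For teams that prefer a script to a spreadsheet, the table translates directly into a few lines of code. The inputs below are the inbox-triage example values from the table; the function and field names are hypothetical, and no subscription price is assumed because the section does not state one.

```typescript
interface BaselineInputs {
  tasksPerUserPerMonth: number;
  humanMinutesPerTask: number;
  agentCostPerTask: number; // dollars
  humanHourlyCost: number;  // dollars, loaded (salary plus benefits)
}

function baselineSavings(i: BaselineInputs) {
  const humanCostPerTask = i.humanHourlyCost * (i.humanMinutesPerTask / 60);
  const savingPerTask = humanCostPerTask - i.agentCostPerTask;
  const savingPerUserPerMonth = savingPerTask * i.tasksPerUserPerMonth;
  return { humanCostPerTask, savingPerTask, savingPerUserPerMonth };
}

// Payback as defined above: subscription divided by per-user monthly saving.
function paybackPeriod(subscriptionPerUser: number, savingPerUserPerMonth: number): number {
  return subscriptionPerUser / savingPerUserPerMonth;
}

const inbox = baselineSavings({
  tasksPerUserPerMonth: 20,
  humanMinutesPerTask: 15,
  agentCostPerTask: 0.19,
  humanHourlyCost: 40,
});
console.log(inbox.humanCostPerTask.toFixed(2));      // "10.00"
console.log(inbox.savingPerTask.toFixed(2));         // "9.81"
console.log(inbox.savingPerUserPerMonth.toFixed(2)); // "196.20"
```

Calling `paybackPeriod` with the buyer's own subscription price and the monthly saving completes the calculation.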

Incorporating failure cost

Agents fail with non-zero probability. A complete cost-benefit model must incorporate failure cost:

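Failure rate: a reasonable working range is 5–15% depending on task type; high-risk tasks with HITL are typically below 2%
Failure cost = agent tokens already consumed (sunk cost) + human redo time
Corrected per-task saving = per-task saving × (1 − failure rate) − failure cost × failure rate. This form is conservative: a failed task forfeits its saving and is also charged the full redo time, which the human baseline already counts once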

Substituting a 5% failure rate and per-incident failure cost = $0.19 (sunk) + $10 (redo) = $10.19:

Corrected = $9.81 × 0.95 − $10.19 × 0.05
          = $9.32 − $0.51
          ≈ $8.81 per task

The leverage ratio (saving ÷ agent cost per task) falls from ~52× to ~46×, a reduction of approximately 10%, and remains substantially favorable. Estimates that include an explicit failure-rate correction are more credible than idealized ones: when no correction is stated, procurement teams apply their own discount, which tends to exceed the actual failure rate.
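The correction and the leverage ratio can be reproduced with the sketch below, which applies the formula exactly as stated in this section to the example figures. The function names are hypothetical.

```typescript
// Corrected saving = saving × (1 − failure rate) − failure cost × failure rate
function correctedSavingPerTask(
  savingPerTask: number,
  failureRate: number,
  failureCost: number,
): number {
  return savingPerTask * (1 - failureRate) - failureCost * failureRate;
}

// Leverage ratio = saving ÷ agent cost per task
function leverageRatio(saving: number, agentCostPerTask: number): number {
  return saving / agentCostPerTask;
}

const agentCost = 0.19;
const humanRedo = 10;
const failureCost = agentCost + humanRedo; // sunk tokens plus redo time

const corrected = correctedSavingPerTask(9.81, 0.05, failureCost);
console.log(corrected.toFixed(2));                           // "8.81"
console.log(leverageRatio(9.81, agentCost).toFixed(1));      // "51.6"
console.log(leverageRatio(corrected, agentCost).toFixed(1)); // "46.4"
```

Rerunning with a 15% failure rate, the upper end of the assumed range, gives about $6.81 per task under the same formula, which shows how sensitive the estimate is to this one input.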

Task types unsuitable for agent execution

A complete cost-benefit report must explicitly enumerate task types that are not recommended for agent delegation, to prevent stakeholders from treating the agent as a universal solution:

Single-step, non-repetitive tasks: title modification, status lookup, and similar. Agent startup cost (cache write, tool descriptions, first-step inference) consumes roughly 10K tokens or more, which can exceed the saving
Irreversible external actions: client email dispatch, payment execution, public content publication. The agent should not make such decisions autonomously; an HITL workflow is required at minimum
Sensitive data not permitted to pass through models: medical diagnostic data, unmasked banking information, and similar. Such workflows should use traditional channels or specialized tools, not the agent

Omitting this list from the report leads to post-deployment misuse cases whose root cause is task-type mismatch rather than any deficiency in agent capability.
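The three exclusions above can also be written as a screening rule that a report or intake form applies before a task is delegated. This is a hypothetical sketch: the `TaskProfile` fields and the routing labels are invented for illustration and do not correspond to any product setting.

```typescript
interface TaskProfile {
  singleStep: boolean;
  repetitive: boolean;
  irreversibleExternalAction: boolean; // e.g. client email, payment, publication
  containsRestrictedData: boolean;     // data not permitted to pass through models
}

type Routing = "agent" | "agent_with_hitl" | "not_agent";

function routeTask(t: TaskProfile): Routing {
  // Restricted data rules out the agent regardless of other properties.
  if (t.containsRestrictedData) return "not_agent";
  // One-off single-step tasks cost more in startup tokens than they save.
  if (t.singleStep && !t.repetitive) return "not_agent";
  // Irreversible actions need a human in the loop at minimum.
  if (t.irreversibleExternalAction) return "agent_with_hitl";
  return "agent";
}
```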

Section summary

Returning to the three governing principles from the overview:

Input is inexpensive, cached input is cheaper, output is expensive, history is the most expensive
Tokens not consumed are worth more than tokens saved
Failure cost ≥ success cost

Combined with the cost-benefit baseline table and the unsuitable-task list in this section, these constitute the complete material set required by non-engineering teams for procurement decisions and internal reporting.