Controls and ROI

Four-layer cost control mechanism (tier caps, runtime budgets, user visibility, global kill switch) with monitoring criteria; methodology for converting human time into a cost-benefit baseline.

Enterprise procurement focuses on two coupled questions: “is expenditure controllable” determines whether to purchase; “is the investment justified” determines whether to renew. The former maps to the control surface, the latter to the value surface. This section addresses each in turn.

Part 1: Cost control mechanisms

Cost overruns are typically prevented through layered controls. The four layers below run from configured limits (tier caps, runtime budgets) through behavioral feedback (usage visibility) to an emergency backstop (global kill switch).

Layer 1: Tier hard budget caps

Each user is bound to a tier, which determines three configurations:

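| Configuration | Lite | Pro | Ultra |
|---|---|---|---|
| Default model | Haiku 4.5 | Sonnet 4.6 | Opus 4.7 |
| Per-task token cap | 200K | 800K | 2M |
| Per-day token cap | 2M | 20M | 80M |
| Available tools | Core tools | Core tools + BUA + MCP | Pro tools + advanced subagents |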

(Authoritative values reside in apps/server/scripts/db-seed.ts; admins adjust under Admin → Tiers. The table presents reference defaults, not contractual guarantees.)
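For readers wiring these defaults into code, a minimal TypeScript sketch of the tier table might look like the following. The TierConfig shape, the TIERS constant, the tool-group labels, and the effectiveTaskBudget helper are all hypothetical; the authoritative values and schema live in apps/server/scripts/db-seed.ts. The helper also assumes that a new task's budget is the smaller of the per-task cap and the user's remaining daily allowance.

```typescript
// Hypothetical tier configuration shape; the authoritative schema and values
// live in apps/server/scripts/db-seed.ts and may differ from this sketch.
type TierName = "lite" | "pro" | "ultra";

interface TierConfig {
  defaultModel: string;
  perTaskTokenCap: number;
  perDayTokenCap: number;
  toolGroups: string[]; // hypothetical labels for the "Available tools" row
}

// Reference defaults copied from the tier table (not contractual guarantees).
const TIERS: Record<TierName, TierConfig> = {
  lite: {
    defaultModel: "Haiku 4.5",
    perTaskTokenCap: 200_000,
    perDayTokenCap: 2_000_000,
    toolGroups: ["core"],
  },
  pro: {
    defaultModel: "Sonnet 4.6",
    perTaskTokenCap: 800_000,
    perDayTokenCap: 20_000_000,
    toolGroups: ["core", "bua", "mcp"],
  },
  ultra: {
    defaultModel: "Opus 4.7",
    perTaskTokenCap: 2_000_000,
    perDayTokenCap: 80_000_000,
    toolGroups: ["core", "bua", "mcp", "advanced-subagents"],
  },
};

// Assumption: a new task may use the smaller of the per-task cap and
// whatever remains of the user's daily allowance.
function effectiveTaskBudget(tier: TierName, tokensUsedToday: number): number {
  const cfg = TIERS[tier];
  const dailyRemaining = Math.max(0, cfg.perDayTokenCap - tokensUsedToday);
  return Math.min(cfg.perTaskTokenCap, dailyRemaining);
}

console.log(effectiveTaskBudget("pro", 19_500_000));
```

For example, a Pro user who has already consumed 19.5M tokens today would get a 500K budget for the next task rather than the full 800K.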

Rationale for capping by token rather than dollar: dollar caps drift with model pricing — capacity tightens silently when models become more expensive and loosens when they become cheaper. Token quantities are stable units at the product-semantic layer, requiring no quota recalculation across model upgrades or downgrades.
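The drift argument can be made concrete with a few lines of arithmetic. The sketch below holds a hypothetical $6 per-task dollar cap fixed and varies a hypothetical blended price per million tokens; none of these prices correspond to real model pricing, and tokensAffordable is an illustrative helper.

```typescript
// How many tokens a fixed dollar cap buys at a given price.
// Prices here are hypothetical blended USD rates per 1M tokens.
function tokensAffordable(dollarCap: number, usdPerMillionTokens: number): number {
  return Math.floor((dollarCap * 1_000_000) / usdPerMillionTokens);
}

const DOLLAR_CAP_USD = 6; // hypothetical per-task dollar cap
const TOKEN_CAP = 800_000; // Pro per-task token cap from the tier table

for (const price of [3, 6, 15]) {
  const viaDollars = tokensAffordable(DOLLAR_CAP_USD, price);
  console.log(`at $${price}/M tokens: dollar cap = ${viaDollars} tokens, token cap = ${TOKEN_CAP} tokens`);
}
```

As the price moves from $3 to $15 per million tokens, the same $6 cap shrinks from 2M to 400K tokens, while the 800K token cap means the same thing to the product throughout.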

Layer 2: Runtime streaming budget

During task execution, accumulated tokens are checked against the tier cap before every step. Two termination paths are configured:

  • Soft termination (standard path): when the cap is reached, the in-flight step completes, partial progress is returned to the user, and the task status is set to terminated_budget
  • Hard termination (exception path): when a runaway loop is detected (the same tool called with the same arguments N consecutive times), the task is stopped immediately

The implementation resides in the task-orchestrator; see task-orchestration in the architecture section.
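The two termination paths can be sketched as follows. This is a hypothetical, synchronous simplification, not the task-orchestrator's code: StepResult, runTask, the terminated_loop status, and LOOP_THRESHOLD (standing in for the unspecified N) are all assumptions; only terminated_budget comes from the text.

```typescript
// Hypothetical, synchronous sketch of Layer 2; not the task-orchestrator's code.
interface StepResult {
  tokensUsed: number;
  toolName?: string;
  toolArgs?: unknown;
}

// "terminated_budget" comes from the text; "terminated_loop" is an assumed label.
type TaskStatus = "completed" | "terminated_budget" | "terminated_loop";

interface TaskOutcome {
  status: TaskStatus;
  tokensUsed: number;
  stepsCompleted: number;
}

const LOOP_THRESHOLD = 3; // stands in for the unspecified N

function runTask(steps: Array<() => StepResult>, taskTokenCap: number): TaskOutcome {
  let tokensUsed = 0;
  let stepsCompleted = 0;
  let lastCallKey: string | null = null;
  let repeatCount = 0;

  for (const step of steps) {
    // Soft termination: pre-check before each step. Everything completed so
    // far is kept and returned as partial progress.
    if (tokensUsed >= taskTokenCap) {
      return { status: "terminated_budget", tokensUsed, stepsCompleted };
    }

    const result = step();
    tokensUsed += result.tokensUsed;
    stepsCompleted += 1;

    // Hard termination: the same tool with the same arguments N times in a row.
    // (JSON.stringify is key-order sensitive, which is acceptable for a sketch.)
    if (result.toolName !== undefined) {
      const callKey = `${result.toolName}:${JSON.stringify(result.toolArgs)}`;
      repeatCount = callKey === lastCallKey ? repeatCount + 1 : 1;
      lastCallKey = callKey;
      if (repeatCount >= LOOP_THRESHOLD) {
        return { status: "terminated_loop", tokensUsed, stepsCompleted };
      }
    } else {
      lastCallKey = null;
      repeatCount = 0;
    }
  }

  return { status: "completed", tokensUsed, stepsCompleted };
}
```

A real orchestrator would run steps asynchronously and abort the in-flight call on a hard stop; the sketch only shows where the two checks sit relative to each step.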

Layer 3: User-layer usage visibility

At any time, users can view the following in the UI:

  • Tokens consumed and remaining for the current task
  • Estimated dollar amount at the current model’s unit price
  • Daily and monthly cumulative totals
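The dollar estimate can be derived directly from token counts. In this sketch, the ModelRates and TaskUsage types, the split into uncached input, cached input, and output, and the example rates are assumptions; actual per-model prices come from the provider's price list and change over time.

```typescript
// Hypothetical rates in USD per 1M tokens; real prices vary by model and change over time.
interface ModelRates {
  input: number;
  cachedInput: number;
  output: number;
}

interface TaskUsage {
  uncachedInputTokens: number;
  cachedInputTokens: number;
  outputTokens: number;
}

const PER_MILLION = 1_000_000;

// Estimated dollar cost of the tokens consumed so far.
function estimateUsd(usage: TaskUsage, rates: ModelRates): number {
  return (
    (usage.uncachedInputTokens * rates.input +
      usage.cachedInputTokens * rates.cachedInput +
      usage.outputTokens * rates.output) /
    PER_MILLION
  );
}

// Tokens left before the task hits its cap (never negative).
function remainingTokens(usedTokens: number, capTokens: number): number {
  return Math.max(0, capTokens - usedTokens);
}

const usage: TaskUsage = { uncachedInputTokens: 100_000, cachedInputTokens: 400_000, outputTokens: 20_000 };
const rates: ModelRates = { input: 3, cachedInput: 0.3, output: 15 };
console.log(`estimate: $${estimateUsd(usage, rates).toFixed(2)}, remaining: ${remainingTokens(520_000, 800_000)}`);
```

With those hypothetical rates ($3, $0.30, and $15 per million tokens), a task that used 100K uncached input, 400K cached input, and 20K output tokens would show an estimate of $0.72 and 280K tokens remaining against an 800K cap.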

Usage visibility is more effective than backend throttling — observable cost drives users to reduce ineffective operations of their own accord. This layer is not a hard budget constraint but a behavioral feedback mechanism, primarily reducing unnecessary retries and exploratory invocations.

Layer 4: Global kill switch

Administrators may terminate all running tasks and all BUA sessions through a single action (see the BUA section for details). This mechanism is an exception backstop, not a routine tool — under steady-state operation, tier and task budgets are sufficient; the kill switch exists to guarantee termination capability under unexpected conditions.
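One common way to implement a single-action stop is to register an AbortController per task or session and abort them all at once. The KillSwitchRegistry class below is a hypothetical sketch of that pattern, not the product's implementation (the BUA section describes the actual mechanism); it relies only on the standard AbortController and AbortSignal APIs available in browsers and recent Node.js versions.

```typescript
// Hypothetical registry: each running task or BUA session registers an
// AbortController, and the admin kill switch aborts all of them at once.
class KillSwitchRegistry {
  private readonly active = new Map<string, AbortController>();

  // Called when a task or session starts; the returned signal is what the
  // work loop checks (or passes to fetch and similar APIs) to stop promptly.
  register(id: string): AbortSignal {
    const controller = new AbortController();
    this.active.set(id, controller);
    return controller.signal;
  }

  // Called when a task or session finishes normally.
  unregister(id: string): void {
    this.active.delete(id);
  }

  // The single admin action: abort everything registered and report the count.
  killAll(): number {
    const count = this.active.size;
    this.active.forEach((controller) => controller.abort());
    this.active.clear();
    return count;
  }
}

const registry = new KillSwitchRegistry();
const taskSignal = registry.register("task-42");
const buaSignal = registry.register("bua-session-7");
registry.register("task-43");
registry.unregister("task-43"); // finished before the kill switch fired
console.log(registry.killAll(), taskSignal.aborted, buaSignal.aborted);
```

The design keeps the kill switch independent of per-task logic: tasks only need to honor their signal, so the backstop works even when a task's own budget checks have failed.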

Monitoring metrics

The agent_token_usage Grafana panel (operations/dashboards) aggregates token consumption by tier, user, and task type. Three metrics warrant long-term tracking:

  • Daily input token curve: establish a baseline aligned with the product's active periods and monitor deltas rather than absolute values. Spikes during inactive periods typically indicate runaway batch tasks
  • Cache hit rate: sustained values below 50% suggest that dynamic content (for example, a timestamp or per-request ID) has been introduced into the prompt. A prompt prefix that changes on every request never hits the cache, so that portion of the bill is charged at the full input-token price
  • Output / input ratio: a sudden increase often indicates that the model has shifted toward long-form reasoning instead of tool invocation, typically because of poorly designed descriptions for recently added tools
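The three signals can be computed from daily token aggregates. In the sketch below, the DailyUsage fields, the cache-hit-rate definition (cache reads as a share of all input tokens), and every threshold except the 50% floor from the text are assumptions chosen for illustration; they are not the agent_token_usage panel's actual queries.

```typescript
// Hypothetical daily aggregate; field names are assumptions and need not
// match the agent_token_usage panel's underlying schema.
interface DailyUsage {
  uncachedInputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  outputTokens: number;
}

const HIT_RATE_FLOOR = 0.5; // from the text: sustained values below 50% warrant a look
const SUSTAINED_DAYS = 3; // assumption: how many consecutive days count as "sustained"
const SPIKE_FACTOR = 2; // assumption: input more than 2x the baseline mean is a spike
const RATIO_JUMP_FACTOR = 1.5; // assumption: threshold for a "sudden increase"

function totalInput(u: DailyUsage): number {
  return u.uncachedInputTokens + u.cacheReadTokens + u.cacheWriteTokens;
}

// One reasonable definition: share of all input tokens served from cache.
function cacheHitRate(u: DailyUsage): number {
  const total = totalInput(u);
  return total === 0 ? 0 : u.cacheReadTokens / total;
}

function outputInputRatio(u: DailyUsage): number {
  const total = totalInput(u);
  return total === 0 ? 0 : u.outputTokens / total;
}

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Compares today against a baseline built from earlier days.
function metricWarnings(history: DailyUsage[], today: DailyUsage): string[] {
  const warnings: string[] = [];

  const baselineInput = mean(history.map(totalInput));
  if (baselineInput > 0 && totalInput(today) > SPIKE_FACTOR * baselineInput) {
    warnings.push("input-spike");
  }

  const recent = [...history, today].slice(-SUSTAINED_DAYS);
  if (recent.length === SUSTAINED_DAYS && recent.every((u) => cacheHitRate(u) < HIT_RATE_FLOOR)) {
    warnings.push("low-cache-hit-rate");
  }

  const baselineRatio = mean(history.map(outputInputRatio));
  if (baselineRatio > 0 && outputInputRatio(today) > RATIO_JUMP_FACTOR * baselineRatio) {
    warnings.push("output-ratio-jump");
  }

  return warnings;
}
```

Comparing against a baseline rather than fixed absolute limits follows the guidance above: what matters is the delta from the product's normal pattern.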

Part 2: ROI estimation

Regarding the term “ROI”: this section uses “ROI” in its broad business-context meaning — the cost-benefit estimation of “whether the investment is justified and how soon it pays back” — encompassing per-task savings, monthly savings, payback period, and leverage ratio. The strict financial definition ROI = (gain − cost) / cost × 100% represents only one percentage indicator; the estimation scope of this section exceeds that formula. “ROI” is retained as the section heading because it is the most common search term used by non-engineering stakeholders.

Baseline: human-versus-agent comparison

The most common cost-benefit model uses a human-labor baseline. The following table can be transferred directly to a spreadsheet:

| Field | Definition | Example (inbox triage task) |
|---|---|---|
| Task frequency | Executions per user per month | 20 |
| Human task duration | Time required for a human to complete the task | 15 minutes |
| Agent cost per task | See cost-model | $0.19 |
| Human hourly cost (loaded) | Salary plus benefits | $40 / hour |
| Human cost per task | Hourly cost × duration | $40 × 0.25 h = $10 |
| Saving per task | Human cost − agent cost | $9.81 |
| Saving per user per month | Frequency × saving per task | $196 |

Every field is explicit, so procurement teams can substitute their own figures and recompute the payback period (per-user subscription cost ÷ per-user monthly saving, in months). The advantage of this approach is that conclusions are independently verifiable rather than dependent on vendor narrative.
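The same arithmetic is easy to script when a spreadsheet is not at hand. In the sketch below, BaselineInputs and the helper functions are hypothetical names; the example values are the inbox triage figures from the table, and the $600 per-seat annual subscription used for the payback calculation is an invented figure.

```typescript
// Field names mirror the baseline table; all money values are USD.
interface BaselineInputs {
  tasksPerUserPerMonth: number;
  humanMinutesPerTask: number;
  agentCostPerTask: number; // from the cost model
  humanHourlyCostLoaded: number; // salary plus benefits
}

function humanCostPerTask(i: BaselineInputs): number {
  return i.humanHourlyCostLoaded * (i.humanMinutesPerTask / 60);
}

function savingPerTask(i: BaselineInputs): number {
  return humanCostPerTask(i) - i.agentCostPerTask;
}

function savingPerUserPerMonth(i: BaselineInputs): number {
  return i.tasksPerUserPerMonth * savingPerTask(i);
}

// Payback period in months: per-user subscription cost ÷ per-user monthly saving.
function paybackMonths(subscriptionCostPerUser: number, i: BaselineInputs): number {
  return subscriptionCostPerUser / savingPerUserPerMonth(i);
}

// Inbox triage example from the table.
const inboxTriage: BaselineInputs = {
  tasksPerUserPerMonth: 20,
  humanMinutesPerTask: 15,
  agentCostPerTask: 0.19,
  humanHourlyCostLoaded: 40,
};

const HYPOTHETICAL_ANNUAL_SEAT_USD = 600; // invented figure for illustration

console.log(savingPerTask(inboxTriage).toFixed(2));
console.log(savingPerUserPerMonth(inboxTriage).toFixed(2));
console.log(paybackMonths(HYPOTHETICAL_ANNUAL_SEAT_USD, inboxTriage).toFixed(1));
```

With these inputs the saving comes to about $9.81 per task and $196.20 per user per month, so the hypothetical $600 subscription would pay back in roughly 3.1 months.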

Incorporating failure cost

Agents fail with non-zero probability. A complete cost-benefit model must incorporate failure cost:

  • Failure rate: a reasonable assumption range is 5-15%, varying by task type; high-risk tasks with human-in-the-loop (HITL) review typically fall below 2%
  • Failure cost = cost of agent tokens already consumed (sunk) + cost of the human redo time
  • Corrected per-task saving = per-task saving × (1 − failure rate) − failure cost × failure rate

Substituting a 5% failure rate and per-incident failure cost = $0.19 (sunk) + $10 (redo) = $10.19:

Corrected = $9.81 × 0.95 − $10.19 × 0.05
          = $9.32 − $0.51
          ≈ $8.81 per task

The leverage ratio (per-task saving ÷ agent cost per task) falls from ~52× ($9.81 ÷ $0.19) to ~46× ($8.81 ÷ $0.19), a reduction of about 10% that still leaves a strongly favorable margin. Estimates that include a failure-rate correction are more credible than idealized ones: without an explicit correction, procurement teams apply their own discount, which in practice tends to be larger than the actual failure rate.
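To test other failure rates quickly, the correction can be scripted. This sketch implements the section's formula exactly as written above; correctedSavingPerTask and leverageRatio are illustrative names, and the inputs are the inbox triage figures.

```typescript
// The section's formula, as written:
//   corrected = saving × (1 − failureRate) − failureCost × failureRate
// with failureCost = sunk agent cost + human redo cost.
function correctedSavingPerTask(saving: number, failureRate: number, failureCost: number): number {
  if (failureRate < 0 || failureRate > 1) {
    throw new RangeError("failureRate must be between 0 and 1");
  }
  return saving * (1 - failureRate) - failureCost * failureRate;
}

// Leverage ratio: per-task saving ÷ agent cost per task.
function leverageRatio(saving: number, agentCostPerTask: number): number {
  return saving / agentCostPerTask;
}

const agentCost = 0.19; // agent cost per task (inbox triage example)
const humanCost = 10; // human cost per task
const idealSaving = humanCost - agentCost; // the table's $9.81
const failureCost = agentCost + humanCost; // $10.19: sunk tokens + redo

for (const rate of [0.05, 0.1, 0.15]) {
  const corrected = correctedSavingPerTask(idealSaving, rate, failureCost);
  const leverage = leverageRatio(corrected, agentCost);
  console.log(`failure ${Math.round(rate * 100)}%: $${corrected.toFixed(2)} per task, leverage ${leverage.toFixed(1)}x`);
}
```

At the 5% rate this reproduces the figures above (about $8.81 per task and a leverage ratio of about 46×); across the full 5-15% range the corrected saving falls to about $6.81 per task.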

Task types unsuitable for agent execution

A complete cost-benefit report must explicitly enumerate task types that are not recommended for agent delegation, to prevent stakeholders from treating the agent as a universal solution:

  • Single-step, non-repetitive tasks: title changes, status lookups, and similar. Agent startup (cache write, tool descriptions, first-step inference) consumes roughly 10K tokens or more, which exceeds the saving
  • Irreversible external actions: sending email to clients, executing payments, publishing public content. The agent should not make these decisions autonomously; at minimum, a HITL approval step is required
  • Sensitive data not permitted to pass through models: medical diagnostic data, unmasked banking information, and similar. Such workflows should use traditional channels or specialized tools, not the agent
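These three exclusions can also be enforced before a task is handed to the agent. The following is a hypothetical gate: the TaskDescriptor fields, how they would be populated, and the decision labels are assumptions; it simply encodes the three categories above in priority order.

```typescript
// Hypothetical pre-delegation check; field names and labels are illustrative.
interface TaskDescriptor {
  expectedSteps: number;
  repetitive: boolean;
  irreversibleExternalAction: boolean; // e.g. client email, payment, publication
  containsRestrictedData: boolean; // e.g. medical diagnostics, unmasked banking data
}

type DelegationDecision = "delegate" | "delegate-with-hitl" | "do-not-delegate";

function delegationDecision(t: TaskDescriptor): DelegationDecision {
  // Sensitive data must not pass through the model at all.
  if (t.containsRestrictedData) return "do-not-delegate";
  // Single-step, one-off work: startup overhead (roughly 10K+ tokens) outweighs the saving.
  if (t.expectedSteps <= 1 && !t.repetitive) return "do-not-delegate";
  // Irreversible external actions require a human approval step at minimum.
  if (t.irreversibleExternalAction) return "delegate-with-hitl";
  return "delegate";
}
```

The checks run from strictest to most permissive, so a task that both touches restricted data and performs an irreversible action is rejected outright rather than routed to HITL.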

Omitting this list invites post-deployment misuse whose root cause is task-type mismatch, not a deficiency in agent capability.

Section summary

Returning to the three governing principles from the overview:

  1. Input is inexpensive, cached input is cheaper, output is expensive, history is the most expensive
  2. Tokens not consumed are worth more than tokens saved
  3. Failure cost ≥ success cost

Combined with the cost-benefit baseline table and the unsuitable-task list in this section, these constitute the complete material set required by non-engineering teams for procurement decisions and internal reporting.
