
Your AI Agent Bill Isn't About Tokens Anymore

March 23, 2026 3 min read

How much did that AI agent just cost you? If you checked your token count, you probably have no idea.

That's the argument laid out in a recent analysis from Revenue Model, and it targets a problem that anyone running AI agents has probably seen on an invoice: the bill keeps climbing even when token usage looks modest.

Tokens Measure Text, Not Work

The disconnect is simple. Tokens count how much text moves in and out of a model. That worked fine when AI meant "send a prompt, get a response." But agents don't work that way. A single request to an AI agent can trigger a planning step, a retrieval step, tool calls, validation, retries, and sometimes parallel sub-agents tackling different parts of the task. Each of those steps burns compute. Most of them barely register in your token count.

Think of it like measuring a road trip by counting how many times you turned the steering wheel. Technically related to driving, but it tells you almost nothing about fuel consumption.
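To make the gap concrete, here is a minimal Python sketch of one agent request broken into its internal model calls. The `Step` class, the step names, and every token figure are hypothetical, chosen only for illustration. Summed input and output tokens across all calls serve as a rough proxy for the work done, compared with the text visible at the edges of the interaction (the user's prompt and the final answer).

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One model call inside an agent run (hypothetical structure)."""
    name: str
    input_tokens: int   # tokens sent into the model for this call
    output_tokens: int  # tokens the model generated for this call


def summarize(prompt_tokens: int, answer_tokens: int, trace: list[Step]) -> dict:
    """Contrast the text the user sees with what the model processed internally."""
    processed = sum(s.input_tokens + s.output_tokens for s in trace)
    visible = prompt_tokens + answer_tokens
    return {
        "model_calls": len(trace),
        "visible_tokens": visible,
        "processed_tokens": processed,
        "ratio": round(processed / visible, 1),
    }


# Illustrative numbers only: a 50-token request that yields a 200-token answer.
trace = [
    Step("plan", 1_500, 300),
    Step("retrieve", 4_000, 150),
    Step("tool_call", 4_200, 120),
    Step("tool_call_retry", 4_200, 120),  # a failed tool step run a second time
    Step("validate", 4_500, 80),
    Step("final_answer", 4_600, 200),
]
print(summarize(50, 200, trace))
# {'model_calls': 6, 'visible_tokens': 250, 'processed_tokens': 23970, 'ratio': 95.9}
```

With these made-up numbers, a 50-token request and a 200-token answer sit on top of six model calls that processed nearly 24,000 tokens, close to 100 times the text a user would notice.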

Where the Hidden Costs Pile Up

The article identifies several places where compute grows without a proportional jump in tokens:

Context reprocessing: Agents feed the same context window back through the model multiple times across an execution chain. You pay for that compute on every pass, even though the tokens themselves are identical each time.
Planning and validation loops: Before an agent acts, it reasons about what to do. After it acts, it checks whether the result was correct. Both steps require model inference (running the input through the model to produce output), and both cost compute.
Parallel execution: When an agent splits a task into sub-tasks running simultaneously, compute multiplies while the final token output might be a single summary paragraph.
Retries: When a step fails and the agent tries again, you're paying double (or triple) compute for what looks like a single interaction.
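How these sources compound can be sketched with a back-of-the-envelope estimate. The Python function below is a hypothetical model, not any provider's billing formula. It assumes every step re-sends the full accumulated context, that the context grows by a fixed number of tokens per step, that each parallel sub-agent runs the same chain, and that a fixed share of steps is retried once. Input tokens processed stand in for compute.

```python
def estimate_processed_tokens(
    base_context: int,
    growth_per_step: int,
    steps: int,
    parallel_subagents: int = 1,
    retry_rate: float = 0.0,
) -> int:
    """Rough estimate of input tokens processed across one agent run.

    Illustrative assumptions only:
    - every step re-sends the full accumulated context (context reprocessing);
    - the context grows by `growth_per_step` tokens after each step;
    - each sub-agent runs the same number of steps in parallel;
    - a fraction `retry_rate` of steps is executed a second time.
    """
    # Step k (0-indexed) re-reads base_context + k * growth_per_step tokens.
    per_chain = sum(base_context + k * growth_per_step for k in range(steps))
    with_retries = per_chain * (1 + retry_rate)
    return round(with_retries * parallel_subagents)


print(estimate_processed_tokens(2_000, 500, steps=1))   # 2000
print(estimate_processed_tokens(2_000, 500, steps=8))   # 30000
print(estimate_processed_tokens(2_000, 500, steps=16))  # 92000
print(estimate_processed_tokens(2_000, 500, steps=8, parallel_subagents=3, retry_rate=0.25))  # 112500
```

Under these assumptions, doubling the chain from 8 to 16 steps roughly triples the total (30,000 to 92,000 tokens), because each later step re-reads a longer context, while the final answer the user sees could still be a single paragraph.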

This maps to real-world reports of organizations discovering massive, unexpected AI bills despite what looked like reasonable token volumes.

What This Means for Your Budget

Most AI pricing today is still token-based. OpenAI, Anthropic, and Google all charge per input and output token. That pricing model made sense for chat-style interactions. For agents that run multi-step workflows, it creates a measurement gap between what you can track and what you actually spend.

The practical takeaway: if you're deploying AI agents for business workflows, monitoring token counts alone will mislead you. You need to track execution depth (how many steps your agent takes per request), the number of model calls per task, and whether your agents are reprocessing the same context repeatedly.
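One way to start collecting those three signals is a small per-request counter. This Python sketch is a hypothetical helper, not part of any provider SDK. It assumes you can hook each agent step and see the context sent on every model call that step makes, and it flags repeated context by hashing each one.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RunMetrics:
    """Per-request counters for one agent run (hypothetical helper)."""
    steps: int = 0        # execution depth: steps taken for this request
    model_calls: int = 0  # model invocations (a step may make zero or several)
    _seen: Counter = field(default_factory=Counter)  # context hash -> times sent

    def record_step(self, model_contexts: list[str]) -> None:
        """Log one agent step and the context sent on each model call it made."""
        self.steps += 1
        for context in model_contexts:
            self.model_calls += 1
            digest = hashlib.sha256(context.encode("utf-8")).hexdigest()
            self._seen[digest] += 1

    @property
    def repeated_contexts(self) -> int:
        """Model calls that re-sent a context already seen earlier in this run."""
        return sum(count - 1 for count in self._seen.values())


metrics = RunMetrics()
shared = "You are a billing assistant. Context: <retrieved docs>"
metrics.record_step([shared + " Plan the task."])
metrics.record_step([])                               # tool call, no model inference
metrics.record_step([shared + " Check the result."])
metrics.record_step([shared + " Check the result."])  # retry re-sends the same context
print(metrics.steps, metrics.model_calls, metrics.repeated_contexts)  # 4 3 1
```

Exact hashing only flags verbatim repeats. Agents that resend a growing conversation would need a prefix comparison instead, so treat `repeated_contexts` here as a lower bound.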

Some providers are starting to acknowledge this. Anthropic's Claude Code, for example, surfaces cost-per-session rather than raw token counts. But the industry hasn't settled on a standard way to measure and bill for agentic compute.

For now, treat your token dashboard the way you'd treat an odometer on a car with a fuel leak. It's telling you something, but not the thing that matters most.
