Related ToolsClaudeClaude Code

Claude Is Billing Some Users Up to 5x More Tokens Than They Generate

Claude by Anthropic
Image: Anthropic

127,000 tokens billed for roughly 25,000 tokens of unique content. That's what a user found after asking a Claude Opus 4.7 agent to investigate its own consumption across an 8-turn session - a roughly 5-to-1 gap between what was produced and what was charged.

The investigation surfaced GitHub issues tracking this problem back to mid-December 2025. The Claude Code binary has at least two reverse-engineered bugs contributing to the discrepancy. Notably, the model identified the billing gap itself while reading through its own session logs.

How the Overcounting Happens

Token billing converts text into numerical units before the model processes it - roughly 750 words equals 1,000 tokens. Simple in theory, but it compounds fast in multi-step agentic sessions.

When Claude Code runs an agentic session - where the model autonomously executes multiple steps in sequence - every new request includes the full conversation history, system instructions, and all tool outputs from previous steps. The model re-reads everything from scratch on each turn because that's how it maintains memory across steps. You're billed for the entire package each time, not just the new content.

In a short chat, this is manageable. In a long agentic run, the accumulated context can reach tens of thousands of tokens before the meaningful work starts. The bugs identified appear to inflate this further: a significant portion of those 127K billed tokens were content sent redundantly rather than once.

What This Costs Max Plan Subscribers

Claude's Max plan charges a flat monthly fee with usage limits, not direct pay-per-token billing. If tokens are being overcounted at 5-to-1, subscribers hit their usage ceiling roughly five times faster than the plan's stated capacity implies.

Users who've reported faster-than-expected plan exhaustion have generally been told their usage is within normal patterns. These findings suggest otherwise - with documented GitHub issues running five months without a public Anthropic response.

The practical workaround is to keep agentic sessions short. The longer a session runs, the more accumulated context inflates each new request. Starting fresh sessions rather than extending long ones keeps billing overhead manageable. For heavy Claude Code work, breaking jobs into focused shorter sessions beats running one continuous multi-hour session until this gets resolved.

Token billing is abstract enough that most users can't easily audit it themselves. It took an automated agent systematically reviewing session logs to surface issues that have apparently sat in GitHub since December 2025. Anthropic hasn't publicly acknowledged these findings.