Related ToolsClaude CodeClaude

Claude Code v2.1.100 Silently Adds 20K Hidden Tokens Per Request

Claude by Anthropic
Image: Anthropic

Your Claude Code usage limits are disappearing faster than they should - and it's not your code getting longer.

Heavy users of Claude Code Max (the 5x plan) have noticed that since version 2.1.100, daily limits hit the wall hours earlier than expected. The culprit appears to be a silent server-side change: every request now carries roughly 20,000 extra invisible tokens that Anthropic hasn't disclosed and users can't see or control.

For context: a token is roughly 4 characters of text, so 20,000 tokens is about 15,000 words - a short novel's worth of data attached to every single request, silently. If you're running multiple parallel Claude Code sessions (common for developers orchestrating complex workflows), that overhead compounds fast.

What the Proxy Evidence Shows

The discovery came from users running Claude Code traffic through local proxies - software that intercepts and logs network requests, letting you see exactly what's being sent to Anthropic's servers. When comparing request payloads between v2.1.98 and v2.1.100+, the difference is visible in the raw data: newer versions send substantially more per call.

Anthropic hasn't acknowledged this change publicly. There's no changelog entry, no announcement. The tokens appear server-side, meaning they're added after your request leaves your machine - you can't strip them out or configure them away.

The practical hit is significant. Users running 3-5 parallel sessions through a full workday are now hitting limits by midday. A 40% faster burn rate on a $100/month plan is effectively a $40/month price increase with no notice.

What You Can Do Right Now

The only confirmed fix is rolling back to v2.1.98. Avoid auto-updates if your package manager pulls the latest version automatically.

If you can't downgrade, some users report partial relief by reducing parallel sessions and keeping context windows shorter. A context window is the amount of conversation history Claude can reference at once - keeping it tight means less total data flowing per request.

There's a secondary concern beyond just limit burn rate: inflated context can degrade output quality. When Claude's working memory fills up with system-level overhead you didn't put there, it has less room for your actual code and instructions. Whether this explains any recent quality regressions is harder to verify without Anthropic's disclosure.

The change appears intentional - possibly a new system prompt, extended tool definitions, or context-injection features added to Claude Code's architecture. Whatever it is, 20,000 tokens per request is not a rounding error. Heavy Claude Code users should verify which version they're running and track their limit consumption closely until Anthropic addresses this publicly.