Claude Code Cache Misses Cost 12.5x More Than Hits - Here's the Math

May 24, 2026 2 min read

12.5 times. That's the price gap between a prompt cache hit and a cache miss in Claude Code - and most users have no idea it's silently inflating their token bill every session.

The math comes straight from Anthropic's prompt caching documentation. Cache write tokens cost 1.25x the base input price. Cache read tokens cost 0.1x. Divide one by the other: 1.25 / 0.1 = 12.5x. When Claude reads your system prompt, codebase context, or long conversation history from cache, you pay a tenth of normal rates. When the cache expires and Claude has to reprocess that same context fresh, you pay 1.25x instead - and the difference compounds fast on heavy sessions.
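The division above can be checked in a few lines of Python. This is only a sketch: the two multipliers are the ones quoted in this article, and the constant and function names are made up for illustration. Exact fractions are used so the result is not affected by floating-point rounding.

```python
from fractions import Fraction

# Multipliers relative to the base input price, as quoted in the article.
CACHE_WRITE_MULTIPLIER = Fraction(5, 4)   # 1.25x
CACHE_READ_MULTIPLIER = Fraction(1, 10)   # 0.1x


def miss_to_hit_ratio(write=CACHE_WRITE_MULTIPLIER, read=CACHE_READ_MULTIPLIER):
    """Return how many times more a cache write costs than a cache read."""
    return write / read


ratio = miss_to_hit_ratio()
print(float(ratio))  # 12.5
```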

Prompt caching works by storing a snapshot of your context (your instructions, files, conversation history) so Claude doesn't have to re-read it from scratch on every message. Think of it like a warm browser cache: the first load is expensive, repeat visits are cheap. The catch is the TTL - time to live. Claude Code's cache expires after 5 minutes of inactivity.
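To see how the TTL plays out over a session, here is a simplified model, not a description of how Claude Code tracks its cache internally. It assumes the 5-minute window restarts on every message and that the first message always pays for a write. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: a 5-minute TTL that restarts each time a message is sent.
CACHE_TTL = timedelta(minutes=5)


def cold_messages(timestamps, ttl=CACHE_TTL):
    """Return the indices of messages that would find the cache cold."""
    # The first message of a session has nothing to read from cache.
    cold = [0] if timestamps else []
    for i in range(1, len(timestamps)):
        # A gap longer than the TTL means the cache has expired.
        if timestamps[i] - timestamps[i - 1] > ttl:
            cold.append(i)
    return cold


start = datetime(2026, 5, 24, 9, 0)
session = [
    start,
    start + timedelta(minutes=2),   # 2-minute gap: still warm
    start + timedelta(minutes=9),   # 7-minute gap: cache expired
    start + timedelta(minutes=10),  # 1-minute gap: warm again
]
print(cold_messages(session))  # [0, 2]
```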

What Quietly Kills Your Cache Mid-Session

Five behaviors are responsible for most unexpected cache misses:

Pausing too long between messages. Any gap over 5 minutes lets the cache expire. The next message has to rebuild it from scratch at full write price.
Editing your system prompt. Changing even one character in your instructions invalidates the entire cache downstream of that change.
Adding or removing files from context mid-session. Attaching a new file or detaching one shifts the context structure enough to trigger a full rewrite.
Running /clear or starting a new conversation thread. Obvious in retrospect, but easy to do reflexively when a session feels stuck.
Switching between Claude Code instances pointing at the same project. Each instance has its own cache state, so parallel sessions don't share cache reads.

The Cost Difference in Practice

On a typical Claude Code session with a large codebase loaded - say, 50,000 tokens of context - the difference is roughly this: across 20 back-and-forth exchanges, that context is processed 20 times, or 1,000,000 tokens in total. If every exchange hits the cache, those tokens bill at the 0.1x read rate, the equivalent of about 100,000 base-price input tokens. If every exchange misses and triggers a rewrite, they bill at the 1.25x write rate, the equivalent of 1,250,000 base-price input tokens. The dollar amounts depend on which Claude model you're using, but the ratio stays fixed at 12.5x regardless.
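The figures in this example can be reproduced with a short calculation. It is a rough sketch under simplifying assumptions: every exchange reprocesses the full 50,000-token context, and output tokens and newly added input are ignored. Costs are expressed as base-price input token equivalents rather than dollars, and the helper name is made up for illustration.

```python
from fractions import Fraction

# Multipliers relative to the base input price, as quoted in the article.
CACHE_WRITE_MULTIPLIER = Fraction(5, 4)   # 1.25x
CACHE_READ_MULTIPLIER = Fraction(1, 10)   # 0.1x


def base_token_equivalents(context_tokens, exchanges, multiplier):
    """Cost of processing the context on every exchange, in base-price tokens."""
    return int(context_tokens * exchanges * multiplier)


all_hits = base_token_equivalents(50_000, 20, CACHE_READ_MULTIPLIER)
all_misses = base_token_equivalents(50_000, 20, CACHE_WRITE_MULTIPLIER)

print(all_hits)               # 100000
print(all_misses)             # 1250000
print(all_misses / all_hits)  # 12.5
```

To get a dollar figure, multiply either result by the base input price per token of the model in use.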

The practical fix is mostly about pacing and awareness. Keep your sessions continuous rather than stopping and resuming. Finalize your system prompt before starting a long work session rather than tweaking it mid-way. If you're stepping away for more than a few minutes, know the cache will be cold when you return and plan accordingly.

None of this is hidden - Anthropic's documentation is transparent about the pricing. But most Claude Code users are focused on the work, not on monitoring their cache state. The 5-minute TTL is short enough that a normal distraction - checking Slack, grabbing coffee - is all it takes to reset to full price.
