
Claude's Prompt Cache TTL Silently Dropped from 1 Hour to 5 Minutes

Claude by Anthropic
Image: Anthropic

A cache regression in Claude's API has been quietly costing developers money. The prompt caching TTL (time-to-live: how long a cached copy of your system prompt or instructions stays available before the next request has to re-cache it and pay for those tokens again) appears to have silently dropped from 1 hour to just 5 minutes, with no changelog entry or announcement from Anthropic.

For context: prompt caching lets developers cache frequently used text, like a long system prompt or a reference document, so that Anthropic doesn't charge full input-token rates for it on every API call. At Anthropic's standard pricing, cache writes cost 25% more than normal input tokens, but cache reads cost 90% less. The math only works if the cache stays warm long enough to be read back: a prefix that has to be re-written on every call costs more than sending it with no caching at all. With a 5-minute TTL, any pause longer than five minutes between calls (one short distraction is enough) lets the cache expire, turning what should be a cost-saving feature into a near-constant re-write expense.
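To make the trade-off concrete, here is a small cost model using the multipliers quoted above (1.25x the base input price for a cache write, 0.1x for a cache read). The 20,000-token prompt and the $3-per-million-token base price are hypothetical round numbers, not a quote for any particular model, and the sketch ignores output tokens and any uncached part of the prompt.

```python
# Price multipliers relative to the base input-token rate, as quoted in the
# article: cache writes cost 25% more, cache reads cost 90% less.
WRITE_MULTIPLIER = 1.25
READ_MULTIPLIER = 0.10


def cached_prefix_cost(prefix_tokens: int, writes: int, reads: int, base_usd_per_mtok: float) -> float:
    """Cost in USD of a cached prefix that was written `writes` times and read `reads` times."""
    usd_per_token = base_usd_per_mtok / 1_000_000
    return prefix_tokens * usd_per_token * (writes * WRITE_MULTIPLIER + reads * READ_MULTIPLIER)


def uncached_prefix_cost(prefix_tokens: int, calls: int, base_usd_per_mtok: float) -> float:
    """Cost in USD of sending the same prefix `calls` times with no caching."""
    usd_per_token = base_usd_per_mtok / 1_000_000
    return prefix_tokens * usd_per_token * calls


# Hypothetical workload: a 20,000-token system prompt sent on 10 calls
# at an assumed base rate of $3 per million input tokens.
PREFIX, PRICE = 20_000, 3.0
print(f"warm cache (1 write, 9 reads):  ${cached_prefix_cost(PREFIX, 1, 9, PRICE):.3f}")  # $0.129
print(f"expired every call (10 writes): ${cached_prefix_cost(PREFIX, 10, 0, PRICE):.3f}")  # $0.750
print(f"no caching (10 calls):          ${uncached_prefix_cost(PREFIX, 10, PRICE):.3f}")  # $0.600
```

Under these assumptions a warm cache cuts the cost of the repeated prefix by roughly 80%, while a cache that expires before every call makes each request 25% more expensive than not caching at all. A single read per write is already enough to come out ahead: 1.25 + 0.1 = 1.35 times the base price, versus 2.0 for two uncached calls.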

Developers started noticing that their caching wasn't working as expected and traced the problem back to the TTL change. The original documented TTL was 1 hour, a reasonable window for most API workflows. Five minutes is barely enough for anything but fully automated pipelines that call the API every few minutes; interactive workloads with longer pauses between requests end up paying for a fresh cache write on almost every call.
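Why the TTL length matters so much can be sketched with a short simulation. It assumes, in line with Anthropic's prompt-caching documentation, that every use of the cache restarts the TTL clock, so an entry only expires after a full TTL passes with no requests. `simulate_cache` and `CacheTally` are illustrative names, not part of any SDK, and the eight-minute request interval is a hypothetical interactive session.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheTally:
    writes: int = 0  # requests that found no live cache entry and paid the write price
    reads: int = 0   # requests served from the cache at the read price


def simulate_cache(request_times_s: list[float], ttl_s: float) -> CacheTally:
    """Count cache writes and reads for requests that share one cached prefix.

    Assumption: every request (write or read) restarts the TTL clock, so the
    entry expires only after `ttl_s` seconds pass with no request at all.
    """
    tally = CacheTally()
    last_use: Optional[float] = None
    for t in sorted(request_times_s):
        if last_use is not None and t - last_use < ttl_s:
            tally.reads += 1
        else:
            tally.writes += 1
        last_use = t
    return tally


# Hypothetical interactive session: one request every 8 minutes for an hour.
session = [float(t) for t in range(0, 3600, 8 * 60)]
print(simulate_cache(session, ttl_s=60 * 60))  # CacheTally(writes=1, reads=7)
print(simulate_cache(session, ttl_s=5 * 60))   # CacheTally(writes=8, reads=0)
```

With a 1-hour TTL the session pays for one write and seven cheap reads; with a 5-minute TTL every request is a write, because each eight-minute gap outlasts the cache.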

As of publication, Anthropic has not officially confirmed or explained the change. It's unclear whether this is a bug, a silent architecture decision, or a server-side incident still under investigation.

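If you're building applications with Claude and relying on prompt caching to keep costs manageable, your bills may have quietly increased. Check your cache hit rate: the usage object in each Messages API response reports cache_creation_input_tokens (tokens written to the cache on that call) and cache_read_input_tokens (tokens served from it). A sharp rise in writes relative to reads, with no change to your prompts or request timing, is consistent with this regression. Developers using Claude Code in long sessions that depend on cached context will feel this too, especially when they pause for more than a few minutes between turns.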
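One minimal way to track this over time is to aggregate the two cache counters across logged responses. The sketch below assumes you have saved each response's `usage` object as a plain dict, for example in your own request logs; `cache_hit_rate` is an illustrative helper and the sample values are made up to contrast a healthy pattern with a regressed one. Newer API responses may also include a nested `cache_creation` object that splits writes by TTL (`ephemeral_5m_input_tokens` and `ephemeral_1h_input_tokens`); if your responses carry it, it shows directly which TTL was applied, but check the current API reference before relying on it.

```python
def cache_hit_rate(usages: list[dict]) -> float:
    """Fraction of cache-eligible prompt tokens that were served from the cache.

    Each dict is assumed to mirror a Messages API `usage` object. Only the two
    cache counters are read; missing or null fields count as zero.
    """
    read = sum(u.get("cache_read_input_tokens") or 0 for u in usages)
    written = sum(u.get("cache_creation_input_tokens") or 0 for u in usages)
    total = read + written
    return read / total if total else 0.0


# Hypothetical logs: ten calls reusing a 20,000-token cached prefix.
healthy = [{"cache_creation_input_tokens": 20_000, "cache_read_input_tokens": 0}] + [
    {"cache_creation_input_tokens": 0, "cache_read_input_tokens": 20_000}
] * 9
regressed = [{"cache_creation_input_tokens": 20_000, "cache_read_input_tokens": 0}] * 10

print(f"healthy:   {cache_hit_rate(healthy):.0%}")    # healthy:   90%
print(f"regressed: {cache_hit_rate(regressed):.0%}")  # regressed: 0%
```

Computing the rate over token counts rather than over calls keeps large prompts from being outweighed by small ones.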