Inside Claude Code's Context Compression: What Survives and What Gets Lost

22,000 tokens in, 1,400 tokens out. That's the compression ratio, roughly 16 to 1, that a reverse-engineering effort found inside Claude Code's auto-compaction system, and it explains a lot about why the AI coding assistant sometimes "forgets" things mid-session.

By extracting embedded JavaScript from Claude Code's compiled binary, a researcher uncovered the exact prompts, thresholds, and logic that control how conversations get compressed when you hit the context limit. The findings are practical knowledge for anyone who spends hours in Claude Code sessions.

Compression Fires at 83% Capacity

Claude Code's context window holds roughly 200,000 tokens (think of a token as about three-quarters of a word). The auto-compaction trigger fires at about 83% capacity, or around 167,000 tokens. The system reserves 20,000 tokens for model output and keeps a 13,000-token buffer. Together those set aside 33,000 tokens of headroom, which leaves roughly 167,000 tokens of conversation before compression kicks in.
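The arithmetic behind that threshold is easy to check. The sketch below uses only the figures quoted above; the constant and function names are illustrative, not identifiers from Claude Code itself.

```python
# Figures quoted in the article; names are illustrative only.
CONTEXT_WINDOW = 200_000   # approximate context window, in tokens
OUTPUT_RESERVE = 20_000    # tokens reserved for model output
SAFETY_BUFFER = 13_000     # additional buffer


def compaction_threshold(window=CONTEXT_WINDOW,
                         output_reserve=OUTPUT_RESERVE,
                         buffer=SAFETY_BUFFER):
    """Tokens of conversation that fit before auto-compaction fires."""
    return window - output_reserve - buffer


threshold = compaction_threshold()
fraction = threshold / CONTEXT_WINDOW

print(threshold)          # 167000
print(f"{fraction:.1%}")  # 83.5%
```

The "about 83%" figure is simply 167,000 divided by 200,000.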

You can adjust this threshold with the CLAUDE_AUTOCOMPACT_PCT_OVERRIDE environment variable (set it to a value from 0 to 100), or disable auto-compaction entirely with DISABLE_AUTO_COMPACT=true. Disabling it means you'll hit the hard context limit and lose everything instead of getting a compressed summary, so that's mostly useful for short sessions.
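To make the interaction between the two variables concrete, here is a rough model of how they might combine. This is a sketch under assumptions, not Claude Code's actual logic: it assumes the override is read as a percentage of the full window, that invalid or out-of-range values fall back to the default, and that only the literal value "true" disables compaction. The function name is hypothetical.

```python
DEFAULT_PCT = 83.5  # derived from the 167,000 / 200,000 figures above


def effective_threshold(env, window=200_000):
    """Return the assumed compaction trigger in tokens, or None if disabled.

    `env` is any mapping of environment variables, e.g. os.environ.
    The parsing rules here are assumptions made for illustration.
    """
    if env.get("DISABLE_AUTO_COMPACT", "").lower() == "true":
        return None  # no auto-compaction: the hard context limit applies

    pct = DEFAULT_PCT
    raw = env.get("CLAUDE_AUTOCOMPACT_PCT_OVERRIDE")
    if raw is not None:
        try:
            value = float(raw)
        except ValueError:
            value = None
        if value is not None and 0 <= value <= 100:
            pct = value

    return int(window * pct / 100)


print(effective_threshold({}))                                         # 167000
print(effective_threshold({"CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "70"}))  # 140000
print(effective_threshold({"DISABLE_AUTO_COMPACT": "true"}))           # None
```

Under these assumptions, lowering the override makes compaction fire earlier and more often; it does not give you more room.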

The 9-Section Summary

When compression triggers, Claude Code makes a separate API call that squeezes the entire conversation into a structured summary with nine sections:

What happened (sections 1-6):

  1. Your stated intent
  2. Technologies involved
  3. Files touched, with specific line numbers
  4. Errors encountered
  5. Solutions attempted
  6. All your verbatim messages (every direct instruction you gave)

What's next (sections 7-9):

  7. Pending work inferred from unresolved issues
  8. Current project state
  9. The immediate next action
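One way to picture the summary is as a record with nine fields. The sketch below is only a mental model: the class and field names are invented for illustration, and the real summary is free-form text produced by the model, not a typed object.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CompactionSummary:
    """Hypothetical shape of the nine-section summary described above."""

    # What happened (sections 1-6)
    stated_intent: str = ""
    technologies: list[str] = field(default_factory=list)
    files_touched: list[str] = field(default_factory=list)  # with line numbers
    errors: list[str] = field(default_factory=list)
    solutions_attempted: list[str] = field(default_factory=list)
    user_messages: list[str] = field(default_factory=list)  # verbatim

    # What's next (sections 7-9)
    pending_work: list[str] = field(default_factory=list)
    project_state: str = ""
    next_action: str = ""


summary = CompactionSummary(
    stated_intent="Refactor the auth module to use JWT",
    user_messages=["Always use TypeScript"],
    next_action="Update the login handler",
)

print(len(fields(CompactionSummary)))  # 9
```

Anything that does not fit one of these slots, such as the reasoning behind a decision, has no obvious place to land.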

The compression call itself runs with reduced capabilities: no extended thinking, no tool access, no image processing, and output capped at 20,000 tokens instead of the usual 64,000. After compaction, the system re-attaches up to five recently accessed files (capped at 5,000 tokens each, 50,000 total), plus background task statuses and any active plan mode state.
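The re-attachment limits can be sketched as a simple budget. The function below is illustrative only: it assumes files are considered most recent first and that an oversized file is truncated to the per-file cap. The source gives the caps but not whether large files are truncated or skipped.

```python
MAX_FILES = 5        # up to five recently accessed files
PER_FILE_CAP = 5_000  # tokens per file
TOTAL_CAP = 50_000    # total re-attachment budget quoted in the article


def plan_reattachment(recent_files):
    """Pick files to re-attach after compaction.

    recent_files: list of (path, token_count), most recent first.
    Returns a list of (path, tokens_attached).
    """
    plan = []
    remaining = TOTAL_CAP
    for path, tokens in recent_files[:MAX_FILES]:
        allowed = min(tokens, PER_FILE_CAP, remaining)
        if allowed <= 0:
            break
        plan.append((path, allowed))
        remaining -= allowed
    return plan


files = [
    ("auth.py", 1_200),
    ("routes.py", 9_000),  # larger than the per-file cap
    ("db.py", 300),
    ("config.py", 100),
    ("utils.py", 100),
    ("old_notes.md", 100),  # sixth file: not re-attached
]

print(plan_reattachment(files))
```

Here `routes.py` is cut to 5,000 tokens and `old_notes.md` is dropped. Note that five files at 5,000 tokens each total only 25,000, so in this sketch the 50,000 cap never binds through files alone; it is kept to mirror the quoted figures.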

What Survives, What Gets Fuzzy, What Disappears

This is the part that matters for daily use.

Survives well: File paths with line numbers, function signatures, error messages, explicit user preferences, and architectural decisions. The system prompt specifically instructs the compressor to "include specific code snippets, file paths with line numbers, exact function signatures." Concrete details get priority.

Gets fuzzy: Reasoning chains, rejected alternatives, and the why behind decisions. If Claude Code spent ten messages debugging an approach before settling on a solution, the compressed version keeps the solution but loses the debugging logic. Multi-step reasoning context degrades significantly.

Disappears entirely: Casual asides, tangential discussions, speculative "what if" explorations, and file contents that haven't been accessed recently. If you explored an idea and then moved on, that exploration is gone after compaction.

Working With the Compression, Not Against It

A few practical takeaways from these findings:

  • State your goal explicitly and early. Section 1 captures your primary intent, so a clear "I want to refactor the auth module to use JWT" survives better than gradually revealing your plan across multiple messages.
  • Say preferences out loud. "Always use TypeScript" persists through compression. Showing a preference through examples without stating it directly probably won't.
  • Use /compact focus on [topic] to steer what the summary emphasizes when you manually trigger compression. This lets you protect context about a specific area you're still working on.
  • Don't rely on compression to preserve your reasoning. If you worked through a complex debugging session and reached an important conclusion, restate the conclusion explicitly. The journey won't survive, but a clear summary of the destination will.

One more thing worth knowing: Claude Code's system prompt alone costs about 16,000 tokens and gets sent with every single API call. That's roughly 8% of your context window consumed before you type a single character. Long sessions are working with less space than you might assume.
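Putting the article's figures together gives a sense of how much room a session really has. This is a back-of-the-envelope sketch; it assumes the system prompt counts toward the same 167,000-token trigger, which the source does not state explicitly.

```python
CONTEXT_WINDOW = 200_000
SYSTEM_PROMPT = 16_000          # sent with every API call
COMPACTION_THRESHOLD = 167_000  # about 83% of the window

overhead = SYSTEM_PROMPT / CONTEXT_WINDOW
room_before_compaction = COMPACTION_THRESHOLD - SYSTEM_PROMPT

print(f"{overhead:.0%}")       # 8%
print(room_before_compaction)  # 151000
```

Under that assumption, a fresh session has about 151,000 tokens for conversation before the first compaction, not 200,000.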