Tamp Proxy Compresses LLM Context by 50%, No Code Changes Required

March 25, 2026 2 min read

Coding agents burn through tokens fast. Every file read, directory listing, and error log gets sent to the LLM as context, and most of that text is whitespace, line numbers, and formatting that the model doesn't need. Tamp is a new open-source proxy that strips it out before it hits the API.

The tool, created by developer Stas Kulesh and released under the MIT license, runs as a local HTTP proxy at localhost:7778. Point your coding agent at the proxy instead of the API endpoint directly, and Tamp intercepts every request, compresses the tool output it carries, then forwards the request to the upstream API. Nothing in your agent's code or configuration changes beyond the endpoint URL.
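
To make the "only the endpoint changes" point concrete, here is a minimal Python sketch of the rewrite. The helper name via_proxy is hypothetical and not part of Tamp; in practice you would change the agent's base URL setting rather than call a function. It assumes the proxy address given above and that the request path is passed through unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the proxy listens on the address quoted in the article.
PROXY_BASE = "http://localhost:7778"


def via_proxy(upstream_url: str, proxy_base: str = PROXY_BASE) -> str:
    """Keep the path and query of the upstream URL, but address the local proxy."""
    upstream = urlsplit(upstream_url)
    proxy = urlsplit(proxy_base)
    return urlunsplit((proxy.scheme, proxy.netloc, upstream.path, upstream.query, ""))


if __name__ == "__main__":
    # The host changes; the API path stays the same.
    print(via_proxy("https://api.anthropic.com/v1/messages"))
    # -> http://localhost:7778/v1/messages
```

Everything after the host stays as it was, which is why the agent itself needs no other change.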

It works with Claude Code, Aider, Cursor, Cline, Windsurf, and several others, supporting Anthropic, OpenAI, and Google Gemini API formats. The proxy auto-detects which format to use from the request path.
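
Path-based detection could look roughly like the sketch below. The article does not describe Tamp's actual routing rules, so the function name and the matching rules are assumptions; they rely only on the public request paths each provider uses (Anthropic's /v1/messages, OpenAI's /v1/chat/completions, and Gemini's model paths ending in :generateContent).

```python
def detect_format(path: str) -> str:
    """Guess the provider format from a request path (illustrative rules only)."""
    if path.startswith("/v1/messages"):
        return "anthropic"
    if path.startswith("/v1/chat/completions"):
        return "openai"
    if ":generateContent" in path or ":streamGenerateContent" in path:
        return "gemini"
    # Anything unrecognized would presumably be forwarded untouched.
    return "unknown"


if __name__ == "__main__":
    for p in (
        "/v1/messages",
        "/v1/chat/completions",
        "/v1beta/models/gemini-pro:generateContent",
        "/healthz",
    ):
        print(p, "->", detect_format(p))
```

Detecting the format matters because each provider nests tool output in a different place in the request body, so a proxy has to know which shape it is looking at before it can compress anything.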

Five Compression Stages

Tamp runs a pipeline of five lossless compression stages, all enabled by default:

Minify strips JSON whitespace (about 22% reduction)
TOON uses columnar encoding for repetitive arrays like file listings (49% on directory output)
Strip-lines removes line-number prefixes from tool output
Whitespace normalizes blank lines and trailing spaces
LLMLingua applies neural compression via a sidecar model (40% on source code)
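
The four non-neural stages are simple enough to sketch in a few lines of Python. This is an illustration of the ideas, not Tamp's code: the function names, the line-number prefix pattern, and the blank-line rule are all assumptions, and to_columnar is a simplified layout in the spirit of TOON that skips the quoting and escaping a full encoder would need. LLMLingua is left out because it requires a model.

```python
import json
import re

# Hypothetical prefix shape: optional indentation, digits, then a tab, "|" or "→".
LINE_NUMBER_PREFIX = re.compile(r"^[ \t]*\d+[\t|→]", re.MULTILINE)


def minify_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def to_columnar(name: str, rows: list) -> str:
    """State the keys once, then emit one comma-separated row per item.

    Assumes a non-empty list of dicts that all share the same keys.
    """
    keys = list(rows[0])
    header = name + "[" + str(len(rows)) + "]{" + ",".join(keys) + "}:"
    lines = ["  " + ",".join(str(row[k]) for k in keys) for row in rows]
    return "\n".join([header] + lines)


def strip_line_numbers(text: str) -> str:
    """Remove a leading line-number prefix from every line that has one."""
    return LINE_NUMBER_PREFIX.sub("", text)


def normalize_whitespace(text: str) -> str:
    """Drop trailing spaces and tabs, and collapse runs of blank lines to one."""
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    return re.sub(r"\n{3,}", "\n\n", text)


if __name__ == "__main__":
    print(minify_json('{\n  "a": 1,\n  "b": [1, 2]\n}'))
    print(to_columnar("files", [{"name": "a.py", "size": 120},
                                {"name": "b.py", "size": 98}]))
    print(strip_line_numbers("   1\tdef f():\n   2\t    return 1\n"), end="")
    print(repr(normalize_whitespace("a  \n\n\n\nb\t\n")))
```

The columnar step is where repetitive arrays shrink the most: a directory listing with hundreds of entries repeats the same keys on every item, and stating them once in a header removes that repetition.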

Across 120 API calls in the project's benchmark, input tokens fell by 52.6% overall, with per-scenario compression ranging from 21.8% to 81.4%. Quality verification showed identical responses across all test scenarios.

The practical savings depend on your usage. At Anthropic's $3 per million input tokens, the project estimates about $48 per developer per month. For Claude Max subscribers on fixed-price plans, the value proposition is different: compressing tokens before they count against your quota effectively doubles your context budget.
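
The figures above can be sanity-checked with a little arithmetic. The sketch below works backward from the $48 estimate to the token volume it implies, then computes the quota multiplier for a fixed-price plan. It assumes the estimate uses the benchmark-wide 52.6% reduction, which the article does not state; the variable names are illustrative.

```python
PRICE_PER_MILLION_USD = 3.00   # Anthropic input price quoted in the article
MONTHLY_SAVINGS_USD = 48.00    # the project's per-developer estimate
REDUCTION = 0.526              # benchmark-wide reduction (assumed to apply here)

# Tokens that must be saved each month to be worth $48 at $3 per million.
tokens_saved_millions = MONTHLY_SAVINGS_USD / PRICE_PER_MILLION_USD

# Uncompressed monthly input volume implied by that saving.
baseline_millions = tokens_saved_millions / REDUCTION

# On a fixed quota, each token of budget now covers this much original text.
budget_multiplier = 1 / (1 - REDUCTION)

if __name__ == "__main__":
    print(f"tokens saved per month: {tokens_saved_millions:.1f}M")
    print(f"implied baseline usage: {baseline_millions:.1f}M")
    print(f"quota multiplier: {budget_multiplier:.2f}x")
```

Under these assumptions the estimate corresponds to a developer sending roughly 30 million input tokens a month, and the multiplier of about 2.1 is where the "doubles your context budget" framing comes from.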

Installation is a single command: npx @sliday/tamp. The project is early (11 GitHub stars at time of writing, single contributor), but the approach is sound. If you're running coding agents against large codebases and watching your token bills climb, it's worth a test run. The MIT license means there's no vendor lock-in if the project stalls.
