Related ToolsClaude CodeClaude

Claude Code Pro Limits: One Developer's $0.02-Per-Call Workaround

Claude by Anthropic
Image: Anthropic

Hitting the Claude Code Pro usage limit by Wednesday every week gets old fast. One developer tried the standard fixes - switching to Sonnet for simple tasks, tighter prompts, the /compact command to shrink the active context - and still ran out mid-week.

The solution: a two-model setup where a cheap model handles the grunt work and Claude handles the actual reasoning.

The implementation uses CLI scripts that delegate bulk file reading and boilerplate code generation to Kimi K2.5, currently one of the cheapest capable models at around $0.02 per API call. Claude Code calls these scripts via its built-in Bash tool, so the handoff happens inside the same workflow. A CLAUDE.md file - the config file that governs how Claude behaves in a project - defines routing rules: send large file reads, directory scans, and repetitive code generation to the cheap model; save Claude's processing budget for architecture decisions, debugging, and anything that needs real judgment.

After three weeks, the developer reports staying comfortably under the Pro limit even on heavy coding days.

This pattern - sometimes called "model routing" - isn't a new concept, but this is a low-overhead implementation that doesn't require building a full orchestration system. Any cheap model with a large context window (the amount of text a model can read at once) fits the same role: Gemini Flash, GPT-4o mini, and locally-run open-source models all work as the delegation target.

The main cost is discipline: you have to be honest about which tasks actually need Claude's reasoning versus which ones just need a model that can read a lot of text quickly. For most codebases, file scanning and template generation fall clearly into the second category.