Sending 80,000 tokens of source code with every query produced one consistent result in a developer's codebase: Claude confidently suggested functions that didn't exist anywhere in the project.
A token is roughly three-quarters of a word, so 80,000 tokens is about 60,000 words, close to a 200-page technical document. Modern AI models can technically accept that much text in their context window (the amount of text a model can process at once), but accepting a large input and producing reliable output from it are different things.
The fix was counterintuitive: instead of sending full source code, strip it down to function signatures and type definitions only. The skeleton of the code, not the body. That change alone cut the context from 80,000 tokens to 2,000, a 97.5% reduction.
The Results Across 18 Codebases
Testing across 18 real production repositories showed:
- Hallucinations (Claude suggesting functions that don't exist): 92% down to 0%
- Tokens per query: 80,000 down to 2,000
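The source doesn't say how a suggestion was judged to be a hallucination. One plausible check, sketched here for Python code under that assumption, is to collect every function and class the project defines and flag any call in the model's suggestion that matches none of them (builtins excepted). The names `defined_names`, `called_names`, and `unknown_calls` are hypothetical helpers, not part of any tool mentioned in this piece; Python 3.9+ is assumed.

```python
import ast
import builtins
from collections.abc import Iterable


def defined_names(source: str) -> set[str]:
    """Every function and class name defined anywhere in `source`."""
    return {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def called_names(snippet: str) -> set[str]:
    """Names called in `snippet`: `foo()` gives 'foo', `obj.foo()` also gives 'foo'."""
    names = set()
    for node in ast.walk(ast.parse(snippet)):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                names.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
    return names


def unknown_calls(project_source: str, suggestion: str, allowed: Iterable[str] = ()) -> set[str]:
    """Calls in `suggestion` defined by neither the project, builtins, nor `allowed`."""
    known = defined_names(project_source) | set(dir(builtins)) | set(allowed)
    return called_names(suggestion) - known


# Hypothetical project and model suggestion for illustration.
project = '''
class Receipt:
    pass

def process_payment(amount: float, user_id: str) -> Receipt:
    return Receipt()
'''
suggestion = "receipt = process_payment(10.0, uid)\nsend_receipt_email(receipt)"
print(unknown_calls(project, suggestion))  # {'send_receipt_email'}
```

Method calls on library objects (`items.append(x)`) and imported functions would be flagged too, which is what the `allowed` parameter is for; a fuller check would also resolve imports.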
The underlying problem has a name in the research literature: "lost in the middle." Studies on large language models have shown that models pay disproportionate attention to content near the start and end of a long input, while details buried in the middle get less weight. In an 80,000-token dump of a codebase, most of the project's definitions inevitably sit somewhere in that middle. The model fills the gaps from its training data, generating function names that follow the project's naming conventions and sound plausible but don't actually exist. That is what makes these hallucinations dangerous: they're wrong in ways that look right.
The Skeleton Approach in Practice
Sending only function signatures means Claude sees something like processPayment(amount: number, userId: string): Promise<Receipt> but not the 60-line implementation behind it. That's enough to understand the API surface - what's available, how to call it, what it returns - without the surrounding noise.
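The source doesn't show how its skeletons were generated, and the example signature above is TypeScript. As a minimal sketch of the same idea for Python source, assuming Python 3.9+ (for `ast.unparse`), the standard-library `ast` module can keep only imports, definitions, and assignments at the top level, then replace every function body with `...`. The function `skeleton` is a hypothetical name; TypeScript code would need a TypeScript-aware parser instead.

```python
import ast

# Top-level statements worth keeping in a skeleton: imports, definitions,
# and assignments (constants, type aliases). Everything else is dropped.
_KEEP = (
    ast.Import, ast.ImportFrom,
    ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef,
    ast.Assign, ast.AnnAssign,
)


class _StripBodies(ast.NodeTransformer):
    """Replace every function body with `...`, keeping decorators and signatures."""

    def _strip(self, node):
        node.body = [ast.Expr(value=ast.Constant(value=...))]
        return node  # no generic_visit: nested defs vanish with the body

    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def skeleton(source: str) -> str:
    """Return the signature-only skeleton of a Python module (hypothetical helper)."""
    tree = ast.parse(source)
    tree.body = [node for node in tree.body if isinstance(node, _KEEP)]
    tree = _StripBodies().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


source = '''
import json

class Receipt:
    def __init__(self, amount: float, user_id: str) -> None:
        self.amount = amount
        self.user_id = user_id

async def process_payment(amount: float, user_id: str) -> Receipt:
    if amount <= 0:
        raise ValueError("amount must be positive")
    print(json.dumps({"amount": amount, "user": user_id}))
    return Receipt(amount, user_id)

if __name__ == "__main__":
    print("demo")
'''

print(skeleton(source))
```

Class bodies are visited rather than stripped, so methods keep their signatures and class-level attributes survive, while anything nested inside a function body disappears with it.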
If you use Claude Code or another Claude-based coding assistant and keep getting references to methods you don't recognize, the diagnostic is simple: check how many tokens you're sending per request. Many AI coding tools surface this number somewhere in their interface.
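For a quick estimate outside any tool, the three-quarters-of-a-word rule from the start of this piece becomes simple arithmetic: tokens ≈ words ÷ 0.75. The sketch below assumes that rule; real tokenizers differ, and source code, with dense punctuation and few spaces, is likely to produce more tokens than a word count suggests. `estimate_tokens` and `reduction` are hypothetical helper names.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb: one token is roughly three-quarters of a word


def estimate_tokens(text: str) -> int:
    """Rough token count from whitespace-separated words (hypothetical helper)."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def reduction(before: int, after: int) -> float:
    """Fraction of tokens saved when context shrinks from `before` to `after`."""
    return (before - after) / before


signature = "async def process_payment(amount: float, user_id: str) -> Receipt: ..."
print(estimate_tokens(signature))  # 9 words / 0.75 -> 12
print(f"{reduction(80_000, 2_000):.1%}")
```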
The practical approach: build a "skeleton layer" of your codebase - exported function signatures, type definitions, interface declarations - and use that as your default context for architecture questions, navigation, and new feature work. Load full implementations only when you need the model to reason about a specific piece of logic directly.
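The default-skeleton, full-source-on-demand policy can be sketched as a small context builder. This assumes skeletons and full sources have already been collected into dictionaries keyed by file path (Python 3.9+); `build_context` and its `focus` parameter are hypothetical names, not part of Claude Code or any other tool.

```python
from collections.abc import Iterable


def build_context(
    skeletons: dict[str, str],
    sources: dict[str, str],
    focus: Iterable[str] = (),
) -> str:
    """Skeleton for every file by default; full source only for files in `focus`.

    Only paths present in `skeletons` are included. Raises KeyError if a
    focus file has no entry in `sources`.
    """
    focus_set = set(focus)
    parts = []
    for path in sorted(skeletons):  # fixed order keeps the context reproducible
        if path in focus_set:
            parts.append(f"# {path} (full source)\n{sources[path]}")
        else:
            parts.append(f"# {path} (skeleton)\n{skeletons[path]}")
    return "\n\n".join(parts)


# Hypothetical two-file project.
skeletons = {
    "payments.py": "async def process_payment(amount: float, user_id: str) -> Receipt: ...",
    "receipts.py": "class Receipt:\n    def total(self) -> float: ...",
}
sources = {
    "payments.py": "async def process_payment(amount: float, user_id: str) -> Receipt:\n"
                   "    return await charge(amount, user_id)",
    "receipts.py": "class Receipt:\n    def total(self) -> float:\n"
                   "        return self.amount + self.tax",
}

# Architecture question: skeletons only.
print(build_context(skeletons, sources))
# Debugging Receipt.total: load that one file in full.
print(build_context(skeletons, sources, focus={"receipts.py"}))
```

Sorting the paths means the same question always produces the same context, which also keeps token counts comparable from one request to the next.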
The tradeoff is real. A stripped context means Claude can't see implementation details that matter for tasks like debugging a specific function or understanding a complex algorithm. But for the majority of day-to-day coding interactions - figuring out what to call, where to put something, how to structure a new feature - the skeleton approach consistently outperforms the full codebase dump. Eighteen repositories and a 92-point drop in hallucinations make a compelling case.