"The model is the easy part. The plumbing is the product." That line, from a recent technical teardown of Claude Code's architecture, captures something most AI tool users never think about: the engineering that sits between you typing a command and getting useful code back.
A detailed analysis of Claude Code's codebase reveals a TypeScript application running on Bun (a fast JavaScript runtime) with React Ink for the terminal interface, over 40 tool implementations, and several clever systems that keep costs down and security tight.
Prompt Caching Cuts Costs by 90%
Every time Claude Code sends a request to Anthropic's API, it splits the prompt into two parts: static content (identity rules, tool descriptions) that rarely changes, and dynamic content (your project's memory, MCP instructions) that changes per session. The static portion gets cached, and on cache hits, Anthropic charges roughly 90% less for those cached tokens.
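To make the split concrete, here is a minimal sketch of how a prompt could be assembled so the static part forms a stable, cacheable prefix. The function and type names (`buildSystemBlocks`, `ToolSpec`, `SystemBlock`) are hypothetical and not taken from Claude Code. The `cache_control` marker follows the shape Anthropic's Messages API accepts on content blocks, but treat the overall layout as an assumption rather than the actual implementation.

```typescript
// Hypothetical names throughout; a sketch of the static/dynamic split only.
interface ToolSpec {
  name: string;
  description: string;
}

interface SystemBlock {
  type: "text";
  text: string;
  // Marks the end of the prefix that should be cached.
  cache_control?: { type: "ephemeral" };
}

function buildSystemBlocks(
  identityRules: string,
  tools: ToolSpec[],
  sessionContext: string,
): SystemBlock[] {
  // Sort by name so tool ordering alone can never change the cached prefix.
  const toolText = [...tools]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((t) => `${t.name}: ${t.description}`)
    .join("\n");

  return [
    // Static block: identical across requests, so it can be served from cache.
    {
      type: "text",
      text: `${identityRules}\n\n${toolText}`,
      cache_control: { type: "ephemeral" },
    },
    // Dynamic block: per-session content goes after the cached prefix.
    { type: "text", text: sessionContext },
  ];
}
```

The design point is ordering: anything that changes per session sits after the cached prefix, so it cannot invalidate it.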
The engineering team discovered that 77% of cache misses came from tool descriptions changing between requests, not from tools being added or removed. Small wording differences in how sub-agent tools described themselves were enough to bust the cache. Fixing that one pattern made a measurable cost difference.
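One way to spot this kind of drift is to compare the tool list between two consecutive requests and separate "reworded" tools from tools that were added or removed. The sketch below is an assumption about how such a check might look; `diffToolSpecs` and its types are hypothetical, and the teardown does not describe the team's actual tooling.

```typescript
// Hypothetical helper: classifies changes between two tool lists.
interface ToolSpec {
  name: string;
  description: string;
}

interface ToolDrift {
  added: string[];
  removed: string[];
  reworded: string[]; // same tool name, different description text
}

function diffToolSpecs(previous: ToolSpec[], current: ToolSpec[]): ToolDrift {
  const before = new Map<string, string>();
  previous.forEach((t) => before.set(t.name, t.description));
  const after = new Map<string, string>();
  current.forEach((t) => after.set(t.name, t.description));

  const drift: ToolDrift = { added: [], removed: [], reworded: [] };

  after.forEach((description, name) => {
    if (!before.has(name)) {
      drift.added.push(name);
    } else if (before.get(name) !== description) {
      // Even a one-word change alters the prompt prefix and busts the cache.
      drift.reworded.push(name);
    }
  });

  before.forEach((_description, name) => {
    if (!after.has(name)) drift.removed.push(name);
  });

  return drift;
}
```

Logging the `reworded` list on each cache miss would show whether wording changes, rather than genuine tool changes, are behind the misses.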
Five Layers of Security for Shell Commands
Claude Code can run shell commands on your machine, which makes its security architecture critical. The system uses five validation layers packed into a 2,592-line security file:
- Pattern matching blocks 13 categories of dangerous shell syntax
- Shell-specific blocking prevents 20+ dangerous built-in commands, including obscure ones like zmodload (which can enable invisible file operations) and ztcp (which could exfiltrate data over TCP)
- AST parsing using Tree-sitter analyzes the actual syntax tree when regex-based checks are ambiguous
- OS-level sandboxing restricts filesystem and network access at runtime
- Enterprise permission rules let administrators lock tools to read-only or restrict which directories can be modified
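The layered idea can be sketched as a chain of checks where the first layer to object stops the command. Everything here is illustrative: the layer names, the example patterns, and the list of mutating commands are assumptions, and only `zmodload` and `ztcp` come from the teardown. The Tree-sitter and OS sandbox layers are left out because they cannot be shown faithfully in a few lines.

```typescript
// Hypothetical layer names and rules, for illustration only.
interface Layer {
  name: string;
  // Returns a reason string to deny, or null to pass the command along.
  check: (command: string) => string | null;
}

type Verdict =
  | { allowed: true }
  | { allowed: false; layer: string; reason: string };

// Layer 1: cheap pattern matching on the raw text (example pattern only).
const syntaxLayer: Layer = {
  name: "syntax",
  check: (command) =>
    /\$\(|`/.test(command) ? "command substitution" : null,
};

// Layer 2: shell built-ins named in the article. Deliberately conservative:
// it also rejects harmless mentions such as `echo ztcp`.
const BLOCKED_BUILTINS = ["zmodload", "ztcp"];
const builtinLayer: Layer = {
  name: "builtins",
  check: (command) => {
    const words = command.split(/[\s;|&]+/).filter((w) => w.length > 0);
    const hit = words.find((w) => BLOCKED_BUILTINS.includes(w));
    return hit ? `blocked built-in: ${hit}` : null;
  },
};

// Layer 3: an administrator-style read-only rule (assumed command list).
function readOnlyLayer(readOnly: boolean): Layer {
  const mutating = ["rm", "mv", "cp", "mkdir", "touch"];
  return {
    name: "permissions",
    check: (command) => {
      const first = command.trim().split(/\s+/)[0];
      return readOnly && mutating.includes(first)
        ? `read-only mode forbids ${first}`
        : null;
    },
  };
}

function validate(command: string, layers: Layer[]): Verdict {
  for (const layer of layers) {
    const reason = layer.check(command);
    if (reason !== null) {
      return { allowed: false, layer: layer.name, reason };
    }
  }
  return { allowed: true };
}

const layers: Layer[] = [syntaxLayer, builtinLayer, readOnlyLayer(true)];
```

Calling `validate("zmodload zsh/net/tcp", layers)` is rejected by the built-ins layer, while `validate("ls -la", layers)` passes all three.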
Context Management and Parallelization
When conversations get long, Claude Code compresses earlier messages using a rigid 9-section template that preserves user messages verbatim rather than paraphrasing them. A circuit breaker stops compression attempts after three consecutive failures. Before that safeguard existed, data showed 1,279 sessions with 50+ consecutive compression failures, wasting an estimated 250,000 API calls daily.
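The circuit breaker pattern itself is small. A minimal sketch, assuming a synchronous compression function and a hypothetical `CompactionBreaker` class (the real code is likely asynchronous and more involved), might look like this:

```typescript
// Hypothetical class; shows the "stop after N consecutive failures" rule.
class CompactionBreaker {
  private consecutiveFailures = 0;
  private readonly maxFailures: number;

  constructor(maxFailures = 3) {
    this.maxFailures = maxFailures;
  }

  // True once the failure limit is reached; no further attempts are made.
  get isOpen(): boolean {
    return this.consecutiveFailures >= this.maxFailures;
  }

  // Runs the compression step unless the breaker is open.
  // Returns undefined when the attempt is skipped or fails.
  run<T>(compact: () => T): T | undefined {
    if (this.isOpen) return undefined;
    try {
      const result = compact();
      this.consecutiveFailures = 0; // a success resets the streak
      return result;
    } catch {
      this.consecutiveFailures += 1;
      return undefined;
    }
  }
}
```

With the default limit of three, a session whose compression always fails costs three attempts instead of fifty or more, which is the waste the article describes.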
For parallel tasks, Claude Code forks sub-agents that inherit the parent's exact tool pool and model. Using the same model is deliberate: it means the forked agents share the same prompt cache, so parallel work does not multiply costs the way you might expect.
Large tool outputs (over 50,000 characters) get written to disk instead of truncated. The model receives a file path and can read the full output later. It is a simple choice, but it means you never lose information to arbitrary character limits.
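A sketch of that spill-to-disk choice, assuming Node-compatible `fs`, `os`, and `path` modules (which Bun also provides). The function name `packageToolOutput`, the result shape, and the file naming are hypothetical; only the 50,000-character threshold comes from the teardown.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const MAX_INLINE_CHARS = 50_000; // threshold mentioned in the article

type ToolResult =
  | { kind: "inline"; text: string }
  | { kind: "file"; path: string; chars: number };

let spillCounter = 0; // keeps file names unique within one process

function packageToolOutput(output: string, spillDir: string): ToolResult {
  if (output.length <= MAX_INLINE_CHARS) {
    return { kind: "inline", text: output };
  }
  // Too large to inline: write the full text and hand back a path instead.
  spillCounter += 1;
  const path = join(spillDir, `tool-output-${spillCounter}.txt`);
  writeFileSync(path, output, "utf8");
  return { kind: "file", path, chars: output.length };
}

// Usage: a temporary directory stands in for wherever outputs are stored.
const spillDir = mkdtempSync(join(tmpdir(), "tool-output-"));
const small = packageToolOutput("short result", spillDir);
const large = packageToolOutput("x".repeat(60_000), spillDir);

// Reading the spilled file back recovers the complete output.
const recovered =
  large.kind === "file" ? readFileSync(large.path, "utf8") : large.text;
```

The model would see only the path and the character count for the large result, and could request the file contents when it needs them.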
The teardown reinforces something practical: when evaluating AI coding tools, the model powering them matters less than you think. What matters is how well the tool manages cost, security, context, and reliability around that model. Claude Code's architecture shows real engineering depth on all four fronts.