
Inside Claude Code: How Anthropic Built the Plumbing Behind Its AI Coding Agent

April 3, 2026


"The model is the easy part. The plumbing is the product." That line, from a recent technical teardown of Claude Code's architecture, captures something most AI tool users never think about: the engineering that sits between you typing a command and getting useful code back.

A detailed analysis of Claude Code's codebase reveals a TypeScript application running on Bun (a fast JavaScript runtime) with React Ink for the terminal interface, over 40 tool implementations, and several clever systems that keep costs down and security tight.

Prompt Caching Cuts Costs by 90%

Every time Claude Code sends a request to Anthropic's API, it splits the prompt into two parts: static content (identity rules, tool descriptions) that rarely changes, and dynamic content (your project's memory, MCP instructions) that changes per session. The static portion gets cached, and on cache hits, Anthropic charges roughly 90% less for those cached input tokens.
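To make the static/dynamic split concrete, here is a minimal sketch of how such a prompt could be assembled. It is not Claude Code's actual code: `buildSystemPrompt`, `prefixFingerprint`, and the sample strings are hypothetical, and only the `cache_control` marker follows Anthropic's documented prompt caching format.

```typescript
import { createHash } from "node:crypto";

// One system-prompt block. `cache_control` mirrors Anthropic's documented
// prompt-caching marker; every other name in this sketch is hypothetical.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Static content goes first and carries the cache breakpoint.
// Per-session content follows it, outside the cached prefix.
function buildSystemPrompt(staticParts: string[], dynamicParts: string[]): SystemBlock[] {
  const staticBlock: SystemBlock = {
    type: "text",
    text: staticParts.join("\n\n"),
    cache_control: { type: "ephemeral" },
  };
  const dynamicBlock: SystemBlock = { type: "text", text: dynamicParts.join("\n\n") };
  return [staticBlock, dynamicBlock];
}

// Fingerprint of the cacheable prefix. If it differs between two requests,
// the second request cannot reuse the first one's cache entry.
function prefixFingerprint(blocks: SystemBlock[]): string {
  return createHash("sha256").update(blocks[0]?.text ?? "").digest("hex");
}

const staticParts = ["You are a coding agent.", "Tool: read_file - reads a file from disk."];
const sessionA = buildSystemPrompt(staticParts, ["Project memory: uses pnpm."]);
const sessionB = buildSystemPrompt(staticParts, ["Project memory: uses cargo."]);

// Different sessions, identical static prefix.
console.log(prefixFingerprint(sessionA) === prefixFingerprint(sessionB)); // true
```

The ordering is the point of the design: anything that varies per session sits after the breakpoint, so it cannot invalidate the shared prefix.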

The engineering team discovered that 77% of cache misses came from tool descriptions changing between requests, not from tools being added or removed. Small wording differences in how sub-agent tools described themselves were enough to bust the cache. Fixing that one pattern made a measurable cost difference.
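The distinction between a tool being added or removed and a tool merely being reworded can be checked mechanically. The sketch below is one hypothetical way to classify the difference between two consecutive requests; `ToolSpec`, `diffToolSpecs`, and the sample tool names are invented for illustration and do not come from the teardown.

```typescript
interface ToolSpec {
  name: string;
  description: string;
}

interface Drift {
  added: string[];    // tools present only in the newer request
  removed: string[];  // tools present only in the older request
  reworded: string[]; // same tool name, different description text
}

// Compare the tool lists of two consecutive requests.
function diffToolSpecs(prev: ToolSpec[], next: ToolSpec[]): Drift {
  const prevByName = new Map(prev.map((t) => [t.name, t.description] as const));
  const nextByName = new Map(next.map((t) => [t.name, t.description] as const));
  const drift: Drift = { added: [], removed: [], reworded: [] };

  nextByName.forEach((description, name) => {
    if (!prevByName.has(name)) drift.added.push(name);
    else if (prevByName.get(name) !== description) drift.reworded.push(name);
  });
  prevByName.forEach((_description, name) => {
    if (!nextByName.has(name)) drift.removed.push(name);
  });
  return drift;
}

// Same two tools in both requests, but the sub-agent tool lists its
// helpers in a different order, which changes the description string.
const requestOne: ToolSpec[] = [
  { name: "agent", description: "Launch a sub-agent. Available: explorer, reviewer" },
  { name: "read_file", description: "Read a file from disk." },
];
const requestTwo: ToolSpec[] = [
  { name: "agent", description: "Launch a sub-agent. Available: reviewer, explorer" },
  { name: "read_file", description: "Read a file from disk." },
];

const drift = diffToolSpecs(requestOne, requestTwo);
console.log(drift); // reworded contains "agent"; added and removed are empty
```

A check like this, run before a request goes out, would flag the kind of cosmetic wording change that invalidates a cached prefix without changing the tool set.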

Five Layers of Security for Shell Commands

Claude Code can run shell commands on your machine, which makes its security architecture critical. The system uses five validation layers packed into a 2,592-line security file:

1. Pattern matching blocks 13 categories of dangerous shell syntax
2. Shell-specific blocking prevents 20+ dangerous built-in commands, including obscure ones like zmodload (which can enable invisible file operations) and ztcp (which could exfiltrate data over TCP)
3. AST parsing using Tree-sitter analyzes the actual syntax tree when regex-based checks are ambiguous
4. OS-level sandboxing restricts filesystem and network access at runtime
5. Enterprise permission rules let administrators lock tools to read-only or restrict which directories can be modified
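The layered approach can be pictured as a chain of checks where the first rejection wins. The sketch below covers only the two cheapest layers (pattern matching and the built-in blocklist) and leaves out AST parsing, sandboxing, and permission rules, which need external components. The patterns shown are illustrative stand-ins rather than the real 13 categories, and `validate`, `Layer`, and `Verdict` are hypothetical names.

```typescript
type Verdict =
  | { allowed: true }
  | { allowed: false; layer: string; reason: string };

interface Layer {
  name: string;
  // Returns a rejection reason, or null when this layer has no objection.
  check: (command: string) => string | null;
}

// Illustrative patterns only: command substitution and backticks.
const DANGEROUS_PATTERNS: RegExp[] = [/\$\(/, /`/];

// Two of the built-ins named in the article; a real list would be longer.
const BLOCKED_BUILTINS = new Set(["zmodload", "ztcp"]);

const layers: Layer[] = [
  {
    name: "pattern",
    check: (command) => {
      const hit = DANGEROUS_PATTERNS.find((p) => p.test(command));
      return hit ? `matches /${hit.source}/` : null;
    },
  },
  {
    name: "builtin",
    check: (command) => {
      // Naive tokenizer: split on whitespace and common shell separators.
      const words = command.split(/[\s;|&]+/).filter(Boolean);
      const hit = words.find((w) => BLOCKED_BUILTINS.has(w));
      return hit ? `blocked built-in: ${hit}` : null;
    },
  },
];

// Run the layers in order and stop at the first rejection.
function validate(command: string): Verdict {
  for (const layer of layers) {
    const reason = layer.check(command);
    if (reason !== null) return { allowed: false, layer: layer.name, reason };
  }
  return { allowed: true };
}

console.log(validate("ls -la"));
console.log(validate("echo $(whoami)"));
console.log(validate("ls; ztcp example.com 80"));
```

Ordering cheap checks first means most commands are cleared or rejected before heavier analysis is needed, which fits the article's description of AST parsing as a fallback for ambiguous cases.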

Context Management and Parallelization

When conversations get long, Claude Code compresses earlier messages using a rigid 9-section template that preserves user messages verbatim rather than paraphrasing them. A circuit breaker stops compression attempts after three consecutive failures. Before that safeguard existed, data showed 1,279 sessions with 50+ consecutive compression failures, wasting an estimated 250,000 API calls daily.
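A circuit breaker of this kind needs very little code. The sketch below is a minimal version assuming a limit of three consecutive failures, as the article describes; the class and function names (`CompactionBreaker`, `tryCompact`) are hypothetical, and the compaction step is a stand-in that always fails.

```typescript
// Tracks consecutive failures and refuses further attempts past a limit.
class CompactionBreaker {
  private consecutiveFailures = 0;
  private readonly maxFailures: number;

  constructor(maxFailures = 3) {
    this.maxFailures = maxFailures;
  }

  canAttempt(): boolean {
    return this.consecutiveFailures < this.maxFailures;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0; // any success resets the streak
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
  }
}

// Hypothetical wrapper: runs `compact` only while the breaker allows it.
function tryCompact(breaker: CompactionBreaker, compact: () => string): string | null {
  if (!breaker.canAttempt()) return null; // breaker is open, skip the call
  try {
    const summary = compact();
    breaker.recordSuccess();
    return summary;
  } catch {
    breaker.recordFailure();
    return null;
  }
}

// Simulate a session where compaction can never succeed.
let attempts = 0;
const alwaysFails = (): string => {
  attempts += 1;
  throw new Error("compaction failed");
};

const breaker = new CompactionBreaker();
for (let i = 0; i < 50; i++) tryCompact(breaker, alwaysFails);

console.log(attempts); // 3
```

Without the breaker, the same loop would make all 50 calls, which is the failure pattern the article's numbers describe.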

For parallel tasks, Claude Code forks sub-agents that inherit the parent's exact tool pool and model. Using the same model is deliberate: it means the forked agents share the same prompt cache, so parallel work does not multiply costs the way you might expect.
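The cost argument is easier to see with a toy model. The sketch below assumes, purely for illustration, that a cache entry is identified by the model plus the shared prompt prefix and tool list; the real cache key is not described in the teardown, and `AgentConfig`, `forkAgent`, `cacheKey`, and the model names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface AgentConfig {
  model: string;
  tools: readonly string[];
  systemPrefix: string;
  task: string; // the only field a fork changes
}

// A fork inherits everything from the parent except its task.
function forkAgent(parent: AgentConfig, task: string): AgentConfig {
  return { ...parent, task };
}

// Assumed cache identity: model + tool pool + static prefix.
// The task is deliberately excluded because it comes after the prefix.
function cacheKey(agent: AgentConfig): string {
  return createHash("sha256")
    .update(agent.model)
    .update("\0")
    .update(agent.tools.join("\n"))
    .update("\0")
    .update(agent.systemPrefix)
    .digest("hex");
}

const parent: AgentConfig = {
  model: "model-a",
  tools: ["read_file", "bash"],
  systemPrefix: "You are a coding agent.",
  task: "Refactor the auth module",
};

const forkOne = forkAgent(parent, "Update the tests");
const forkTwo = forkAgent(parent, "Update the docs");
const otherModel: AgentConfig = { ...forkOne, model: "model-b" };

console.log(cacheKey(forkOne) === cacheKey(parent));     // true
console.log(cacheKey(forkOne) === cacheKey(forkTwo));    // true
console.log(cacheKey(forkOne) === cacheKey(otherModel)); // false
```

Under this assumption, switching a fork to a different model would give it a separate cache entry, so the shared prefix would be paid for again.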

Large tool outputs (over 50,000 characters) get written to disk instead of being truncated. The model receives a file path and can read the full output later. It is a simple choice, but it means you never lose information to arbitrary character limits.
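The spill-to-disk behavior could look something like the following sketch. Only the 50,000-character threshold comes from the article; the function name `packageToolOutput`, the `ToolResult` shape, and the file naming scheme are hypothetical.

```typescript
import { randomUUID } from "node:crypto";
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const MAX_INLINE_CHARS = 50_000; // threshold cited in the article

type ToolResult =
  | { kind: "inline"; text: string }
  | { kind: "file"; path: string; chars: number };

// Small outputs are returned as-is. Large ones are written to disk in full,
// and the caller receives a path instead of a truncated string.
function packageToolOutput(output: string, dir: string): ToolResult {
  if (output.length <= MAX_INLINE_CHARS) {
    return { kind: "inline", text: output };
  }
  const path = join(dir, `tool-output-${randomUUID()}.txt`);
  writeFileSync(path, output, "utf8");
  return { kind: "file", path, chars: output.length };
}

const outputDir = mkdtempSync(join(tmpdir(), "tool-output-"));
const small = packageToolOutput("build succeeded", outputDir);
const large = packageToolOutput("x".repeat(60_000), outputDir);

console.log(small.kind); // "inline"
console.log(large.kind); // "file"
if (large.kind === "file") {
  // The full output is still recoverable from disk.
  console.log(readFileSync(large.path, "utf8").length === large.chars); // true
}
```

Returning a path rather than a truncated string keeps the context window small while leaving the decision about what to read to the model.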

The teardown reinforces something practical: when evaluating AI coding tools, the model powering them matters less than you think. What matters is how well the tool manages cost, security, context, and reliability around that model. Claude Code's architecture shows real engineering depth on all four fronts.
