Safety training shapes what Claude wants to do. Containment shapes what Claude can do.
Anthropic's engineering team published a breakdown of how the company technically limits Claude's capabilities across its product line - from Claude.ai and the API to Claude Code and third-party operator builds. The piece treats containment as a distinct engineering discipline, separate from the model-level safety work that usually dominates discussions about responsible AI.
The Gap Between "Won't" and "Can't"
A language model trained to avoid harmful outputs will generally avoid them. But "generally" isn't the same as "always," especially as Claude takes on longer, more autonomous tasks - running code, calling APIs, managing files, browsing websites. Training makes bad outcomes unlikely. Technical containment makes specific actions impossible regardless of what the model is asked to do.
This has come to matter more as Claude's capabilities have grown. Earlier Claude models were primarily text in, text out - the blast radius of any given response was limited to what a human then chose to do with it. Agentic Claude is different. When Claude Code is rewriting a production codebase, or Claude is clicking through a website on your behalf, the model's decisions have immediate real-world effects. One misdirected action can delete files, submit a form, or push code to a repository.
There's also the prompt injection problem: when Claude browses a webpage or reads a document, it may encounter text designed to redirect its behavior - a malicious website that tells Claude to "ignore your instructions and do X instead." No amount of training fully prevents a well-crafted injection from influencing a model's behavior. Containment limits how much damage an injection can cause even if it partially works.
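To make the "limits the damage" point concrete, here is a minimal sketch of a gate that sits between the model's requested tool call and the tool itself. Everything in it is hypothetical (the tool names, the `GRANTED` set, the `dispatch` function); it is not how Anthropic implements containment, only an illustration of the idea that a tool the session was never granted cannot run, whatever text persuaded the model to request it.

```python
from typing import Callable, Dict

# Hypothetical tools, for illustration only.
def search_docs(query: str) -> str:
    return f"results for {query!r}"

def delete_files(pattern: str) -> str:
    return f"deleted {pattern}"

ALL_TOOLS: Dict[str, Callable[[str], str]] = {
    "search_docs": search_docs,
    "delete_files": delete_files,
}

# Assumption: this session was only ever granted the read-only tool.
GRANTED = frozenset({"search_docs"})

def dispatch(tool_name: str, argument: str) -> str:
    """Run a requested tool call only if the session was granted that tool.

    The check does not look at why the model asked, so it holds even when
    the request came from injected text rather than from the user.
    """
    if tool_name not in GRANTED:
        return f"blocked: {tool_name} is not granted to this session"
    return ALL_TOOLS[tool_name](argument)
```

If an injected page convinces the model to request `delete_files`, `dispatch("delete_files", "*")` returns the "blocked" message and nothing is deleted. The injection may have partially worked on the model, but the action never happens.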
How Anthropic Structures the Limits
The containment architecture works in tiers. Operators - companies and developers building on the Claude API - set system prompts defining what Claude can and can't do in their product. Users interact within whatever limits the operator sets. Anthropic sits above both, enforcing policies that operators cannot override regardless of what they write in a system prompt.
This is why an operator can tell Claude to stay on topic for their customer service tool, but cannot instruct Claude to help users plan violence. Some limits are hardcoded at Anthropic's level and don't move.
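The ordering of those tiers can be sketched in a few lines. This is a toy model, not a description of Anthropic's actual mechanism (the article does not say how the limits are enforced, and the category names, `HARD_LIMITS`, and `decide` are all made up for illustration). What it shows is the precedence: the platform-level check runs first and ignores whatever the operator configured.

```python
from __future__ import annotations

# Hypothetical category name; not Anthropic's real policy list.
HARD_LIMITS = frozenset({"violence_planning"})

def decide(category: str, operator_allowed: frozenset[str]) -> str:
    """Resolve a request against the tiers, highest authority first."""
    # Tier 1: platform limits. Checked before the operator's settings,
    # so an operator cannot switch them off.
    if category in HARD_LIMITS:
        return "refused: platform limit"
    # Tier 2: the operator's scope, e.g. a customer service tool that
    # stays on topic. Users act inside whatever this allows.
    if category not in operator_allowed:
        return "refused: outside operator scope"
    return "allowed"
```

An operator that lists `"violence_planning"` in its allowed set still gets "refused: platform limit", while an off-topic but harmless request is refused only because the operator chose a narrow scope.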
Beyond policy, there's technical sandboxing. Claude Code executes code in isolated environments with limited access to the broader system. Computer use runs in a contained desktop session. Tools Claude can access - file systems, browsers, external APIs - have explicit permission scopes.
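As an illustration of what an explicit permission scope can look like for a file system tool, here is a small sketch. The `ScopedFileTool` class is hypothetical and is not taken from Claude Code or any Anthropic SDK; it assumes Python 3.9 or later for `Path.is_relative_to`. Paths are resolved before the check, so `..` segments and absolute paths cannot escape the granted directory, and writing is off unless explicitly granted.

```python
from pathlib import Path

class ScopedFileTool:
    """Hypothetical file tool confined to one directory."""

    def __init__(self, root: Path, allow_write: bool = False):
        self.root = root.resolve()
        self.allow_write = allow_write

    def _check(self, relative_path: str) -> Path:
        # Resolve first so "../" and absolute paths are judged by
        # where they actually point, not by how they are spelled.
        target = (self.root / relative_path).resolve()
        if not target.is_relative_to(self.root):
            raise PermissionError(f"{relative_path} is outside the granted scope")
        return target

    def read(self, relative_path: str) -> str:
        return self._check(relative_path).read_text()

    def write(self, relative_path: str, text: str) -> None:
        if not self.allow_write:
            raise PermissionError("this tool was granted read access only")
        self._check(relative_path).write_text(text)
```

A tool built with `ScopedFileTool(project_dir)` can read files under `project_dir` but raises `PermissionError` on `read("../secrets.txt")` and on any `write` call. A real sandbox would enforce this at the operating system or container level rather than in application code; the sketch only shows the shape of the rule.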
For developers building with Claude, the practical implication is that choosing which tools your deployment can access is a consequential design decision. Giving Claude write access to a database is not a neutral choice. The containment architecture Anthropic has built is partly for Anthropic's safety and partly for yours.
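One way a developer might act on the database point, sketched with SQLite from Python's standard library: open the connection in read-only mode before handing it to any tool the model can call. The `open_read_only` helper is a made-up name for this example; the `mode=ro` URI parameter and the `uri=True` argument are standard SQLite and `sqlite3` features.

```python
import sqlite3
from pathlib import Path

def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite database so that writes are rejected.

    The restriction is applied by SQLite itself, not by the prompt,
    so it does not depend on the model behaving as intended.
    """
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

On a connection opened this way, `SELECT` queries work as usual, while an `INSERT`, `UPDATE`, or `DELETE` raises `sqlite3.OperationalError`. Other databases offer the same choice through read-only roles or credentials; the design decision is whether the deployment needs write access at all.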