After a year of running under its first-ever Responsible Scaling Policy (RSP), Anthropic has published a revised version that adds clearer capability thresholds, fixes procedural gaps the company admitted to, and puts a new face in charge of enforcement.
The RSP is Anthropic's internal rulebook for when it can and cannot train or deploy a new model. The core commitment: if a model crosses certain danger thresholds, it must be put behind stricter safeguards before it ships. Anthropic grades its models on a scale called AI Safety Levels, or ASL. All current Claude models sit at ASL-2, meaning they meet baseline safety standards. ASL-3 and above kick in when a model can do things considered genuinely dangerous.
The Two Triggers That Require ASL-3 or Higher
The updated policy, published on Anthropic's blog, names two specific capability thresholds that would force the company to apply heavier safeguards before releasing a model:
- CBRN weapons assistance: If a model can meaningfully help someone with a basic technical background create or deploy chemical, biological, radiological, or nuclear weapons, it must meet ASL-3 standards. Those include tighter internal access controls, protection of the model's underlying weights (the parameters that encode what the model knows), real-time monitoring, and pre-deployment red-team testing.
- Autonomous AI R&D: If a model can independently conduct complex AI research tasks that normally require a human expert, it could require ASL-4 protections or higher. The concern here is a model that accelerates AI development in ways its makers can't predict or control.
What Changed From Year One
Anthropic acknowledged that its first year under the previous RSP had procedural problems: 3-day evaluation delays, unclear documentation procedures, and missed optimizations in standard evaluation processes. The company stated these issues posed minimal actual safety risk, but used them to justify building more flexibility and clearer compliance tracking into the new version.
Jared Kaplan, co-founder and Chief Science Officer, takes over as Responsible Scaling Officer from co-founder Sam McCandlish, who remains CTO. Anthropic is also hiring a Head of Responsible Scaling to coordinate day-to-day implementation.
On transparency: the company will publish summaries of each capability assessment at anthropic.com/rsp-updates and has shared its evaluation methodology with both the US and UK AI Safety Institutes.
For everyday users, this changes nothing about how Claude works today. The real audience for this document is enterprise customers doing due diligence on AI vendors, policymakers watching how frontier labs self-regulate, and the AI research community tracking where voluntary safety commitments are heading.