Anthropic Updates Its Responsible Scaling Policy With New Danger Thresholds

May 3, 2026 | Source: Anthropic Blog

After a year of running under its first-ever Responsible Scaling Policy (RSP), Anthropic has published a revised version that adds clearer capability thresholds, fixes procedural gaps the company admitted to, and puts a new face in charge of enforcement.

The RSP is Anthropic's internal rulebook for when it can and cannot train or deploy a new model. The core commitment: if a model crosses certain danger thresholds, it must be put behind stricter safeguards before it ships. Anthropic grades its models on a scale called AI Safety Levels, or ASL. All current Claude models sit at ASL-2, meaning they meet baseline safety standards. ASL-3 and above kick in when a model can do things considered genuinely dangerous.

The Two Triggers That Require ASL-3 or Higher

The updated policy, published on Anthropic's blog, names two capability thresholds that would force the company to apply heavier safeguards before releasing a model:

CBRN weapons assistance: If a model can meaningfully help someone with a basic technical background create or deploy chemical, biological, radiological, or nuclear weapons, it must meet ASL-3 standards. Those include tighter internal access controls, protection of the model's underlying weights (the parameters that encode what the model knows), real-time monitoring, and pre-deployment red-team testing.
Autonomous AI R&D: If a model can independently conduct complex AI research tasks that normally require a human expert, it could require ASL-4 protections or higher. The concern here is a model that accelerates AI development in ways its makers can't predict or control.

What Changed From Year One

Anthropic acknowledged that its first year under the previous RSP had procedural problems: three-day evaluation delays, unclear documentation procedures, and missed optimizations in standard evaluation processes. The company stated that these issues posed minimal actual safety risk, but used them to justify building more flexibility and clearer compliance tracking into the new version.

Jared Kaplan, co-founder and Chief Science Officer, takes over as Responsible Scaling Officer from co-founder Sam McCandlish, who remains CTO. Anthropic is also hiring a Head of Responsible Scaling to coordinate day-to-day implementation.

On transparency: the company will publish summaries of each capability assessment at anthropic.com/rsp-updates and has shared its evaluation methodology with both the US and UK AI Safety Institutes.

For everyday users, this changes nothing about how Claude works today. The real audience for this document is enterprise customers doing due diligence on AI vendors, policymakers watching how frontier labs self-regulate, and the AI research community tracking where voluntary safety commitments are heading.

Source

Anthropic Blog: Responsible Scaling Policy
