Anthropic has published an updated version of its Responsible Scaling Policy (RSP), the internal framework governing what safety measures its AI models must meet before they can be trained or deployed. The update sharpens the AI Safety Level system and defines specific capability thresholds that trigger stronger protections.
The core rule is simple: Anthropic won't train or deploy models unless safety measures keep risks below defined acceptable levels. The update adds more structure around how those levels are defined and measured.
The ASL Tiers
The policy organizes model safety into numbered AI Safety Levels (ASLs). ASL-1 covers basic systems with minimal capabilities, like a chess bot. ASL-2 represents current industry best practices; all of Anthropic's existing models, including every version of Claude, operate here.
ASL-3 kicks in when a model could meaningfully help someone with basic technical knowledge create chemical, biological, radiological, or nuclear (CBRN) weapons. Requirements at this level include real-time monitoring, pre-deployment red teaming (where researchers systematically try to make the model produce harmful outputs), and rapid response protocols.
ASL-4 and above would apply to models capable of autonomous AI research - systems that could independently conduct complex research tasks normally requiring human expertise. Anthropic flags this as a critical threshold because self-improving AI systems could accelerate development in ways that are difficult to predict or control.
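One way to picture the tier logic is as a lookup from assessed capabilities to a required safety level, combined with the core rule that a model only proceeds when its safeguards meet that level. The Python below is a minimal sketch based solely on this article's summary of the tiers, not on the policy's actual text or on any Anthropic tooling. The capability flag names (`cbrn_uplift`, `autonomous_ai_research`) and the function names are invented for illustration.

```python
from enum import IntEnum


class ASL(IntEnum):
    """AI Safety Levels as summarized in this article (illustrative only)."""
    ASL_1 = 1  # basic systems with minimal capabilities, e.g. a chess bot
    ASL_2 = 2  # current industry best practices
    ASL_3 = 3  # meaningful CBRN help for someone with basic technical knowledge
    ASL_4 = 4  # autonomous AI research ("ASL-4 and above")


# Hypothetical capability flags; these names do not come from the policy.
THRESHOLDS = {
    "cbrn_uplift": ASL.ASL_3,
    "autonomous_ai_research": ASL.ASL_4,
}


def required_asl(capabilities: set[str], baseline: ASL = ASL.ASL_2) -> ASL:
    """Return the highest safety level triggered by the assessed capabilities."""
    triggered = [THRESHOLDS[c] for c in capabilities if c in THRESHOLDS]
    return max([baseline, *triggered])


def may_proceed(capabilities: set[str], safeguards_in_place: ASL) -> bool:
    """Core rule: proceed only if safeguards meet or exceed the required level."""
    return safeguards_in_place >= required_asl(capabilities)
```

Under these assumptions, `may_proceed({"cbrn_uplift"}, ASL.ASL_2)` is false: a model that crosses the CBRN threshold would need ASL-3 protections before it could be trained further or deployed.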
Self-Policing, But With Receipts
The policy requires routine capability assessments, documented safeguards, and both internal stress-testing and external review shared with AI Safety Institutes. Anthropic also disclosed that during the RSP's first year, the company fell short in a few instances - completing some evaluations three days late and running into clarity issues with certain assessments. Publicly admitting your own compliance gaps is uncommon in this industry.
Jared Kaplan, Anthropic's co-founder and Chief Science Officer, has taken on the role of Responsible Scaling Officer. The company is also hiring a dedicated Head of Responsible Scaling to coordinate execution across teams.
For current Claude users, nothing changes immediately. Today's models already meet ASL-2 requirements. The policy is forward-looking: it establishes guardrails before more capable models ship, rather than scrambling to add them afterward.
Anthropic explicitly positions the RSP as something other companies can borrow from, stating its goal is to offer "an example of a framework that others might draw inspiration from." That framing matters because there's still no binding US regulatory framework for frontier AI safety. Voluntary policies like this - and how honestly companies report their own shortcomings - are the primary accountability mechanism the industry currently has.