Related ToolsClaude

Anthropic's Dangerous Cybersecurity Model Accessed by Unauthorized Group

Anthropic
Image: Anthropic

A small group of unauthorized users gained access to Anthropic's Mythos model - a cybersecurity-focused AI the company had flagged as potentially dangerous - according to Bloomberg, citing an unnamed third-party contractor who was part of the group.

Mythos isn't a consumer product like Claude. It's a specialized cybersecurity model, and Anthropic had apparently warned internally that it posed serious risks outside of controlled settings. The breach reportedly involved someone with legitimate contractor access enabling or sharing that access with others through a private online forum.

What Mythos Actually Is

Anthropic has not publicly detailed Mythos's capabilities, but describing a cybersecurity AI as "dangerous in the wrong hands" points to offense-oriented features: finding vulnerabilities in software, generating working exploits, or conducting security analysis at a depth that crosses from defensive research into genuinely harmful territory. These are the kinds of capabilities AI labs have quietly debated whether to build at all, let alone how to restrict.

The scope of what the unauthorized group was able to do with the model - what they queried, what outputs they obtained, whether any data was extracted - has not been disclosed. Bloomberg's report, sourced to a single unnamed individual, leaves significant gaps in the picture.

The Third-Party Access Problem

This is a structural vulnerability that shows up repeatedly in AI security incidents. Companies build strict internal access controls for their most sensitive models, then extend partial trust to contractors, research partners, or vendors. Those third parties don't always hold the line.

Anthropic's own threat modeling for Mythos was written under the assumption that access would stay within a controlled group. That assumption broke down at the contractor level, not at Anthropic's own perimeter.

For the AI industry broadly, this is a real-world stress test of "responsible scaling" policies - the framework AI labs use to justify building potentially dangerous capabilities under the assumption that access can be managed. Mythos is the first publicly reported case where a model an AI lab explicitly assessed as dangerous has ended up in unauthorized hands. Whether that access caused any actual harm is still unknown.