
AISI Tests Claude Mythos Preview for Offensive Cyber Capabilities

April 13, 2026


What can a powerful AI model actually do for someone trying to break into computer systems? That's the question the UK's AI Security Institute (AISI), known until early 2025 as the AI Safety Institute, set out to answer with its evaluation of Claude Mythos Preview's cyber capabilities.

AISI - the UK government body responsible for testing frontier AI models before public release - published findings on Anthropic's Claude Mythos Preview, a pre-release version of what appears to be a new Claude model family. The evaluation focuses specifically on whether the model provides meaningful "uplift" to people attempting offensive cyber operations. Uplift is the degree to which an AI bridges the gap between having bad intentions and having the skills to carry them out. An AI that knows everything about cybersecurity but refuses to help provides low uplift. One that walks a script kiddie through a real intrusion provides high uplift.

How These Evaluations Actually Work

Uplift testing is different from standard benchmark testing. Researchers don't just ask whether a model knows about malware or can explain SQL injection (a technique for manipulating databases through malicious input). They test whether it helps people actually accomplish harmful tasks under realistic conditions - often with participants of varying skill levels trying to complete offensive security scenarios with and without AI assistance. The difference between what participants achieve with the model and what they achieve without it is the uplift.
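To make the with-versus-without comparison concrete, here is a minimal Python sketch of uplift as a difference in task success rates between two groups. This is an illustration only, not AISI's published methodology: the `Arm` class, the `uplift` function, and all the numbers are hypothetical, and a real study would also account for skill level, task difficulty, and statistical uncertainty.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Outcome counts for one study group (hypothetical structure)."""
    successes: int
    participants: int

    @property
    def success_rate(self) -> float:
        # Guard against an empty group, which has no meaningful rate.
        if self.participants <= 0:
            raise ValueError("participants must be positive")
        return self.successes / self.participants


def uplift(with_model: Arm, without_model: Arm) -> float:
    """Absolute difference in success rate: assisted minus unassisted."""
    return with_model.success_rate - without_model.success_rate


# Made-up numbers for illustration; not taken from any AISI report.
control = Arm(successes=3, participants=20)   # no AI assistance
assisted = Arm(successes=9, participants=20)  # with AI assistance

print(f"uplift: {uplift(assisted, control):+.2f}")
```

A positive value means the assisted group completed more tasks; a value near zero suggests the model added little for that group and task set.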

AISI has been running these evaluations on frontier models since late 2023, working directly with Anthropic, OpenAI, Google DeepMind, and others. These companies initially agreed voluntarily to pre-deployment access, meaning AISI gets the model before it goes public. The UK is working toward making such evaluations mandatory under upcoming AI regulation.

Previous AISI evaluations found that current models provide some uplift to less-skilled attackers on basic tasks but don't dramatically accelerate the capabilities of already-skilled threat actors. The most consistent finding across evaluations is that AI models are useful in the reconnaissance and planning phases - understanding target systems, drafting attack approaches - even when they decline to write actual exploit code.

The "Mythos" Name Is Itself News

The publication of an evaluation for "Claude Mythos Preview" is the first public confirmation that Anthropic is developing a model under this name. It doesn't match any existing Claude line - Claude 3, 3.5, 3.7, or Claude 4 - suggesting this could be a specialized or experimental variant, possibly built around extended reasoning or agentic use cases.

Claude already applies stricter safety filtering than many competitors through Anthropic's Constitutional AI approach. But AISI's evaluations test whether those filters hold under adversarial pressure - whether a determined person can systematically work around them through rephrasing, role-play framing, or multi-step prompting rather than a direct request.

For anyone building on Claude or working in security, the full evaluation is worth reading directly on AISI's blog rather than relying on a summary. These reports tend to include specific scenarios and task categories that reveal where model safety measures actually hold up and where the gaps are.
