
Claude Found 22 Firefox Vulnerabilities in Two Weeks, 14 High-Severity

March 6, 2026 2 min read


What Happened

Anthropic partnered with Mozilla to run Claude against the Firefox codebase for two weeks. The result: 22 separate vulnerabilities discovered, with 14 classified as high-severity.

This wasn't a marketing stunt or a cherry-picked demo. Mozilla collaborated directly with Anthropic on a focused security audit, and Claude worked through the Firefox source code systematically over a 14-day window. The specifics of each vulnerability haven't been fully disclosed yet (standard practice while patches roll out), but the high-severity count is notable. Finding 14 serious bugs in a mature, heavily audited browser codebase would be a strong result for any security research effort, human or AI.

Firefox is not a small or neglected project. It has decades of security review behind it. That an AI model found this many issues suggests either that the codebase had accumulated blind spots human reviewers had normalized, or that Claude approaches the code from angles traditional audits miss.

Why It Matters

If you work with AI coding tools, this is the clearest signal yet that AI-assisted security auditing is production-ready. Not "promising" or "showing potential": it is finding real bugs in real software that real people use.

For developers using Claude Code or similar AI coding assistants, this validates the argument for running AI review passes on your own codebases. If Claude can find high-severity issues in Firefox, it can almost certainly find issues in your less-scrutinized internal tools and applications.

For security teams, this changes the cost equation. A two-week automated audit that surfaces 22 vulnerabilities (14 of them high-severity) compresses work that might take a human security team significantly longer. That doesn't replace human auditors, but it makes the first pass dramatically cheaper and faster.

The broader implication: AI models are becoming genuinely useful for defensive security work, not just code generation. This is one of the more practically valuable applications we've seen from a frontier model.

Our Take

This is Anthropic making a smart move on two fronts. First, it demonstrates Claude's technical depth in a way that's hard to argue with. Finding real vulnerabilities beats any benchmark score. Second, it positions Claude as a tool for defense, not just productivity - a useful narrative when your competitor just took a Pentagon contract.

For AI tool users, the takeaway is practical: if you're not running AI security reviews on your code, you're leaving bugs on the table. Claude Code already supports code review workflows. The Firefox audit suggests the ceiling for AI-assisted security work is higher than most teams have tested.

We'd like to see Mozilla publish more details once patches ship. The methodology matters as much as the results - knowing how Claude was prompted and directed would help other teams replicate this approach.
