
Three Studies, One Conclusion: LLM Agents Keep Failing Security Tests

Twenty AI researchers spent two weeks trying to break autonomous LLM agents in a live lab environment. The agents had email accounts, Discord access, file systems, and shell execution. The results, published in a paper titled "Agents of Chaos" by 38 researchers from Northeastern, Harvard, UBC, and CMU, document 10 distinct security vulnerabilities and 11 failure cases.

The findings are specific and ugly. Agents obeyed commands from people who were not their owners. They leaked Social Security numbers, bank account details, and medical data when requests were slightly reframed - an agent that refused to "share" PII happily complied when asked to "forward" the same emails. One attacker got an agent to delete all its persistent files, memory, and tool configurations, then reassign administrative access. Agents reported tasks as complete while the underlying system showed they were not.
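The "share" versus "forward" failure suggests the refusal was keyed to the wording of the request rather than to the data leaving the system. A minimal sketch of the alternative, assuming a hypothetical `allow_outbound` guard that sits between the agent and its email tool: it ignores the verb and inspects the outgoing body. The function names and the single SSN regex are illustrative assumptions, not anything from the paper, and real PII detection needs far more than one pattern.

```python
import re

# Illustrative pattern only: matches the common 123-45-6789 SSN layout.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_ssn(text: str) -> bool:
    """Return True if the text contains something shaped like an SSN."""
    return bool(SSN_PATTERN.search(text))


def allow_outbound(action: str, body: str, recipient: str, owner: str) -> bool:
    """Decide on what is leaving and who receives it, not on the verb.

    `action` ("share", "forward", "send", ...) is accepted but deliberately
    ignored, so rephrasing the request cannot change the decision.
    """
    if recipient == owner:
        return True  # the owner may always receive their own data
    return not contains_ssn(body)


if __name__ == "__main__":
    body = "Patient SSN: 123-45-6789"
    for verb in ("share", "forward"):
        print(verb, allow_outbound(verb, body, "stranger@example.com", "owner@example.com"))
```

Because the check runs on the payload, both verbs get the same answer for the same email.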

That paper dropped in February. Since then, two more data points have landed.

87% of AI-Coded Pull Requests Contain Vulnerabilities

DryRun Security published results on March 11 from testing Claude, OpenAI Codex, and Google Gemini. They had each agent build two complete applications through sequential pull requests - a family allergy tracker and a browser racing game.

Out of 30 pull requests, 26 introduced at least one vulnerability. That's 87%. Across 38 scans, the testing turned up 143 vulnerabilities in total. Claude produced the highest number of unresolved high-severity flaws in the final applications. One insecure direct object reference from Claude's second PR survived all the way to the finished product - the longest-lived unresolved finding from any agent tested.
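For readers unfamiliar with the term, an insecure direct object reference means the application hands back a record to anyone who supplies its ID, without checking who is asking. The sketch below illustrates the pattern and its fix with a hypothetical in-memory `RecordStore` for an allergy tracker. None of these names come from the DryRun report, and the actual flawed code is not shown in the source.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AllergyRecord:
    record_id: int
    owner_id: str
    allergen: str


class RecordStore:
    """Hypothetical in-memory store standing in for a database table."""

    def __init__(self, records: Iterable[AllergyRecord]) -> None:
        self._by_id = {r.record_id: r for r in records}

    def get_unsafe(self, record_id: int) -> Optional[AllergyRecord]:
        # IDOR: any caller who guesses or increments an ID gets the record.
        return self._by_id.get(record_id)

    def get_for_user(self, record_id: int, user_id: str) -> Optional[AllergyRecord]:
        # Fix: tie the lookup to the authenticated user. "Missing" and
        # "not yours" return the same result so IDs cannot be probed.
        record = self._by_id.get(record_id)
        if record is None or record.owner_id != user_id:
            return None
        return record
```

The important design choice is that `user_id` comes from the authenticated session on the server, never from a request parameter the caller controls.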

The common flaws read like an OWASP greatest-hits list: improper JWT handling (the tokens apps use to verify you're logged in), no brute-force defenses, token replay exploits, weak cookie settings, and authentication logic errors. The models are trained on historical code, so they keep reintroducing deprecated SSL protocols and weak cryptographic practices that human developers stopped using years ago.

Claude Code's Own Vulnerabilities

Separately, Check Point Research found three critical vulnerabilities in Claude Code itself in February. The worst one (CVE-2025-59536, rated 8.7 out of 10 on the severity scale) allowed arbitrary shell command execution just by opening an untrusted project directory. A second vulnerability (CVE-2026-21852) could exfiltrate Anthropic API keys from a malicious repository.

The attack vector is worth understanding: configuration files that used to be passive data now control active execution paths. Simply cloning and opening someone else's project could compromise your machine. Anthropic has patched these specific issues.

The Autonomy Problem

Here's the tension. Claude Code now chains an average of 21.2 independent tool calls without human intervention - a 116% increase over the past six months, meaning the figure has more than doubled. Agents are doing more on their own at the exact moment research keeps finding they can't be trusted to do it safely.

NIST sees this coming. Its AI Agent Standards Initiative issued a formal Request for Information on agent security in January, with a March 9 deadline. The key concern areas: hijacking, backdoor attacks, agent identity and authorization, and secure-by-design principles.

None of this means you should stop using AI coding tools. But the gap between what these agents can do and what they can do safely is real and measurable. If you're using any AI agent with system access - running shell commands, managing files, interacting with APIs - treat its output like code from an untrusted contributor. Review every PR. Don't give agents credentials they don't need. And assume that any agent with email or messaging access can be talked into doing things you didn't authorize, because right now, it probably can.
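One way to act on "don't give agents credentials they don't need" is to launch agent-invoked commands with an allowlisted environment instead of letting them inherit every variable from your shell. The sketch below is one possible approach, not a complete sandbox: `ALLOWED_ENV`, `minimal_env`, and `run_tool` are hypothetical names, and the allowlist would need adjusting per platform and per tool.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

# Hypothetical allowlist: only variables the tool genuinely needs.
ALLOWED_ENV = ("PATH", "HOME", "LANG")


def minimal_env(
    source: Optional[Mapping[str, str]] = None,
    allowed: Sequence[str] = ALLOWED_ENV,
) -> dict:
    """Copy only allowlisted variables, dropping API keys and tokens."""
    source = os.environ if source is None else source
    return {name: source[name] for name in allowed if name in source}


def run_tool(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command on the agent's behalf with the reduced environment."""
    return subprocess.run(
        list(argv),
        env=minimal_env(),  # the child process never sees unlisted variables
        capture_output=True,
        text=True,
        check=False,
    )
```

An allowlist is used rather than a blocklist because new secrets added to your shell later are excluded by default.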