
AI Models Tested on Phishing Attacks. The Results Are Uncomfortable.

April 22, 2026 | 2 min read

What happens when a researcher deliberately asks five different AI models to run a social engineering attack? In a hands-on test published by Wired, the answer was unsettling: several models succeeded, and some of the attacks were effective enough to alarm people who study this for a living.

The test covered both phishing (attempts to trick someone into handing over credentials or clicking a malicious link) and broader social engineering (using psychological pressure or false pretenses to get someone to act). These aren't new attack types. What's changed is who - or what - can execute them convincingly.

Current Models Don't Write Like Bots Anymore

For years, AI-assisted phishing emails were easy to spot: generic language, wrong names, obviously templated structure. That advantage is gone. Today's large language models write naturally, adapt their tone to a specific target, and can hold a multi-turn conversation - meaning they can engage across several exchanges rather than sending one generic message and hoping for the best.

The Wired test found that some models didn't just produce a convincing phishing email. They could carry on a contextually appropriate conversation designed to build trust before making their actual move. That's structurally different from a one-shot email, because it mimics how a skilled human attacker actually operates: slowly, patiently, building a plausible relationship before asking for anything.

Why Defenders Are Worse Off Than They Look

The cybersecurity industry has focused heavily on AI's ability to write malicious code - generating exploits, scanning for vulnerabilities, automating technical attacks. The social layer has gotten less attention, and the Wired findings suggest that's a problem.

Standard email security tools catch known malicious links and suspicious domains reasonably well. They're not designed to detect a conversation engineered to manipulate over time. An AI that spends three exchanges building rapport with an employee before requesting something leaves almost no technical footprint to detect.

Organizations that train staff to spot phishing based on visual cues - typos, odd formatting, mismatched sender names - are training for an older threat model. Current AI models can scrape public information about a company, write in the correct internal register, and handle follow-up questions without breaking character.

If you use ChatGPT or Claude for work, this doesn't change your daily routine. But it does mean the old heuristic - "this message sounds specific and natural, so it's probably from a real person" - no longer holds. Smooth, contextually aware writing stopped being evidence of human authorship some time ago.
