Research Notable

Palo Alto Networks Finds Prompt Injection Attacks Actively Weaponized Across the Web

March 10, 2026 3 min read

Prompt injection (hiding instructions in content that trick AI into doing something unintended) has jumped from security conference demos to active use by real attackers. Palo Alto Networks' Unit 42 threat research team published findings documenting widespread indirect prompt injection attacks embedded in ordinary web pages, specifically designed to manipulate AI agents that browse and process web content.

This is not a theoretical exercise. The researchers found live attacks in the wild, targeting real systems.

The First Confirmed Case: Bypassing Ad Moderation

The most striking example dates to December 2025. A site called reviewerpress.com used 24 separate prompt injection techniques simultaneously to bypass AI-powered ad moderation systems. The goal was mundane but profitable: pushing a scam ad for military glasses with fake discounts past automated review. This is the first documented case of prompt injection being used successfully to defeat a commercial AI moderation system.

Other confirmed attacks included SEO poisoning to promote phishing sites, attempts to destroy databases, trigger unauthorized purchases, and leak sensitive information from AI systems.

How the Attacks Hide

Unit 42 identified 22 distinct techniques attackers use to embed malicious instructions in web pages. The most common approaches:

Visible plaintext (37.8% of cases) - instructions sitting right on the page, often ignored by human visitors but consumed by AI crawlers
HTML attribute cloaking (19.8%) - hiding instructions in HTML tags that browsers render but humans don't see
CSS rendering suppression (16.9%) - text that exists in the page source but CSS makes invisible on screen
Zero-size text - font-size set to 0px, invisible to humans, readable by AI
Unicode tricks - using invisible characters or swapping Cyrillic letters for Latin ones to dodge keyword filters

Social engineering framing appeared in 85.2% of attacks. The injected text might claim to be from a "developer" or invoke "god mode," giving the AI what looks like authority to override its instructions.

What This Means for Anyone Using AI Agents

About 75.8% of the malicious pages contained a single injection payload. The remaining 24.2% stacked multiple techniques, making detection harder.

The attacks overwhelmingly target .com domains (73.2%), which makes sense since that is where most commercial AI agents operate. The intended outcomes ranged from low-severity disruptions like generating nonsensical output (28.6% of cases) to high-severity attacks like triggering unauthorized transactions or destroying data (14.2%).

The practical concern is straightforward: any AI tool that reads web pages on your behalf - research assistants, content summarizers, autonomous browsing agents - is a potential target. The AI does not "see" a web page the way you do. It processes raw HTML, and attackers are now actively planting instructions in that HTML to hijack AI behavior.

Unit 42 recommends separating untrusted web content from trusted instructions in AI systems, along with adversarial training and intent analysis. But there is no silver bullet yet. The attack surface exists because AI agents are designed to process content from sources they do not control, and that fundamental design tension is not going away.

The First Confirmed Case: Bypassing Ad Moderation

How the Attacks Hide

What This Means for Anyone Using AI Agents

Related Tools

More from today

The Case That AI Coding Agents Are Killing Software Libraries

Security Researchers Claim Prompt Injection Gave Root Access to Meta AI

Simon Willison: AI Coding Agents Should Kill Technical Debt, Not Create It

Cookie Preferences