Related ToolsClaudeChatgptClaude CodeCursor

AgentSeal Scans AI Agents for Prompt Injection With 150+ Attack Probes

AI news: AgentSeal Scans AI Agents for Prompt Injection With 150+ Attack Probes

What Happened

AgentSeal, an open-source security scanner for AI agents, launched on GitHub on March 6. The tool ships with 150+ base attack probes split into two categories: 70 extraction probes that try to trick agents into leaking their system prompts, and 80 injection probes that test whether agents will follow malicious instructions.

A Pro tier expands the probe count to roughly 301, adding 26 MCP tool poisoning probes, 20 RAG poisoning probes, and about 105 behavioral genome mapping probes.

The scanner works with OpenAI (GPT-4o), Anthropic (Claude), Ollama for local models, LiteLLM proxy, and custom HTTP endpoints. It runs in Python 3.10+ or Node.js 18+, and plugs into CI/CD pipelines via GitHub Actions with exit code enforcement.

One notable design choice: AgentSeal uses deterministic pattern matching instead of an AI judge to evaluate responses. That means results are reproducible across runs and don't cost extra API calls.

Output includes a trust score on a 0-100 scale, extraction and injection resistance ratings, boundary integrity assessment, and specific vulnerability identification with remediation suggestions.

Why It Matters

If you're building AI agents - whether for internal tools, customer support, or coding assistants - you probably haven't tested how they respond to adversarial inputs. Most teams ship agents with whatever guardrails the model provider offers and hope for the best.

AgentSeal addresses a real gap. Prompt injection remains the most common attack vector against LLM-based systems, and there hasn't been a standardized way to benchmark agent resilience. Having 150 probes you can run in a CI pipeline means security testing becomes part of the development loop rather than an afterthought.

The MCP tool poisoning probes are particularly relevant right now. As MCP adoption grows across coding tools and productivity apps, the attack surface around tool definitions is expanding fast. Testing whether your agent blindly trusts tool descriptions is something most teams aren't doing yet.

Our Take

AgentSeal fills a gap that's been obvious for a while. We've seen plenty of "red teaming" frameworks for LLMs, but most focus on content safety rather than the mechanical exploits that matter for deployed agents - system prompt extraction, instruction override, tool poisoning.

The deterministic scoring (no AI judge) is a smart call. AI-judged evaluations introduce their own failure modes and make it hard to track improvements between releases.

The free/Pro split is worth watching. The base 150 probes cover the fundamentals, but MCP and RAG poisoning probes are locked behind Pro. Given how central those attack vectors are becoming, it would be better if at least some of those probes were in the free tier.

If you're deploying agents to production, run this in your CI pipeline. A trust score before every deploy beats finding out from a user that your agent leaks its system prompt.