Ninety-one confirmed vulnerabilities across 40+ open-source repositories. That's the scorecard for GitHub Security Lab's new AI-powered scanning framework, which is now open source and free to use, though running it requires a GitHub Copilot license.
The tool, called the Taskflow Agent, takes a different approach from traditional static analysis scanners such as CodeQL. Instead of matching code against known vulnerability patterns, it uses a large language model (LLM, the same type of AI that powers ChatGPT and Copilot) to reason more like a human security auditor. It reads the code, builds a threat model, proposes potential vulnerabilities, and then audits its own suggestions in a separate pass to filter out false positives.
How the Three-Stage Process Works
The framework runs in three distinct phases:
- Threat modeling - The agent breaks a repository into functional components, maps entry points, and identifies how the application handles user input.
- Issue suggestion - The LLM proposes vulnerability types likely present in each component. These are treated as unverified leads, not conclusions.
- Audit - A fresh context window reviews each suggestion against strict criteria, marking only verified issues as real vulnerabilities.
This staged design is meant to reduce hallucinations: cases where the AI confidently reports bugs that don't actually exist. By separating the "brainstorm" and "verify" steps, and running verification in a fresh context, the agent effectively double-checks its own work.
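The three stages can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the Taskflow Agent's actual code or API: `Model` is a hypothetical stand-in for an LLM request, stage 1 output is assumed to arrive as a list of `Component` records, and every function name and prompt string here is invented for the example. The design point it shows is that the audit stage builds a brand-new prompt for each lead, so the brainstorming transcript never leaks into the verification step.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
Model = Callable[[str], str]


@dataclass
class Component:
    """Assumed stage 1 output: one functional component and its entry points."""
    name: str
    entry_points: List[str]


@dataclass
class Lead:
    """An unverified vulnerability suggestion produced in stage 2."""
    component: str
    vuln_class: str


def suggest_issues(model: Model, component: Component) -> List[Lead]:
    # Stage 2: brainstorm. Every non-empty line the model returns becomes a lead.
    prompt = (
        f"Component {component.name}, entry points: {', '.join(component.entry_points)}. "
        "List likely vulnerability classes, one per line."
    )
    reply = model(prompt)
    return [Lead(component.name, line.strip()) for line in reply.splitlines() if line.strip()]


def audit(model: Model, lead: Lead) -> bool:
    # Stage 3: each lead gets its own fresh prompt containing only that lead,
    # never the stage 2 output, approximating a separate context window.
    prompt = (
        f"Audit {lead.component} for {lead.vuln_class}. "
        "Reply VERIFIED only if you can confirm an exploitable path, else REJECTED."
    )
    return model(prompt).strip() == "VERIFIED"


def run_pipeline(model: Model, components: List[Component]) -> List[Lead]:
    leads = [lead for c in components for lead in suggest_issues(model, c)]
    return [lead for lead in leads if audit(model, lead)]


def fake_model(prompt: str) -> str:
    # Deterministic fake for demonstration: brainstorms two leads,
    # then pretends the auditor can only confirm the IDOR one.
    if prompt.startswith("Component"):
        return "IDOR\nSQL injection"
    return "VERIFIED" if "IDOR" in prompt else "REJECTED"


print(run_pipeline(fake_model, [Component("documents", ["GET /docs/<id>"])]))
# [Lead(component='documents', vuln_class='IDOR')]
```

Swapping `fake_model` for a real model client is where the actual work lies; the skeleton only captures the "propose broadly, verify narrowly" shape of the process.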
The Numbers
Across the 40+ repositories tested, the agent suggested 1,003 potential issues. After the audit stage, 139 survived. After deduplication, 91 unique vulnerabilities remained, with 21% rated high or critical severity.
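The funnel is easy to check with a little arithmetic. The counts below come straight from the figures above; the stage labels are only descriptive, and the high/critical count is an approximation derived from the reported 21%.

```python
# Issue counts at each stage, as reported for the 40+ repository evaluation.
funnel = [
    ("suggested", 1003),
    ("passed audit", 139),
    ("unique after dedup", 91),
]


def stage_rates(stages):
    """Return (label, count, share of previous stage, share of first stage) rows."""
    first = stages[0][1]
    prev = first
    rows = []
    for label, count in stages:
        rows.append((label, count, count / prev, count / first))
        prev = count
    return rows


for label, count, vs_prev, vs_first in stage_rates(funnel):
    print(f"{label:<20} {count:>5}  {vs_prev:6.1%} of previous  {vs_first:6.1%} of suggested")

# 21% of 91 is roughly 19 high- or critical-severity findings.
print(round(0.21 * 91))  # 19
```

In other words, the audit stage kept about 13.9% of the raw suggestions (discarding roughly 86%), deduplication kept about 65.5% of those, and about 9.1% of the original 1,003 suggestions ended up as unique findings.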
The real proof is in the CVEs. The framework has turned up more than 80 vulnerabilities in production software that have since been disclosed, including a privilege escalation bug in Outline (the wiki tool), an information disclosure flaw in WooCommerce, and an authentication bypass in Rocket.Chat caused by an unhandled JavaScript Promise.
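The Rocket.Chat finding belongs to a well-known bug class: an asynchronous check whose result is never awaited. The sketch below is hypothetical and is not Rocket.Chat's code; the real bug's details may differ. It uses Python's `asyncio` as an analog, because an un-awaited coroutine object, like an un-awaited JavaScript Promise, is always truthy, so a check such as `if not is_valid_token(...)` can never reject anyone. `VALID_TOKENS` and both handler functions are invented for illustration.

```python
import asyncio

VALID_TOKENS = {"s3cret"}  # hypothetical token store


async def is_valid_token(token: str) -> bool:
    # Stand-in for an async lookup (database, cache, external auth service).
    await asyncio.sleep(0)
    return token in VALID_TOKENS


def handle_request_buggy(token: str) -> str:
    # BUG: calling an async function without awaiting it returns a coroutine
    # object, which is always truthy, so this "not" check never fires.
    if not is_valid_token(token):
        return "401 Unauthorized"
    return "200 OK"


async def handle_request_fixed(token: str) -> str:
    # Awaiting produces the actual bool, so invalid tokens are rejected.
    if not await is_valid_token(token):
        return "401 Unauthorized"
    return "200 OK"


print(handle_request_buggy("wrong"))               # 200 OK (authentication bypassed)
print(asyncio.run(handle_request_fixed("wrong")))  # 401 Unauthorized
```

Python at least emits a `RuntimeWarning` ("coroutine ... was never awaited") when the forgotten coroutine is garbage-collected, but that warning does not stop the request from succeeding.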
The agent performs best on logical vulnerabilities, such as insecure direct object references (where you can access another user's data by changing an ID in a URL), broken authentication flows, and business logic errors. These are exactly the bug classes that traditional pattern-matching scanners tend to miss, because finding them requires understanding what the code is supposed to do, not just what it does.
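An insecure direct object reference is easy to see in a toy handler. The data store, user names, and handler functions below are hypothetical. The vulnerable version is syntactically unremarkable, which leaves a pattern-matching scanner little to latch onto; the fix depends on knowing the intended rule that users may only read their own documents.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Document:
    owner: str
    body: str


# Hypothetical in-memory store keyed by the ID that appears in the URL.
DOCUMENTS: Dict[int, Document] = {
    1: Document(owner="alice", body="alice's notes"),
    2: Document(owner="bob", body="bob's payroll data"),
}


def get_document_vulnerable(current_user: str, doc_id: int) -> Optional[str]:
    # IDOR: the ID comes straight from the URL and nothing ties it to the
    # caller, so alice can read /documents/2 just by changing the number.
    doc = DOCUMENTS.get(doc_id)
    return doc.body if doc else None


def get_document_fixed(current_user: str, doc_id: int) -> Optional[str]:
    # Fix: enforce the business rule "users may only read their own documents".
    doc = DOCUMENTS.get(doc_id)
    if doc is None or doc.owner != current_user:
        return None  # treat "not yours" the same as "not found"
    return doc.body
```

Here `get_document_vulnerable("alice", 2)` returns Bob's payroll data, while `get_document_fixed("alice", 2)` returns `None`. Returning the same result for "missing" and "not yours" also avoids leaking which IDs exist.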
SQL injection and memory safety bugs showed lower detection rates. That gap matters less than it might seem, because those are the categories where CodeQL and similar tools already excel.
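By contrast, the classic SQL injection shape is visible in the syntax alone. The toy checker below is an illustration only, far cruder than CodeQL's data-flow analysis: it uses Python's standard `ast` module to flag `.execute(...)` calls whose first argument is built with an f-string, `%` formatting, `+` concatenation, or `.format()`. The function names `find_sql_string_building` and `_is_built_string` are hypothetical.

```python
import ast
from typing import List


def _is_built_string(node: ast.expr) -> bool:
    """True if the expression builds a string dynamically instead of using a literal."""
    if isinstance(node, ast.JoinedStr):  # f"... {x} ..."
        return True
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mod)):
        return True  # "..." + x  or  "..." % x
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "format"
    ):
        return True  # "...".format(x)
    return False


def find_sql_string_building(source: str) -> List[int]:
    """Return line numbers of .execute() calls whose query is built dynamically."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "execute"
            and node.args
            and _is_built_string(node.args[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)


sample = '''
cur.execute(f"SELECT * FROM users WHERE name = '{name}'")
cur.execute("SELECT * FROM users WHERE name = %s", (name,))
'''
print(find_sql_string_building(sample))  # [2]
```

A check like this needs no understanding of what the application is for, which is exactly the kind of understanding the logic bugs described above demand.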
What You Need to Run It
The framework is split across two GitHub repositories: seclab-taskflows (the vulnerability detection workflows) and seclab-taskflow-agent (the agent runtime). You'll need a GitHub Copilot license, since the agent runs on premium model requests through Copilot's infrastructure.
That Copilot requirement is the main friction point. This isn't a fully standalone tool you can point at any codebase without a GitHub account. But for teams already paying for Copilot, it's a substantial addition to their security tooling at no extra cost.
For developers running open-source projects, this is a practical tool worth trying. The 91-vulnerability track record across major projects like WooCommerce and Rocket.Chat suggests it catches real bugs, not just theoretical ones.