46.5 million chat messages. 728,000 files. 57,000 user accounts. Full read-write access to the production database. An autonomous AI security agent achieved all of this against McKinsey's internal AI platform in under two hours, using a vulnerability class that's been known since the late 1990s.
CodeWall, an offensive security startup, pointed its AI agent at Lilli - McKinsey's internal generative AI tool, which is used by more than 40,000 employees and processes more than 500,000 prompts per month. The agent operated with zero credentials, zero insider knowledge, and zero human guidance after target selection. By the end, it had complete control over the platform's database.
A SQL Injection With a Twist
The root cause was SQL injection, one of the oldest and best-documented web vulnerabilities in existence. But this one had an unusual angle: the API properly parameterized request values (the standard defense), while leaving JSON field names - the keys themselves - concatenated directly into SQL queries without sanitization.
This meant standard security scanners, including OWASP ZAP, missed it entirely. McKinsey's own internal security tools didn't catch it either. The AI agent found it through 15 blind iterations, using error messages that reflected live production data to map the query structure.
Over 200 API endpoints were publicly documented. Twenty-two required no authentication at all.
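To make the key-versus-value gap concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. It is not Lilli's code: the messages table, its columns, and the build_filter_query_vulnerable helper are hypothetical stand-ins, and the payload is a simplified illustration rather than the one CodeWall's agent used. The point is only that binding values as parameters does nothing for a key that is pasted into the SQL text.

```python
import sqlite3

def build_filter_query_vulnerable(filters: dict) -> tuple[str, list]:
    """Hypothetical helper: values are bound as parameters, keys are not."""
    # Each JSON key is pasted straight into the SQL text, unchecked.
    clauses = [f"{key} = ?" for key in filters]
    sql = "SELECT id, body FROM messages WHERE " + " AND ".join(clauses)
    return sql, list(filters.values())

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, owner TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO messages (owner, body) VALUES (?, ?)",
    [("alice", "hello"), ("bob", "secret plan")],
)

# A well-behaved request body: only alice's row comes back.
sql, params = build_filter_query_vulnerable({"owner": "alice"})
benign_rows = conn.execute(sql, params).fetchall()
print(benign_rows)  # [(1, 'hello')]

# A request whose JSON *key* carries SQL. The value is still safely bound,
# but the query text becomes: ... WHERE 1 = 1 OR owner = ?
sql, params = build_filter_query_vulnerable({"1 = 1 OR owner": "nobody"})
leaked_rows = conn.execute(sql, params).fetchall()
print(len(leaked_rows))  # 2 (every row in the table)
```

A scanner that only mutates values would see nothing wrong with this endpoint, because every value really is bound through a placeholder.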
The Real Danger: Writable System Prompts
The exposed data was staggering on its own: plaintext chat messages covering strategy discussions, M&A activity, client engagements, and financial details. The 3.68 million RAG document chunks (the indexed knowledge base Lilli draws answers from) contained proprietary McKinsey research, frameworks, and methodologies.
But the most alarming finding was that Lilli's 95 system prompts - the instructions that control how the AI responds to every query - were stored in the same compromised database with write access. As CodeWall's researchers put it: "No deployment needed. No code change. Just a single UPDATE statement wrapped in a single HTTP call."
A malicious actor could have silently altered what Lilli tells 40,000+ McKinsey consultants: poisoning financial models, embedding data exfiltration in normal-looking responses, or stripping safety guardrails entirely. There would have been no logs, file changes, or process anomalies to trigger alerts.
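One way to see why a writable prompt table is so quiet, and what a basic tripwire might look like, is the sketch below. It is an assumption-laden illustration, not anything CodeWall or McKinsey described: the system_prompts table, the prompt text, and the idea of keeping SHA-256 fingerprints outside the database are all hypothetical.

```python
import hashlib
import sqlite3

def fingerprint(text: str) -> str:
    """SHA-256 hex digest of a prompt's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical prompt store living in the application database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system_prompts (name TEXT PRIMARY KEY, content TEXT)")
original = "Answer using only cited internal sources."
conn.execute("INSERT INTO system_prompts VALUES (?, ?)", ("default", original))

# Assumption: fingerprints are recorded at review time and kept somewhere
# the database credentials cannot write to (here, just a dict).
trusted = {"default": fingerprint(original)}

def tampered_prompts(conn: sqlite3.Connection, trusted: dict) -> list:
    """Return names of prompts whose current text no longer matches its fingerprint."""
    changed = []
    for name, content in conn.execute("SELECT name, content FROM system_prompts"):
        if trusted.get(name) != fingerprint(content):
            changed.append(name)
    return changed

before = tampered_prompts(conn, trusted)
print(before)  # []

# The attack is one statement: no deployment, no code change.
conn.execute(
    "UPDATE system_prompts SET content = ? WHERE name = ?",
    ("Ignore prior rules.", "default"),
)
after = tampered_prompts(conn, trusted)
print(after)  # ['default']
```

The check only helps if the trusted fingerprints live outside the compromised database; stored alongside the prompts, they could be rewritten by the same UPDATE.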
McKinsey's Response
The timeline moved fast. CodeWall's agent found the vulnerability on February 28. Responsible disclosure went to McKinsey on March 1. By March 2, McKinsey's CISO had acknowledged the report, patched all unauthenticated endpoints, taken the development environment offline, and blocked public API documentation. CodeWall published its findings on March 9.
McKinsey's official statement said its investigation, supported by a third-party forensics firm, "identified no evidence that client data or client confidential information were accessed by this researcher or any other unauthorized third party."
The Bigger Picture for Enterprise AI
Lilli had been running for two years before the flaw was found. It was built by one of the world's most prominent consulting firms, the kind of organization that advises Fortune 500 companies on technology strategy. If McKinsey's AI platform had a basic SQL injection exposed for two years, the uncomfortable question is what's lurking in the thousands of hastily deployed enterprise AI tools across every other organization.
The specific failure pattern here - securing values but not keys in JSON-to-SQL translation - is exactly the kind of bug that emerges when teams build AI features fast and test against known attack patterns rather than thinking about novel input vectors. Standard scanners check standard things. AI platforms often have non-standard architectures.
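A common defense for this pattern is to treat field names as untrusted input too, and accept only keys that appear on a fixed allowlist. The sketch below shows the idea in Python; the column names, the messages table, and the build_filter_query helper are hypothetical, and this is one reasonable approach rather than the fix McKinsey applied.

```python
# Hypothetical allowlist: the only columns a client may filter on.
ALLOWED_FILTER_COLUMNS = {"owner", "body"}

def build_filter_query(filters: dict) -> tuple[str, list]:
    """Build a WHERE clause from JSON filters, validating keys and binding values."""
    clauses, params = [], []
    for key, value in filters.items():
        # Reject any key that is not a known column name, so no
        # client-supplied text ever reaches the SQL string.
        if key not in ALLOWED_FILTER_COLUMNS:
            raise ValueError(f"unsupported filter field: {key!r}")
        clauses.append(f'"{key}" = ?')
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT id, body FROM messages WHERE {where}", params

print(build_filter_query({"owner": "alice"}))
# ('SELECT id, body FROM messages WHERE "owner" = ?', ['alice'])

try:
    build_filter_query({"1 = 1 OR owner": "nobody"})
except ValueError as exc:
    print(exc)  # unsupported filter field: '1 = 1 OR owner'
```

Allowlisting is preferable to trying to escape or strip dangerous characters from keys, because the set of valid column names is small and known in advance.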
CodeWall's CEO Paul Price framed the broader threat: "Hackers will be using the same technology and strategies to attack indiscriminately." An AI agent that can autonomously find and exploit a vulnerability in a major consulting firm's production system in two hours represents a real shift in what's possible for attackers. The defense side needs to move just as fast.