Four terabytes. That's how much data hackers claim to have stolen from Mercor, the $10 billion AI recruiting startup that supplies training data and expert talent to OpenAI, Anthropic, and Meta.
The breach, confirmed by Mercor in late March, didn't come from a direct attack on the company. It came through the supply chain - specifically through LiteLLM, a popular open-source library that developers use to connect their applications to different AI services. Attackers poisoned the library's code, and when Mercor's developers installed it, the malware harvested internal credentials and opened the door to the company's infrastructure.
According to Wired, Meta has paused its work with Mercor while it investigates the breach. That's a significant move given Mercor's role as a key data vendor for the biggest AI labs in the industry.
What Was Actually Stolen
The stolen data breaks down into three categories that together add up to a little over 4TB.
Candidate records (211GB): Resumes, verified contact information, and Social Security numbers from people who applied through Mercor's platform.
Video and identity verification (3TB): High-definition recordings of candidate interviews, passport and driver's license scans, and facial biometric data used for identity matching. This is the bulk of the stolen data and by far the most sensitive.
Source code (939GB): Mercor's proprietary matching algorithms, internal dashboards, benchmarking code, and - critically - hardcoded API keys that could give attackers further access to Mercor's cloud systems.
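To make the hardcoded-key risk concrete, here is a minimal sketch of the kind of check a secret scanner runs against source code. It is not Mercor's tooling or any specific product: the single regular expression and the function name find_hardcoded_secrets are illustrative assumptions, and real scanners use much larger rule sets.

```python
import re

# Illustrative rule only (an assumption for this sketch): flags a quoted
# literal of 16+ characters assigned to a name such as api_key, secret or token.
ASSIGNED_SECRET = re.compile(
    r"(?i)\b(api[_-]?key|secret|token)\b\s*[:=]\s*['\"]([A-Za-z0-9_\-]{16,})['\"]"
)


def find_hardcoded_secrets(source: str) -> list[int]:
    """Return the 1-based line numbers that look like hardcoded credentials."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if ASSIGNED_SECRET.search(line):
            flagged.append(lineno)
    return flagged


if __name__ == "__main__":
    sample = (
        'API_KEY = "abcd1234efgh5678ijkl"\n'   # literal key in source: flagged
        'api_key = os.environ["API_KEY"]\n'    # read from environment: not flagged
    )
    print(find_hardcoded_secrets(sample))
```

The second sample line shows the usual alternative: the key lives in the environment or a secrets manager, so a copy of the source code alone does not hand over working credentials.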
How the Attack Chain Worked
The attackers, linked to a group called TeamPCP, didn't go after Mercor directly. They executed a two-step supply chain attack:
First, on March 19, they compromised Trivy, a security scanning tool, by exploiting a misconfigured GitHub Actions workflow. Then, on March 24, using credentials stolen in that first breach, they injected malicious code into LiteLLM's packages on PyPI, the index from which Python developers typically install libraries. The malware ran automatically on installation, stealing credentials and deploying privileged containers inside Mercor's infrastructure.
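One common defense against a swapped package is to pin the expected SHA-256 digest of each dependency and refuse anything that does not match; pip supports this through its --require-hashes mode. The sketch below shows only the underlying comparison. The function names are hypothetical, and nothing here describes how Mercor or LiteLLM actually manage dependencies.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a downloaded artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_pinned_hash(artifact: bytes, pinned_sha256: str) -> bool:
    """True only if the artifact's digest equals the digest recorded earlier."""
    return hmac.compare_digest(sha256_hex(artifact), pinned_sha256.lower())


if __name__ == "__main__":
    reviewed = b"bytes of the release that was reviewed"
    pinned = sha256_hex(reviewed)  # recorded when the dependency was vetted

    tampered = reviewed + b" plus injected code"
    print(matches_pinned_hash(reviewed, pinned))
    print(matches_pinned_hash(tampered, pinned))
```

Pinning only helps if the digest was recorded from a clean release. A team that pins a version after it has been poisoned simply locks in the malicious code.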
The extortion group Lapsus$ later claimed credit for the attack, publishing samples of the stolen data.
Attackers also obtained Tailscale VPN data - a complete map of Mercor's internal network plus device certificates that would let them impersonate trusted internal machines.
The Bigger Problem for AI Labs
Mercor spokesperson Heidi Hagberg said the company "moved promptly" to contain the incident, emphasizing that "privacy and security of our customers and contractors is foundational." Mercor described itself as "one of thousands of companies" affected by the LiteLLM supply chain attack.
That framing downplays what makes this breach different from a typical software supply chain incident. Mercor isn't a random SaaS company. It recruits domain experts in medicine, law, and other fields specifically to generate training data for frontier AI models. The exposure of its internal systems and client relationships gives attackers a map of how the biggest AI labs source and process their training data.
The hardcoded API keys in the stolen source code are an immediate operational concern. But the longer-term worry is what the stolen data reveals about the AI training pipeline itself - who's doing the work, what they're being asked to label, and how those systems connect to the models we all use.
For the thousands of contractors whose biometric data, interview recordings, and personal documents are now in the hands of an extortion group, the consequences are more immediate and personal. A class action investigation is already underway.