30 agents. 4 with usable public documentation. That's the finding from MIT researchers who reviewed AI agents currently deployed by major AI labs - and it's the clearest picture yet of how fast agent deployment is outrunning accountability.
The study catalogued 30 AI agents - software systems that can take sequences of actions autonomously, like searching the web, executing code, managing files, or sending messages on your behalf - and looked for three basic things in each agent's public documentation: what the agent is designed to do, where its capabilities end, and what happens when it fails.
Only 4 of 30 checked all three boxes.
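The three-part check is simple enough to express as a small data structure. What follows is a minimal Python sketch, not the researchers' actual method. The `AgentDocs` record, its field names, and the sample entries are all hypothetical and are not drawn from the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDocs:
    """Hypothetical record of what one agent's public docs cover."""
    name: str
    states_purpose: bool            # what the agent is designed to do
    states_capability_limits: bool  # where its capabilities end
    states_failure_behavior: bool   # what happens when it fails

    def is_adequate(self) -> bool:
        # An agent passes only if all three basics are documented.
        return (
            self.states_purpose
            and self.states_capability_limits
            and self.states_failure_behavior
        )


def count_adequate(agents: list[AgentDocs]) -> int:
    """Count agents whose docs cover all three criteria."""
    return sum(agent.is_adequate() for agent in agents)


# Illustrative entries only; these are not agents from the study.
sample = [
    AgentDocs("agent-a", True, True, True),
    AgentDocs("agent-b", True, True, False),
    AgentDocs("agent-c", True, False, False),
]
adequate = count_adequate(sample)
```

The check is all-or-nothing by design: an agent that documents its purpose and limits but not its failure behavior still fails, which matches the "all three boxes" standard described above.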
The Failure Mode Problem
The missing piece that matters most to anyone deploying these systems is failure behavior. When an AI agent runs into a task it can't handle, what does it do? Does it flag the issue? Fail silently? Take an action that sort of resembles the right answer but isn't?
For 26 of the 30 agents in the study, that documentation is incomplete or missing.
This isn't abstract. Businesses are putting agents into real workflows - automated customer support queues, code review pipelines, research tasks, scheduling. When something breaks, teams need to know whether to debug the agent, reconfigure it, or acknowledge it was never designed for that case. Without public documentation, every failure is archaeology.
Capability ceilings matter too. An agent that silently drops a task because it's out of scope looks exactly like an agent that's malfunctioning. You can't tell the difference without a documented capability boundary to compare against.
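To make that concrete, here is a rough sketch of how a documented boundary turns an ambiguous dropped task into a diagnosable one. The `DOCUMENTED_SCOPE` set and the task-type labels are hypothetical. A real agent's documentation would define its own.

```python
from typing import Optional

# Hypothetical: task types the vendor's docs say the agent supports.
DOCUMENTED_SCOPE = {"web_search", "file_management", "send_message"}


def diagnose_dropped_task(task_type: str, scope: Optional[set[str]]) -> str:
    """Classify a task the agent silently dropped."""
    if scope is None:
        # No documented boundary: the two cases are indistinguishable.
        return "unknown"
    if task_type in scope:
        # The docs say this should work, so the drop is a defect.
        return "malfunction"
    return "out_of_scope"
```

With a published scope, `diagnose_dropped_task("web_search", DOCUMENTED_SCOPE)` points to a malfunction. With `scope=None`, the situation for an undocumented agent, the same dropped task can only be labeled unknown.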
The Gap Between 4 and 26
The 4 agents with adequate documentation prove this is achievable. The question is why the other 26 don't have it.
The most honest answer is competitive pressure. Labs are racing one another to ship agents, and documentation is the last thing written and the first thing cut when a deadline moves. That's a normal product development pattern - it's just more consequential when the product is an autonomous system making decisions inside someone's business.
For anyone deploying AI agents from major labs right now, the MIT findings are a practical signal: treat every agent as undocumented until you've tested it yourself. Map its edges. Document what it does when pushed outside its designed use cases. You're going to need that information when something goes wrong - and no one else is building it for you.
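One way to start that mapping is a small probe harness that sends deliberately out-of-scope or malformed tasks to the agent and records what comes back. This is a sketch under stated assumptions: `run_agent` is a hypothetical callable standing in for however you invoke your agent, `fake_agent` exists only for demonstration, and the outcome labels are one possible taxonomy, not a standard.

```python
from typing import Callable, Optional


def probe_agent(
    run_agent: Callable[[str], Optional[str]],
    probes: dict[str, str],
) -> dict[str, str]:
    """Run each probe task and label the observed failure behavior.

    probes maps a probe name to the task text sent to the agent.
    """
    observed = {}
    for name, task in probes.items():
        try:
            reply = run_agent(task)
        except Exception as exc:
            # The agent failed loudly; record the error type.
            observed[name] = f"raised {type(exc).__name__}"
            continue
        if reply is None or not reply.strip():
            # Nothing came back and nothing was flagged.
            observed[name] = "silent failure"
        else:
            # A reply still needs human review: it may be a refusal,
            # a correct answer, or a plausible-looking wrong one.
            observed[name] = "responded (needs review)"
    return observed


def fake_agent(task: str) -> Optional[str]:
    """Stand-in agent for demonstration only (hypothetical)."""
    if "delete" in task:
        raise PermissionError("not allowed")
    if "translate" in task:
        return ""
    return "done"


report = probe_agent(
    fake_agent,
    {
        "destructive": "delete all files",
        "unsupported": "translate this contract",
        "normal": "summarize this page",
    },
)
```

The resulting `report` dictionary is the start of the documentation the vendor did not provide. Note that the harness cannot judge whether a non-empty reply is correct, so anything labeled "responded" still needs a human to check it.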