Four AI Labs Independently Built the Same Agent Architecture

March 13, 2026

Cursor, Anthropic, OpenAI, and Google DeepMind built their AI coding systems independently. They ended up with nearly identical architectures. That convergence tells us more about where software engineering is headed than any single product announcement.

A detailed analysis circulating among engineering teams breaks down how all four organizations landed on the same core pattern: break problems into pieces, run agents in parallel, verify outputs with a separate judge, and loop until the work passes inspection. No one copied anyone. They all just hit the same walls and found the same solutions.

The Pattern: Planner, Worker, Judge

Cursor tried flat coordination first, letting multiple agents work as equals. It failed. They tried optimistic concurrency control, where agents work simultaneously and reconcile conflicts later. That also failed. What worked was hierarchy: a Planner agent explores the codebase and creates tasks, Workers execute individual tasks, and a Judge decides whether to accept each result or trigger a restart.
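To make the control flow concrete, here is a minimal Python sketch of a planner/worker/judge loop. It is an illustration under assumptions, not Cursor's implementation: the names (Task, Result, run_pipeline, the toy_* functions) are hypothetical, the agents are plain functions standing in for model calls, and tasks run one after another rather than in parallel to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    description: str


@dataclass
class Result:
    task: Task
    output: str
    attempts: int


def run_pipeline(
    goal: str,
    planner: Callable[[str], List[Task]],
    worker: Callable[[Task, int], str],
    judge: Callable[[Task, str], bool],
    max_attempts: int = 3,
) -> List[Result]:
    """Plan once, then redo each task until the judge accepts the output."""
    results: List[Result] = []
    for task in planner(goal):
        for attempt in range(1, max_attempts + 1):
            output = worker(task, attempt)
            if judge(task, output):
                results.append(Result(task, output, attempt))
                break
        else:
            # No attempt passed the judge within the retry budget.
            raise RuntimeError(f"judge rejected every attempt: {task.description}")
    return results


# Toy stand-ins (hypothetical): a real system would call a model here.
def toy_planner(goal: str) -> List[Task]:
    return [Task(f"{goal}: step {i}") for i in (1, 2)]


def toy_worker(task: Task, attempt: int) -> str:
    # The first attempt is deliberately poor so the judge triggers a redo.
    return "draft" if attempt == 1 else "final"


def toy_judge(task: Task, output: str) -> bool:
    return output == "final"


results = run_pipeline("build parser", toy_planner, toy_worker, toy_judge)
```

The point of the structure is that the worker never decides for itself when it is done; acceptance is a separate function with its own criteria, and a rejection simply sends the task back for another attempt.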

The results from this architecture are hard to ignore. Cursor's system built a web browser (over 1 million lines of code), an Excel clone (1.6 million lines), and a Windows 7 emulator (1.2 million lines). It also solved a Stanford/MIT/Berkeley research-grade math challenge in four days with zero human guidance. Notably, Cursor found that GPT-5.2 outperformed Claude Opus 4.5 on long-horizon tasks requiring sustained planning across many steps.

Anthropic's Claude Code uses a similar pipeline - an initializer sets context, agents do the work, and structured artifacts make sessions resumable. OpenAI's Codex runs tasks in isolated Git worktrees (separate working directories that share one repository), with a central server managing threads to prevent what they call "context rot" - when an agent's working memory gets polluted by too much intermediate junk. Google DeepMind's Aletheia system adds a twist: its verifier works in natural language, identifying flaws in plain English before sending work back for revision. That system hit 90% accuracy on IMO-ProofBench Advanced and autonomously solved four previously open mathematical conjectures.
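For readers who have not used worktrees, the sketch below shows one way to give each task its own checkout with the standard `git worktree add` command. It is an assumption-laden illustration, not a description of how Codex manages its workspaces: the function names, the `agent/<task_id>` branch naming, and the directory layout are all hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_add_command(worktree_root: Path, task_id: str) -> List[str]:
    """Build the git command that creates an isolated checkout for one task."""
    path = worktree_root / task_id
    branch = f"agent/{task_id}"  # hypothetical naming scheme
    # `git worktree add -b <branch> <path>` creates a new branch and checks
    # it out into a new directory that shares the repository's object store.
    return ["git", "worktree", "add", "-b", branch, str(path)]


def create_worktree(repo: Path, worktree_root: Path, task_id: str) -> Path:
    """Run the command inside an existing repository (requires git)."""
    subprocess.run(worktree_add_command(worktree_root, task_id), cwd=repo, check=True)
    return worktree_root / task_id


cmd = worktree_add_command(Path("/tmp/worktrees"), "task-1")
```

Because each agent edits files in its own directory and on its own branch, two agents cannot overwrite each other's uncommitted changes; merging happens later, as an explicit step.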

What Engineers Can Actually Delegate

Anthropic's internal data puts some useful numbers on the delegation question. Engineers can "fully delegate" only 0-20% of their work to AI agents. The tasks that get delegated successfully share common traits: they're easily verifiable, well-defined, repetitive, or outside the engineer's core expertise. Meanwhile, Claude Code's feature implementation usage jumped from 14% to 37% of all tasks as the tool matured - suggesting engineers are finding more work that fits the delegation criteria, even if full autonomy remains limited.

The analysis makes a point that cuts against the "AI will replace developers" narrative: using AI aggressively can actually erode the judgment skills needed to supervise it effectively. Engineers who let AI handle everything stop building the domain expertise required to tell good output from bad. The recommendation is to develop what the author calls "strategic deep-diving" - the ability to move fluidly between high-level architecture and line-by-line analysis, rather than either obsessing over every line (old school) or blindly accepting AI output (vibe coding).

Systems Thinking Over Prompt Engineering

The most practical takeaway is a shift in where engineers should invest their time. The bottleneck is no longer writing the perfect prompt. It's designing systems where agents can work independently on well-scoped pieces with clear completion criteria and verification checkpoints.
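One way to picture "well-scoped pieces with clear completion criteria" is as a small task specification that carries its own checks. The Python sketch below is hypothetical: TaskSpec, its fields, and the two string-based checks are invented for illustration, and a real project would more likely run a test suite or a linter as the verification step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskSpec:
    name: str
    scope: List[str]  # files the agent is allowed to touch
    # Named completion criteria: each check takes the produced source text.
    checks: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def verify(self, output: str) -> List[str]:
        """Return the names of failed checks; an empty list means done."""
        return [name for name, check in self.checks.items() if not check(output)]


spec = TaskSpec(
    name="add slugify helper",
    scope=["utils/text.py"],
    checks={
        "defines slugify": lambda src: "def slugify(" in src,
        "has docstring": lambda src: '"""' in src,
    },
)

incomplete = "def slugify(s):\n    return s.lower()\n"
complete = 'def slugify(s):\n    """Lowercase the input."""\n    return s.lower()\n'

failed_first = spec.verify(incomplete)
failed_second = spec.verify(complete)
```

Writing the checks before handing the task to an agent forces the scoping question up front: if the completion criteria cannot be stated as something checkable, the task is probably not ready to delegate.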

This maps directly to what all four organizations discovered independently. Flat coordination fails. Hierarchy works. And the most critical component in every working multi-agent system is not the generator - it's the judge. Determining whether code is correct remains fundamentally human work, and building that evaluation skill requires deliberate practice.

For anyone using AI coding tools today, the implication is concrete: spend less time crafting elaborate prompts and more time structuring your project so an agent can pick up a task, complete it in isolation, and produce something you can verify quickly. That's the architecture the biggest AI labs converged on, and it works just as well at the individual developer level.
