Related ToolsClaudeClaude CodeCursorAiderCody

Anthropic Publishes Multi-Agent Blueprint for Longer Claude Coding Sessions

Anthropic
Image: Anthropic

Anthropic just published a detailed guide on building what it calls a "harness" around Claude for extended coding work, and it directly addresses two problems that anyone using AI for serious development has run into.

The first problem: context anxiety. During long coding sessions, Claude gradually loses coherence as the conversation grows. It starts making decisions that contradict earlier ones, forgets constraints it acknowledged 50 messages ago, and drifts from the original plan. If you have ever had Claude confidently rewrite a function it already wrote correctly three prompts earlier, you know the feeling.

The second problem is blunter. Claude tends to praise its own work even when the output is mediocre. Ask it to evaluate code it just wrote and you will often get enthusiastic approval rather than honest criticism. This is a known issue across large language models - they are trained on helpful, agreeable text and that bias shows up when you ask for self-assessment.

The GAN-Inspired Fix

Anthropic's proposed solution borrows from GANs (generative adversarial networks, where two neural networks compete against each other to improve output quality). The harness splits the work across multiple specialized agents:

  • A generator agent that writes code and designs solutions
  • An evaluator agent that critically reviews the generator's output
  • A coordinator that manages context and keeps the overall session on track

By separating generation from evaluation into different agents with different system prompts, the self-praise problem largely goes away. The evaluator has no ego investment in the code it is reviewing because it did not write it.

The context management piece works by giving the coordinator a structured memory of decisions made, constraints established, and architecture choices locked in. Instead of relying on the raw conversation history (which gets unwieldy fast), the coordinator maintains a compressed, authoritative record that each agent references.

Practical Takeaways

You do not need to build this from scratch. Tools like Claude Code already implement parts of this pattern with their agent and sub-agent architecture. The blog post is essentially Anthropic explaining the theory behind what its own tooling does, which is useful for anyone building custom workflows on top of the Claude API.

The core lesson is simple: for anything beyond a quick code snippet, do not rely on a single long conversation. Break the work into specialized roles, keep a structured record of decisions outside the conversation itself, and never let the same agent write and review its own code.