Related ToolsCursorClaude CodeAiderCodyContinue

AI Coding Agents Drift From Your Specs the Longer They Run

AI news: AI Coding Agents Drift From Your Specs the Longer They Run

Ask an AI coding agent to build a backend service with specific constraints - "use PostgreSQL, authenticate with JWT tokens, follow this error-handling pattern" - and it will comply at first. Three hours and forty tool calls later? Those original rules may have quietly disappeared from the agent's behavior.

This phenomenon has a name now: constraint decay. A research paper on arxiv documents how AI agents fail specifically when generating backend code over long, multi-step sessions.

What the Research Found

The core problem is the context window - the limit on how much text an AI model can process at once, measured in tokens (roughly 4 characters each; 100,000 tokens is about the length of a 300-page book). In a short conversation, your original constraints sit near the top of that window and the model follows them closely. In a long agentic workflow - where an AI reads files, writes code, runs tests, and cycles through dozens of steps - those original constraints get buried under layers of intermediate output. The model's adherence to them weakens.

Constraint decay isn't the model forgetting your instructions entirely. The text is technically still in the context. But models assign less behavioral weight to instructions that are far from the current generation point - the closer you get to the end of a long task, the less pull your original rules have compared to the immediately preceding context.

For backend code, this creates compounding problems. Code written in step 2 of a session gets referenced and built upon in step 20. If the agent has drifted from your original data model or security requirements by then, later modules will be subtly incompatible with earlier ones - the kind of bugs that don't surface until integration or runtime testing.

What This Means for Coding Tool Users

Cursor, Claudee Code](/tools/claude-code/), and Aider run exactly the kind of multi-step agentic loops this research examines. If you're using them for anything beyond small, isolated tasks - building an entire API, refactoring a large codebase, generating a backend service from scratch - this finding applies directly.

Some practical takeaways:

  • Shorter sessions are more reliable. Breaking large coding tasks into smaller, explicitly constrained sessions reduces constraint decay. Each fresh session puts your rules at the top of the context where they carry more weight.
  • Repeat your constraints mid-task. Many experienced users already restate key requirements partway through a long session intuitively. The research gives that habit a concrete mechanical basis.
  • Review at the module level, not just line by line. The longer an agentic session runs, the less you can assume the final output still matches the original spec. Module-level coherence is where constraint decay shows up most.
  • Prompting that re-injects constraints at multiple points throughout a workflow is better positioned to maintain adherence than a single front-loaded system prompt.

The research doesn't offer a clean fix - it documents and measures the problem. But the implication is clear: the reliability gap with AI coding agents isn't primarily about whether individual code snippets are correct. It's about whether the code stays coherent across a long task. Constraint decay is likely a major contributor to the "AI code works in isolation but breaks in integration" complaint that experienced developers have been raising about these tools.