Every developer who's spent an hour deep in a Claude Code or Cursor session has felt it: the agent starts strong, then gradually produces sloppier code, makes wrong assumptions, and ignores instructions you gave ten minutes ago.
David Reis has a useful mental model for why this happens. LLMs operate inside a "capacity box" defined by two dimensions: context length (how much text they can hold) and logical complexity (how many reasoning steps they can chain). Push past either limit and the model doesn't throw an error. It just silently degrades, producing incomplete code and making dangerous assumptions with zero warning.
This is the core problem. There's no red light on your dashboard. The agent just gets confidently worse.
Sub-Agents Over Monolithic Conversations
The most practical fix is delegating tasks to sub-agents that run in isolated contexts and return concise results to your main conversation. Both Claude Code and Cursor support this now. Instead of one ballooning conversation that reviews code, writes tests, and refactors all at once, you break work into focused chunks. Each sub-agent gets a clean context window (the amount of text the model can process at once) and a narrow task.
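To make the isolation idea concrete, here is a minimal sketch of the pattern, not how Claude Code or Cursor implement it. Everything in it is hypothetical: `Conversation`, `run_subagent`, and `fake_model` are stand-ins, and character counts are a crude proxy for context usage. Truncating the output stands in for the concise summary a real sub-agent would write.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a model call: takes the messages in context, returns text.
ModelFn = Callable[[list[str]], str]


@dataclass
class Conversation:
    messages: list[str] = field(default_factory=list)

    def size(self) -> int:
        # Character count as a crude proxy for how much context is in use.
        return sum(len(m) for m in self.messages)


def run_subagent(task: str, model: ModelFn, max_result_chars: int = 200) -> str:
    """Run one narrow task in a fresh context and hand back only a short result."""
    scratch = Conversation([task])        # clean context: nothing inherited from the parent
    full_output = model(scratch.messages)
    scratch.messages.append(full_output)  # the verbose work stays inside the sub-agent
    # Assumption: truncation stands in for a real, model-written summary.
    return full_output[:max_result_chars]


def fake_model(messages: list[str]) -> str:
    # Placeholder model that produces a long, noisy answer.
    return f"Finished '{messages[0]}'. " + "verbose detail " * 100


main = Conversation(["Goal: ship the billing refactor"])
for task in ["review code", "write tests", "refactor module"]:
    main.messages.append(run_subagent(task, fake_model))

print(len(main.messages), main.size())
```

The main conversation grows by at most 200 characters per task, no matter how much text each sub-agent churned through to get there.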
Roll Back, Don't Follow Up
When an agent produces bad output, the instinct is to say "no, fix this" and pile on corrections. That's counterproductive: each follow-up adds to the context, and the bad output stays in it too, pushing you closer to the edge of the capacity box. A better approach is to roll back the changes entirely and refine your original prompt. Fewer tokens, cleaner results.
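A quick back-of-the-envelope calculation shows why. The sketch below compares how much context the model is carrying on its final attempt under each strategy. The function names and all token counts are made up for illustration; real numbers depend on your prompts and the model's output.

```python
def context_after_followups(prompt: int, output: int, correction: int, rounds: int) -> int:
    """Tokens in context when every bad output and every correction stays in the history."""
    context = prompt
    for _ in range(rounds):
        context += output + correction
    return context


def context_after_rollback(refined_prompt: int) -> int:
    """Tokens in context when the failed attempts are discarded and the prompt is rewritten."""
    return refined_prompt


# Hypothetical numbers: a 500-token prompt, 2,000-token outputs, 100-token corrections.
followup = context_after_followups(prompt=500, output=2000, correction=100, rounds=3)
rollback = context_after_rollback(refined_prompt=650)

print(followup)  # 6800
print(rollback)  # 650
```

Under these assumed numbers, three rounds of "no, fix this" leave the model reading about ten times as much text as a single refined prompt would, and most of that text is output you already rejected.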
Living Documentation Beats Re-Reading Code
Maintaining a docs/ folder with markdown files covering your project architecture, module responsibilities, and current tasks gives the agent a compressed map of your codebase. Loading a 200-line architecture doc consumes far fewer tokens than having the agent scan 50 source files to figure out the same information.
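You can sanity-check that trade-off with a rough estimate. The sketch below uses synthetic placeholder text and assumes about four characters per token, which is only a rule of thumb; real tokenizers vary by model. `estimate_tokens` is a hypothetical helper, not part of any tool mentioned here.

```python
def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Real tokenizers differ.
    return len(text) // 4


# Synthetic stand-in for a 200-line architecture doc.
architecture_doc = "\n".join(
    f"module_{i}: owns feature {i}, depends on module_{i - 1}" for i in range(200)
)

# Synthetic stand-in for 50 source files of 300 lines each.
source_files = [
    "\n".join(
        f"def handler_{n}_{i}(request): return process(request, {i})" for i in range(300)
    )
    for n in range(50)
]

doc_cost = estimate_tokens(architecture_doc)
scan_cost = sum(estimate_tokens(src) for src in source_files)

print(f"doc: ~{doc_cost} tokens, full scan: ~{scan_cost} tokens")
```

To try it on a real project, replace the synthetic strings with the contents of your own doc and source files. The exact ratio will differ, but the gap is usually large.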
Reis also flags a subtler issue: tool bloat. Every MCP (Model Context Protocol) server, every tool definition, every instruction file you load eats into that context budget before you've typed a word. Being selective about what you attach to a session matters more than most developers realize.
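One way to picture tool bloat is as a fixed cost subtracted from the window at the start of every session. The figures below are invented for illustration, including the window size and the per-attachment costs; `remaining_budget` is a hypothetical helper.

```python
def remaining_budget(context_window: int, fixed_overhead: dict[str, int]) -> int:
    """Tokens left for actual work after everything attached to the session is loaded."""
    return context_window - sum(fixed_overhead.values())


# Hypothetical token costs for what a session loads up front.
overhead = {
    "mcp_server_a_tools": 8_000,
    "mcp_server_b_tools": 12_000,
    "instruction_files": 3_000,
}
window = 200_000  # illustrative context window size

everything_attached = remaining_budget(window, overhead)

# Detach the server this session doesn't need.
trimmed = {name: cost for name, cost in overhead.items() if name != "mcp_server_b_tools"}
selective = remaining_budget(window, trimmed)

print(everything_attached)  # 177000
print(selective)            # 189000
```

The overhead is paid on every request in the session, so trimming an unused attachment buys back that space for the whole conversation.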
None of this is new wisdom to experienced users, but the "capacity box" framing is a clean way to explain why these practices work. The model isn't getting lazy or broken. You're just quietly exceeding the box.