
AI Coding Agents Can Debug Complex Systems but Forget Everything Between Sessions

April 3, 2026

What if the bottleneck with AI coding agents isn't intelligence, but memory?

Software engineer Sean Goedecke makes a compelling argument rooted in "Programming as Theory Building," a 1985 paper by computer scientist Peter Naur: the real output of programming isn't code. It's the mental model - the "theory" - that a developer builds about how a system works. You can't meaningfully change a program just by having access to the source code. You first need to read through it carefully enough to understand what it's supposed to do, why it's structured the way it is, and what breaks if you change something.

The question is whether AI agents are destroying that theory-building process, and Goedecke's answer is more nuanced than the usual "AI makes developers lazy" takes.

Developers Still Build Theories - Just Differently

Goedecke's own workflow data suggests that only about 10% of what AI agents produce actually makes it into his final output. Dealing with the other 90% is evaluation work - checking whether the agent's suggestions match his existing understanding of the codebase. That evaluation process is itself theory-building: you can't assess whether a code change is correct without understanding the system it's changing.

The concern that AI tools let developers skip the hard work of understanding code is real but overstated. Every developer already works with incomplete mental models. Nobody fully understands every abstraction layer in their stack, from application code down through the OS kernel. AI agents just shift where the gaps are.

Agents Build Theories Too - Then Immediately Lose Them

Here's the genuinely interesting observation: looking at agent logs, Goedecke noticed that AI coding agents do something that looks a lot like theory-building. They form hypotheses about how code works, test those hypotheses by examining files, adjust their understanding when they're wrong, and use that updated model to make changes. They can debug complex, multi-file issues by constructing a working mental model of the system.

The critical limitation is that agents can't retain these theories. Every new session starts from zero. The agent that spent 20 minutes building a sophisticated understanding of your authentication system has completely forgotten it by the next conversation. It will re-read the same files, re-discover the same patterns, and re-build the same mental model from scratch.

The Real Next Step Isn't Smarter Models

This reframes what progress in AI coding tools actually looks like. The most impactful improvement wouldn't be better reasoning or larger context windows (the amount of text a model can process at once). It would be persistent memory - giving agents the ability to carry forward their understanding of a codebase between sessions, whether through fine-tuning (training the model on your specific code), extended context, or some other mechanism.

Some tools are already experimenting with this. Claude Code has a memory system and CLAUDE.md project files. Cursor has its .cursorrules file. But these are crude approximations - hand-written notes rather than the deep, earned understanding that comes from actually working through problems in a codebase.
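To make the "notes" approach concrete, here is a minimal sketch of a note-based memory that survives between sessions. It is purely illustrative and is not how Claude Code or Cursor work internally: the `NoteStore` class, its `remember` and `recall` methods, and the JSON file layout are all hypothetical names invented for this example. Each note is stored alongside a hash of the file it describes, so a note is discarded once the file changes.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's current contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class NoteStore:
    """Hypothetical on-disk store of notes about source files.

    Assumption: one JSON file maps each source path to a note plus the
    digest of the file at the time the note was written.
    """

    def __init__(self, store_path: Path) -> None:
        self.store_path = store_path
        self.notes = {}
        # A new "session" starts by loading whatever earlier sessions saved.
        if store_path.exists():
            self.notes = json.loads(store_path.read_text(encoding="utf-8"))

    def remember(self, source: Path, note: str) -> None:
        """Save a note about `source` and persist the store immediately."""
        self.notes[str(source)] = {"digest": file_digest(source), "note": note}
        self.store_path.write_text(
            json.dumps(self.notes, indent=2), encoding="utf-8"
        )

    def recall(self, source: Path) -> Optional[str]:
        """Return the saved note, or None if it is missing or stale."""
        entry = self.notes.get(str(source))
        if entry is None or not source.exists():
            return None
        if entry["digest"] != file_digest(source):
            return None  # the file changed, so the old note may be wrong
        return entry["note"]
```

A second `NoteStore` opened on the same JSON file stands in for a later session: it can recall a note without re-reading and re-analyzing the source, but only while the file is unchanged. The sketch also shows why this falls short of a real theory. It stores conclusions, not the reasoning that produced them, and it throws a note away entirely as soon as the file it describes is edited.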

The developers getting the most out of AI agents right now are the ones who already have strong mental models of their codebases and use agents to accelerate execution. The developers struggling are the ones hoping agents will build that understanding for them. Until persistent memory is solved, that gap won't close.
