
The Code Review Problem Nobody Has Solved for AI-Generated Code

What happens to code review when half your pull requests are written by an AI agent?

This is the question hanging over every engineering team that has adopted AI coding tools in the past year. Tools like Cursor, Claude Code, and GitHub Copilot are generating more code than ever, but the processes for vetting that code have barely changed since humans were the only ones writing it.

The core tension is simple: traditional code review assumes a human author who understands their changes, can explain their reasoning, and takes responsibility for bugs. AI-generated code breaks all three assumptions. The agent does not remember why it chose one approach over another. It cannot be paged at 2 AM when its code causes an outage. And the human who prompted it may not fully understand every line it produced.

Where Current Workflows Break Down

Most teams are handling AI-generated code the same way they handle human code: open a PR, get a reviewer to approve it, merge. But reviewers are already reporting fatigue. AI agents tend to produce larger diffs, and the code often looks plausible but contains subtle issues that are easy to miss during review. A function that works correctly 95% of the time but fails on edge cases is harder to catch than an obvious syntax error.
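Here is a minimal, hypothetical Python illustration of the edge-case problem. The function names are invented for this example and do not come from any real codebase or tool output. The buggy version counts pages with floor division. That is correct whenever the item count divides evenly and wrong otherwise:

```python
def page_count_buggy(total_items: int, per_page: int) -> int:
    # Looks plausible, and is correct whenever total_items is a multiple of per_page.
    return total_items // per_page


def page_count(total_items: int, per_page: int) -> int:
    # Ceiling division: a final partial page still counts as a page.
    return -(-total_items // per_page)


# The happy path agrees, so a quick review or a single example test passes.
assert page_count_buggy(100, 10) == page_count(100, 10) == 10

# The edge case does not: 101 items need 11 pages, not 10.
print(page_count_buggy(101, 10), page_count(101, 10))  # 10 11
```

Both versions run and pass the obvious test. Only an input that leaves a partial page exposes the difference, and that is exactly the kind of one-line change a fatigued reviewer skims past.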

Some teams have started requiring AI-generated PRs to meet a test-coverage threshold before review. Others require the human operator to be able to explain every change line by line. Neither approach scales well. The first floods CI pipelines, because every large AI-generated diff now arrives with a matching pile of tests to run. The second defeats the speed advantage of using AI tools in the first place.
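A coverage requirement like this is usually enforced as a gate that runs before review. The sketch below is a hypothetical policy check and does not reflect the API of any particular CI system. The `PullRequest` fields, the `ai_generated` flag, and the 80% threshold are all assumptions chosen for illustration. The check measures coverage of the changed lines only, because whole-repository coverage says little about a single diff.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical fields; a real gate would read these from CI coverage reports.
    ai_generated: bool
    changed_lines: int
    covered_changed_lines: int


def ready_for_review(pr: PullRequest, min_coverage: float = 0.8) -> bool:
    """Return True if the PR may be sent to a human reviewer."""
    if not pr.ai_generated:
        return True  # human-authored PRs skip this particular gate
    if pr.changed_lines == 0:
        return True  # nothing executable changed, so nothing to cover
    return pr.covered_changed_lines / pr.changed_lines >= min_coverage


print(ready_for_review(PullRequest(True, 120, 110)))  # True
print(ready_for_review(PullRequest(True, 120, 60)))   # False
```

The gate has a clear limit. High coverage means the changed lines ran under test. It does not mean the tests assert anything meaningful about them.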

What a Post-AI Code Review Process Might Look Like

A few patterns are emerging. Automated verification layers that go beyond unit tests, including property-based testing and formal verification for critical paths, are getting more attention. Some organizations are experimenting with AI-assisted review, using a second model to audit the first model's output. This raises obvious questions about correlated failures: if both models share the same blind spots, the auditor may approve exactly the mistakes the author made.
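Of these layers, property-based testing is the easiest to adopt. Instead of relying on a few hand-picked examples, you state a property that must hold for every input and check it against many generated inputs. The sketch below uses only Python's standard library and a hand-rolled random generator. Dedicated libraries such as Hypothesis add input shrinking and smarter generation. The run-length encoder stands in for a hypothetical piece of agent-written code. The property checked is that decoding an encoding returns the original string:

```python
import random
from typing import Optional


def rle_encode(s: str) -> list[tuple[str, int]]:
    # Hypothetical agent-written function: collapse runs of repeated characters.
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in runs)


def find_roundtrip_failure(trials: int = 1000, seed: int = 0) -> Optional[str]:
    """Return the first generated input that breaks decode(encode(s)) == s, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        # A two-letter alphabet and lengths from 0 to 20 make long runs and
        # the empty string likely, which is where run-length bugs tend to hide.
        s = "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))
        if rle_decode(rle_encode(s)) != s:
            return s
    return None


print(find_roundtrip_failure())  # None
```

Suppose a later agent edit breaks the encoder, for example by dropping the final run. The checker would then return a concrete failing input, so the reviewer no longer has to spot the mistake by eye.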

The most practical approach right now seems to be tighter scoping: use AI agents for well-defined, bounded tasks where the output is easy to verify, and keep humans on the complex architectural decisions. That is not a permanent solution, but it matches what the tools can reliably do today.
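A scope check on the agent's diff is one way to make "bounded" enforceable rather than aspirational. The sketch below is hypothetical. The allowed path prefixes and the 200-line cap are placeholder values. The input is assumed to be a mapping from file path to changed-line count, extracted from the diff by some earlier step.

```python
# Hypothetical policy values; tune per repository.
ALLOWED_PREFIXES = ("tests/", "docs/", "src/utils/")
MAX_CHANGED_LINES = 200


def within_agent_scope(changed: dict[str, int]) -> bool:
    """changed maps each touched file path to its number of changed lines."""
    if sum(changed.values()) > MAX_CHANGED_LINES:
        return False  # too large to review carefully in one pass
    # Naive prefix match; a production check should normalize paths first.
    return all(path.startswith(ALLOWED_PREFIXES) for path in changed)


print(within_agent_scope({"tests/test_dates.py": 40, "src/utils/dates.py": 25}))  # True
print(within_agent_scope({"src/billing/invoice.py": 10}))  # False
```

Any change that falls outside the scope goes back to a human-led workflow, preserving the split between bounded tasks and architectural decisions.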

The industry needs better tooling for change control and auditability of AI-generated code. Until that exists, engineering teams are essentially running an experiment in production, hoping their existing review processes catch what the models get wrong.