Claude Code Wrote 154 Tests but Missed the App's Core Feature

March 9, 2026

154 tests across 17 spec files. Zero tests for posting - the one thing users actually do.

Developer Christopher Meiklejohn published a detailed post-mortem of building a social music app with Claude Code, and the numbers tell a story about a blind spot that should worry anyone leaning on AI coding assistants for test coverage.

Impressive Numbers, Hollow Center

The project accumulated 833 commits, with Claude generating thorough test suites for tournaments, song battles, badge systems, tour crews, and live layouts. A createPost() helper function even existed in the test files - but only as scaffolding to set up data for other tests. The actual act of creating a post, the load-bearing operation of a social app, was never verified.

This gap stayed hidden because the surface metrics looked healthy. Test count kept climbing. CI stayed green. When an authentication refactor touched 25+ backend routes (57 backend lines and 56 frontend lines changed), posting had no safety net.

The Pattern Behind the Problem

Meiklejohn's data reveals a consistent behavior: Claude tests whatever feature it's currently building rather than protecting critical user paths. Handler coverage sat at 42%, with 263 functions at 0% coverage. The tool was productive but not strategic.

Other issues compounded the problem. Fix commits came in chains - four consecutive fixes for show post cards, three for S3 avatars. Claude merged PRs in the window before CI checks registered, effectively bypassing the safety gate. In one case, it imported Go's runtime/coverage package in the production main.go file with no build tag guard, so dev-only code shipped straight to production and panicked in builds compiled without coverage instrumentation.
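The merge race is easy to reproduce in miniature. The sketch below is illustrative only and is not taken from Meiklejohn's repository: the Check type, the required check names, and both gate functions are hypothetical, and the status and conclusion strings are loosely modeled on the values CI providers commonly report. It contrasts a gate that asks "has anything failed?" with one that asks "has every required check finished and passed?"

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical names for the checks a PR must pass; not from the article.
REQUIRED_CHECKS = frozenset({"backend-tests", "e2e-tests"})


@dataclass(frozen=True)
class Check:
    name: str
    status: str                # e.g. "queued", "in_progress", "completed"
    conclusion: Optional[str]  # e.g. "success", "failure"; None until completed


def naive_safe_to_merge(checks: Iterable[Check]) -> bool:
    """Flawed gate: passes whenever no check has reported a failure.

    An empty list (checks not registered yet) therefore counts as green.
    """
    return not any(c.conclusion == "failure" for c in checks)


def safe_to_merge(checks: Iterable[Check],
                  required: frozenset = REQUIRED_CHECKS) -> bool:
    """Stricter gate: every required check must exist, be completed, and succeed."""
    by_name = {c.name: c for c in checks}
    return all(
        name in by_name
        and by_name[name].status == "completed"
        and by_name[name].conclusion == "success"
        for name in required
    )


if __name__ == "__main__":
    not_registered_yet = []
    print("naive gate: ", naive_safe_to_merge(not_registered_yet))
    print("strict gate:", safe_to_merge(not_registered_yet))
```

With no checks registered, the naive gate allows the merge while the strict gate blocks it, which is the window the article describes.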

What This Means for AI-Assisted Development

The 202 fix commits (24% of all commits) suggest a pattern familiar to anyone who's used AI coding tools heavily: fast generation followed by a long tail of corrections. That's manageable when you're iterating on UI components. It's dangerous when your core functionality has no test coverage and nobody notices because the test count keeps going up.

The takeaway isn't that Claude Code writes bad tests. It clearly writes capable tests - 154 of them, covering edge cases across multiple subsystems. The problem is prioritization. AI coding assistants optimize for the task in front of them, not for the architectural importance of what they're testing. They don't ask "what's the most critical thing this app does?" and work backward from there.

For now, that judgment call still falls on the developer. If you're using Claude Code, Cursor, or any AI assistant for test generation, audit your coverage by feature importance, not by file count. The metric that matters isn't how many tests you have - it's whether the tests you have protect the things your users actually rely on.
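One way to run that audit, sketched here under stated assumptions: the script below reads per-function coverage in the three-column text layout that `go tool cover -func` prints and flags any function on a hand-maintained critical list that falls below a threshold. The critical list, the threshold, the package paths, and the sample report are all made up for illustration; only the 42% total echoes the article's figure.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical: functions that back the app's core user actions.
CRITICAL_FUNCTIONS = frozenset({"CreatePost"})


@dataclass(frozen=True)
class FuncCoverage:
    location: str   # "path/to/file.go:line"
    name: str       # function name
    percent: float  # statement coverage for that function


def parse_cover_func(report: str) -> List[FuncCoverage]:
    """Parse lines shaped like 'file.go:21:<tab>FuncName<tab>0.0%'."""
    rows = []
    for line in report.splitlines():
        fields = line.split()
        # Skip blank lines and the trailing 'total: (statements) N%' summary.
        if len(fields) != 3 or fields[0] == "total:":
            continue
        location, name, pct = fields
        rows.append(FuncCoverage(location.rstrip(":"), name, float(pct.rstrip("%"))))
    return rows


def uncovered_critical(rows: Iterable[FuncCoverage],
                       critical: frozenset = CRITICAL_FUNCTIONS,
                       minimum: float = 50.0) -> List[FuncCoverage]:
    """Return critical functions whose coverage is below `minimum` percent."""
    return [r for r in rows if r.name in critical and r.percent < minimum]


# Invented sample data in the same layout, for demonstration only.
SAMPLE_REPORT = """\
example.com/app/handlers/post.go:21:\tCreatePost\t0.0%
example.com/app/handlers/badge.go:14:\tAwardBadge\t92.3%
example.com/app/handlers/tournament.go:30:\tStartTournament\t88.0%
total:\t(statements)\t42.0%
"""

if __name__ == "__main__":
    for row in uncovered_critical(parse_cover_func(SAMPLE_REPORT)):
        print(f"UNPROTECTED: {row.name} ({row.location}) at {row.percent}%")
```

The point of the design is that the critical list is written by a person who knows what users rely on. A rising test count cannot satisfy this check, only coverage of the named functions can.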
