154 tests across 17 spec files. Zero tests for posting: the one thing users actually do.
Developer Christopher Meiklejohn published a detailed post-mortem of building a social music app with Claude Code, and its numbers expose a blind spot that should worry anyone leaning on AI coding assistants for test coverage.
Impressive Numbers, Hollow Center
The project accumulated 833 commits, with Claude generating thorough test suites for tournaments, song battles, badge systems, tour crews, and live layouts. A createPost() helper function even existed in the test files, but only as scaffolding to set up data for other tests. The actual act of creating a post, the load-bearing operation of a social app, was never verified.
This gap stayed hidden because the surface metrics looked healthy. Test count kept climbing. CI stayed green. When an authentication refactor touched 25+ backend routes (57 backend lines and 56 frontend lines changed), posting had no safety net.
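For contrast, here is roughly what verifying the create path, rather than merely calling a helper for setup, could look like. This is a minimal sketch in Go, since the project's backend appears to be Go (it has a main.go); the Post type, PostStore, createPostHandler, and the POST /posts route are all hypothetical stand-ins, not code from Meiklejohn's app. It drives the handler through the standard net/http/httptest package and asserts on the outcome: a valid post returns 201 and is persisted, and an empty one returns 400 with nothing stored. In a real project these checks would live in a TestCreatePost function in a _test.go file; they are wrapped in main here so the sketch runs with go run.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Post is a hypothetical stand-in for the app's post model.
type Post struct {
	ID     int    `json:"id"`
	Author string `json:"author"`
	Body   string `json:"body"`
}

// PostStore is a hypothetical in-memory store; a real app would use a database.
type PostStore struct {
	mu    sync.Mutex
	posts []Post
}

// Add assigns the next ID and stores the post.
func (s *PostStore) Add(p Post) Post {
	s.mu.Lock()
	defer s.mu.Unlock()
	p.ID = len(s.posts) + 1
	s.posts = append(s.posts, p)
	return p
}

// Count reports how many posts are stored.
func (s *PostStore) Count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.posts)
}

// createPostHandler is a hypothetical POST /posts handler.
func createPostHandler(store *PostStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var p Post
		if err := json.NewDecoder(r.Body).Decode(&p); err != nil || strings.TrimSpace(p.Body) == "" {
			http.Error(w, "invalid post", http.StatusBadRequest)
			return
		}
		saved := store.Add(p)
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusCreated)
		_ = json.NewEncoder(w).Encode(saved) // encode error ignored in this sketch
	}
}

// checkCreatePost exercises the create path through the handler and returns
// the first failed expectation, the way a TestCreatePost function would.
func checkCreatePost() error {
	store := &PostStore{}
	h := createPostHandler(store)

	// Happy path: a valid post is accepted and persisted.
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodPost, "/posts", strings.NewReader(`{"author":"ana","body":"first!"}`)))
	if rec.Code != http.StatusCreated {
		return fmt.Errorf("valid post: got status %d, want %d", rec.Code, http.StatusCreated)
	}
	var got Post
	if err := json.NewDecoder(rec.Body).Decode(&got); err != nil {
		return fmt.Errorf("decoding response: %v", err)
	}
	if got.ID == 0 || got.Body != "first!" || store.Count() != 1 {
		return fmt.Errorf("valid post not persisted: %+v (store has %d)", got, store.Count())
	}

	// Failure path: a blank body is rejected and nothing new is stored.
	rec = httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodPost, "/posts", strings.NewReader(`{"author":"ana","body":"  "}`)))
	if rec.Code != http.StatusBadRequest || store.Count() != 1 {
		return fmt.Errorf("blank post: got status %d, store has %d", rec.Code, store.Count())
	}
	return nil
}

func main() {
	if err := checkCreatePost(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok: create-post path verified")
}
```

Going through the handler rather than calling the store directly is the point: a refactor that changes how the handler parses, validates, or persists a post would fail this check, which a setup-only helper never would.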
The Pattern Behind the Problem
Meiklejohn's data reveals a consistent behavior: Claude tests whatever feature it's currently building rather than protecting critical user paths. Handler coverage sat at 42%, with 263 functions at 0% coverage. The tool was productive but not strategic.
Other issues compounded the problem. Fix commits came in chains: four consecutive fixes for show post cards, three for S3 avatars. Claude merged PRs in the window before CI checks registered, effectively bypassing the safety gate. In one case, it added a runtime/coverage import to the production main.go that panicked on non-coverage builds; with no build-tag guard, dev-only code shipped straight to production.
What This Means for AI-Assisted Development
The 202 fix commits (24% of all commits) suggest a pattern familiar to anyone who's used AI coding tools heavily: fast generation followed by a long tail of corrections. That's manageable when you're iterating on UI components. It's dangerous when your core functionality has no test coverage and nobody notices because the test count keeps going up.
The takeaway isn't that Claude Code writes bad tests. It clearly writes capable ones: 154 of them, covering edge cases across multiple subsystems. The problem is prioritization. AI coding assistants optimize for the task in front of them, not for the architectural importance of what they're testing. They don't ask "what's the most critical thing this app does?" and work backward from there.
For now, that judgment call still falls on the developer. If you're using Claude Code, Cursor, or any AI assistant for test generation, audit your coverage by feature importance, not by file count. The metric that matters isn't how many tests you have; it's whether the tests you have protect the things your users actually rely on.
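One way to make "audit by feature importance" concrete in a Go codebase is to check a hand-maintained list of critical functions against per-function coverage, instead of looking only at the aggregate number. This is a minimal sketch, assuming the plain-text layout printed by `go tool cover -func` (one `file:line: function percent` row per function, plus a `total:` row). The function names, file paths, percentages, and the 80% threshold below are illustrative and hypothetical, not figures from the post-mortem. Because that output does not package-qualify function names, the sketch keeps the lowest percentage when a name repeats; a real audit might key on file path as well.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseFuncCoverage reads `go tool cover -func` output and returns coverage
// percentages keyed by function name, skipping the "total:" row. Duplicate
// names keep the lowest value so a gap is never hidden by a namesake.
func parseFuncCoverage(report string) map[string]float64 {
	cov := make(map[string]float64)
	for _, line := range strings.Split(report, "\n") {
		f := strings.Fields(line)
		if len(f) != 3 || f[0] == "total:" {
			continue
		}
		pct, err := strconv.ParseFloat(strings.TrimSuffix(f[2], "%"), 64)
		if err != nil {
			continue
		}
		if prev, seen := cov[f[1]]; !seen || pct < prev {
			cov[f[1]] = pct
		}
	}
	return cov
}

// auditCritical returns the critical functions covered below threshold,
// including ones missing from the report entirely.
func auditCritical(report string, critical []string, threshold float64) []string {
	cov := parseFuncCoverage(report)
	var gaps []string
	for _, name := range critical {
		pct, found := cov[name]
		switch {
		case !found:
			gaps = append(gaps, name+" (not in report)")
		case pct < threshold:
			gaps = append(gaps, fmt.Sprintf("%s (%.1f%%)", name, pct))
		}
	}
	sort.Strings(gaps)
	return gaps
}

func main() {
	// Illustrative report; real input would come from
	//   go test -coverprofile=cover.out ./... && go tool cover -func=cover.out
	report := `example.com/app/handlers/posts.go:20:	CreatePost	0.0%
example.com/app/handlers/tournaments.go:14:	CreateTournament	91.3%
example.com/app/handlers/badges.go:9:	AwardBadge	100.0%
total:	(statements)	68.5%`

	// Hypothetical list of the operations users actually depend on.
	critical := []string{"CreatePost", "Login", "CreateTournament"}

	for _, gap := range auditCritical(report, critical, 80) {
		fmt.Println("under-tested critical path:", gap)
	}
}
```

The design choice is to apply the threshold per critical function rather than to the total: in the sample data, a respectable-looking aggregate sits alongside a CreatePost at 0%, which is exactly the kind of gap the article describes.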