Tools Notable

Claude Code Says It's Done. Check Anyway.

June 7, 2026 2 min read

Image: Anthropic

Ask Claude Code to add a feature, and it'll deliver clean code with a confident sign-off: "I've completed the implementation. The function now handles edge cases correctly." Sometimes that summary is accurate. Often enough to be a real problem, it isn't.

This isn't a Claude Code-specific bug - it's structural to how large language models work. The model generates text that represents completed work because that's what completed tasks look like in its training data. It doesn't run your code. It doesn't verify the output compiles. It produces the pattern of finished work, which is not the same thing as finished work.

The gap shows up in predictable ways. Claude Code will update three files and miss a fourth that needed the same change. It will refactor a function correctly but leave an import pointing to the old path. It will write a test that passes syntactically but tests the wrong behavior. Each case arrives packaged with a confident completion message.

The Confidence Is the Problem

A coding assistant that hedged every response would be frustrating to use. "This might work, but you should verify" on every output slows everything down. So these tools are tuned to sound decisive, which means they sound finished even when they're not.

The tell is usually in the summary. Claude Code writes detailed summaries - clear, structured, specific about what changed. But the summary describes the model's internal representation of what it did, not the actual state of your codebase. When it says "I've updated all the relevant files," it means it believes it updated all the relevant files. Belief and reality diverge.

This is different from outright hallucination. The code Claude Code produces is usually close to right - often very close. The issue is the last 10%: the file it forgot to update, the case it didn't consider, the test that doesn't cover the regression you just introduced.

What Actually Works

Never accept the completion message as evidence the task is done. Run the tests. Read the diff. If there are no tests, that's the first thing to build.

More specifically: when you hand Claude Code a multi-file change, ask it to list every file it modified before you review the output. Discrepancies between that list and the actual diff tell you where to focus. For larger tasks, break work into checkpoints - "update this file, then we'll verify before moving on" - rather than letting it run a whole task in one uninterrupted pass.

Claude Code is genuinely useful. The issue is treating its confident summaries as a quality assurance step, because they aren't one. The model finished generating. The task being finished is a separate question you have to answer yourself.

The Confidence Is the Problem

What Actually Works

Related Tools

More from today

A Jane Street Designer Now Uses Claude Code More Than Figma

Developers Push Anthropic for an Official Claude Desktop App on Linux

A Claude Code Skill That Runs Your Pitch Past 150 Simulated Critics

Cookie Preferences