67%. That's the reduction in measured thinking steps one developer tracked in Claude Code between January and March 2026, documented through session logs showing the model skipping file-read confirmations before edits, bypassing analysis it used to perform by default, and triggering stop hook violations - errors that fire when Claude Code tries to exit a task loop before completing required steps - at rates far above any prior baseline.
The degradation was the kind that's easy to dismiss. Claude Code wasn't crashing. It still produced code. But developers who'd built workflows around the tool's behavior noticed it finishing tasks faster in ways that didn't track with the complexity of the work - completing file edits without first confirming it had read the files, reaching conclusions without the intermediate reasoning steps that had previously been visible in the output.
What Changed in February
Two product decisions happened around the same time. Anthropic made Claude's internal reasoning process - the "thinking" steps where the model works through a problem before responding - invisible in the UI by default. That change was cosmetic, but it removed a key signal developers used to verify the model was actually working through their request.
Separately, the default reasoning effort level was quietly lowered. Claude Code has an "adaptive thinking" feature designed to apply more processing power to complex tasks and less to routine ones. That calibration appears to have gone wrong for coding work, treating file edits and multi-step programming tasks as low-complexity interactions that didn't warrant full reasoning.
The combination created a model that looked identical on the surface but was doing less work per task. And because the thinking steps were hidden by default, users had no direct way to verify whether full reasoning was happening.
The Measurement Problem
Most users don't track session behavior across months. The developers who caught this earliest were the ones with enough historical context to run a direct comparison. For everyone else, "it feels a bit off" isn't enough to escalate as a concrete complaint - especially when the standard response to any quality complaint is "check your settings."
The 67% figure comes from user-level transcript analysis, not a controlled benchmark. But the directional finding is consistent across multiple independent reports: more errors on file editing tasks, more stop hook violations, and less explicit reasoning shown before the model took action. The behavioral shift was real, even if precise measurement required manual investigation.
For developers building on Claude Code as a core workflow tool, the practical effect was an increase in errors on tasks the tool had previously handled reliably - which meant more time auditing outputs that used to be trustworthy, and workflow changes to compensate for a tool that had shifted without warning.