Opus 4.8 Burned 12 Hours With Zero Output. Sonnet 4.6 Finished the Same Job in One Session.

One model produced zero deliverables after 12 hours. The other finished the same type of project in a single session. Both are from Anthropic, both are named Claude, and the gap between them raises a genuinely useful question about how to choose the right tool for production work.

A developer with over a year of experience using Claude to build bots, parsers, and format engines recently hit a wall with Opus 4.8 before switching back to Sonnet 4.6 and actually shipping. The account is worth examining because it inverts the conventional assumption that a more capable model produces better results.

What Broke With Opus 4.8

The developer had worked out a specific approach for using Opus that delivered results: four hours of architecture planning, thirty minutes of actual coding. The reasoning was sound - Opus tended to want to think before building, and forcing that thinking phase upfront aligned with how the model naturally operated.

Opus 4.8 extended that tendency past the point of usefulness. Twelve hours of interaction produced no working code. The model's deliberation loop - analyzing requirements, considering approaches, surfacing edge cases - ran without ever committing to an implementation.

In software development terms, this is analysis paralysis. The model understood the problem thoroughly enough to see every complication, which became an obstacle rather than an asset when the goal was shipping something.

Why Sonnet 4.6's Supposed Weakness Is Actually an Advantage

The developer's original assessment of Sonnet 4.6 was unflattering: a model that "starts coding before it understands the task." Fast, confident, and often wrong about the full scope of what is being built.

After the Opus 4.8 deadlock, switching to Sonnet 4.6 produced results in one session.

What reads as a flaw in a research or planning context turns into an asset in production. A model that commits to an implementation, ships working code, and lets you iterate is more useful for building real software than one that identifies every possible complication before touching a file. Wrong code is fixable. A model locked in deliberation is not.

The account is a single developer's experience, but it illustrates a broader point: the relationship between model capability and model productivity is not linear. Higher reasoning capacity can mean more time spent on edge cases and caveats and less time spent producing output that moves the work forward.

A More Useful Frame for Model Selection

The right question before starting a session is not "which model is most capable?" It is "what does this task actually require?"

For execution work - writing code against a defined spec, producing structured output, implementing a feature where the requirements are clear - Sonnet 4.6 was the faster choice in this account. It executes against a brief without requiring extensive prompting to overcome deliberation.

For genuinely open-ended problems - architecture decisions, design tradeoffs, situations where you want a model to surface complications before you commit to a direction - Opus's tendency to explore before deciding makes it the right tool.
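The frame above can be written down as a small routing rule. What follows is a minimal sketch, not anything the developer described using: the `Task` fields, the `classify` heuristic, and the model labels are all hypothetical placeholders (the labels are not real API model identifiers), and a real project would likely need a richer notion of "clear spec."

```python
from dataclasses import dataclass
from enum import Enum


class TaskKind(Enum):
    EXECUTION = "execution"    # defined spec, structured output, clear requirements
    OPEN_ENDED = "open_ended"  # architecture decisions, design tradeoffs


# Hypothetical labels for illustration only; not real API model identifiers.
PLANNING_MODEL = "opus"
BUILDING_MODEL = "sonnet-4.6"


@dataclass
class Task:
    description: str
    has_clear_spec: bool          # are the requirements already written down?
    needs_design_decisions: bool  # are architecture or tradeoff questions still open?


def classify(task: Task) -> TaskKind:
    """Assumed heuristic: a task is open-ended if design questions remain or no spec exists."""
    if task.needs_design_decisions or not task.has_clear_spec:
        return TaskKind.OPEN_ENDED
    return TaskKind.EXECUTION


def choose_model(task: Task) -> str:
    """Route open-ended work to the planning model and execution work to the building model."""
    if classify(task) is TaskKind.OPEN_ENDED:
        return PLANNING_MODEL
    return BUILDING_MODEL


if __name__ == "__main__":
    parser_task = Task("Implement the parser against the agreed format spec", True, False)
    design_task = Task("Decide how the format engine should handle plugins", False, True)
    print(choose_model(parser_task))
    print(choose_model(design_task))
```

The point of the sketch is only that the decision is made per task, before the session starts, rather than by defaulting to whichever model is considered most capable.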

The developer's original Opus workflow (plan heavily, code briefly) still makes sense for design work. The mistake was using Opus as the implementation model, not just the planning model. Sonnet 4.6 builds. Opus thinks. Knowing which you need on any given day is the whole game.