Opus 4.8 has been in users' hands for days now, and the early pattern matches Anthropic's benchmark claims: the model is meaningfully better on tasks that punish mistakes, and the fundamental tradeoffs - speed, cost - haven't changed.
The clearest gains show up on work that requires careful, multi-step reasoning: production-quality code generation, debugging sessions, analyzing dense technical or legal documents, and following complex multi-part instructions without losing context. Extended thinking mode, where the model reasons through a problem internally before answering, is delivering noticeably stronger results than on earlier Opus versions.
The gap between Opus 4.8 and Sonnet 4.6 - Anthropic's faster, cheaper middle-tier model - is real but task-dependent. For most writing, summarization, and routine coding work, Sonnet 4.6 still offers better value per dollar. Opus 4.8 pulls ahead when accuracy matters more than throughput.
The main complaints are predictable: slower responses at high effort settings, and API costs that accumulate fast on volume-heavy workflows. Neither is surprising. That has been the Opus tradeoff since the beginning, and Opus 4.8 does not change the economics - it just raises the performance ceiling for users who need it.