Opus 4.8 is Anthropic's most capable model, and using it regularly makes the difference from previous versions clear fairly quickly.
The model is more direct than its predecessors. It takes positions. Ask it to evaluate an argument and it tells you what it actually thinks is wrong rather than presenting "perspectives on both sides." When the reasoning behind your question is flawed, it says so instead of quietly accommodating the premise.
That directness pays off for specific types of work: code review, research critique, business plan stress-testing, and anywhere else you need a model to push back rather than validate your thinking. Opus 4.8 performs that role more credibly than Claude 3 Opus did. It feels less like an eager assistant and more like a skeptical colleague.
The trade-off is response length. Opus 4.8 defaults to thorough answers even for simple questions, which adds latency and API cost when you don't need that depth. On quick-turnaround tasks, either prompt explicitly for brevity or route the request to Claude Sonnet instead.
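One way to picture the brevity-or-routing choice is a small dispatch function that runs before any API call. This is a minimal sketch under stated assumptions, not a recommended implementation: the model identifiers are placeholders rather than real API model names, and `Request`, `plan`, and the `needs_depth` flag are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass

# Placeholder identifiers (assumptions), not real API model names.
DEEP_MODEL = "opus-placeholder"
FAST_MODEL = "sonnet-placeholder"

# Hypothetical instruction appended when a deep answer must also be short.
BREVITY_HINT = "Answer in three sentences or fewer."


@dataclass
class Request:
    prompt: str
    needs_depth: bool               # caller's judgment: would a shallow answer be risky?
    quick_turnaround: bool = False  # caller wants a short, fast reply


def plan(request: Request) -> dict:
    """Choose a model and, if needed, append a brevity instruction."""
    if not request.needs_depth:
        # Everyday work goes to the faster, cheaper model unchanged.
        return {"model": FAST_MODEL, "prompt": request.prompt}
    prompt = request.prompt
    if request.quick_turnaround:
        # Keep the deeper model but ask explicitly for a short answer.
        prompt = f"{prompt}\n\n{BREVITY_HINT}"
    return {"model": DEEP_MODEL, "prompt": prompt}
```

The returned dictionary would then be handed to whatever client code actually sends the request. Deciding `needs_depth` is left to the caller, since that judgment depends on the task rather than on anything the sketch can infer.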
The newly added highest-effort setting, which now goes above the previous "Max", amplifies this further. Turn it up and you get longer, more thorough responses with visible reasoning chains even when you haven't specifically asked for them. On hard analytical problems, that's a feature. On everything else, start with the defaults.
Opus 4.8 isn't the right tool for every task. Sonnet handles most everyday work faster and cheaper. But for the problems where you genuinely need the model to think, where a confident-but-shallow answer would cause real damage, Opus 4.8 is a qualitatively different experience from what Claude 3 Opus offered.