Claude Opus 4.7 is drawing criticism from daily users who say the model refuses ordinary tasks at a noticeably higher rate than its predecessor. The complaints aren't about attempting sensitive or unusual requests - people are hitting resistance on straightforward professional work, then spending multiple follow-up messages convincing the model to proceed.
The token cost is the specific pain point. Claude charges by the token, meaning every refusal, every clarification exchange, and every re-framing attempt burns money before a user gets to the actual output. If a model declines and then complies after two or three follow-up messages, users are paying for the friction on top of the actual task.
Claude has always skewed more cautious than ChatGPT on content that touches anything ambiguous - that's been a consistent trade-off between the two models for years. But user reports suggest 4.7 has moved that threshold in ways that catch routine use cases. The practical result is a model that can score well on reasoning benchmarks while still being a daily headache if your work involves any topic it deems borderline.
Anthropologic hasn't publicly addressed whether the increased caution in 4.7 is intentional safety tuning or a miscalibration. That distinction matters: intentional means it won't change, miscalibration means a patch is plausible. For anyone running 4.7 in production workflows, the safest move right now is to test it against your actual tasks before assuming benchmark performance translates to your specific use case.