Something has shifted in how Claude phrases its responses, and frequent users are noticing. The phrase "honest caveat" has become a common sentence opener in Claude's replies, appearing in lines like "Honest caveat: I can't verify this in real time" or "Honest caveat: this approach has tradeoffs." Users who work with the model daily are flagging it as a new behavioral tell.
The complaints aren't about the substance of the caveats. Flagging limitations is often genuinely useful. The problem is the phrasing itself: "honest caveat" implies that other caveats in the same response might be dishonest. It comes across as performative self-awareness, a verbal signal that the model is about to be transparent rather than just being transparent. Compare it to someone who starts every email with "To be completely candid with you." Technically fine once, but repeated enough times, it erodes rather than builds trust.
This is a known failure mode in large language models (AI systems trained on enormous text datasets to generate human-like language). Training processes can accidentally reinforce unusual phrases, especially ones that appear alongside signals the model learns to imitate. Once a pattern gets encoded, it tends to spread to contexts where it doesn't quite fit.
Anthropic hasn't commented on the specific phrase, and it's not clear whether this reflects a deliberate change in Claude's system prompt (the baseline instructions baked into every conversation) or a side effect of a recent model update. What's clear is that the pattern points to a real issue: AI model behavior can shift in small, frustrating ways between versions with no public announcement, and users who work with these tools daily are often the first to notice.