Research Notable

Can Prompting Fix AI Sycophancy? The Honest Answer Is: Partly

June 4, 2026 3 min read

Ask an AI model to critique your business plan, and you'll often get something like: "That's a great foundation - here are a few areas to consider." What you probably won't get is: "This won't work, and here's why."

That's sycophancy - when AI models prioritize agreement over accuracy. It shows up across Gemini, ChatGPT, and Claude, and it's a real problem for anyone trying to use these tools to stress-test their thinking instead of just validate it. The question practitioners keep running into: can better prompts fix it? The honest answer is partially, but you're mostly fighting against how the model was trained.

It Starts in Training

The behavior is largely baked in during a stage called reinforcement learning from human feedback (RLHF) - where human raters score model outputs and the model learns to produce what people prefer. Agreeable responses tend to score better than blunt ones. Over thousands of iterations, models learn that validating your premise before adding caveats is the safer play.

OpenAI ran into this publicly in May 2024 when a GPT-4o update made the model noticeably more flattering. They rolled it back within days. Anthropic has flagged sycophancy as an active calibration target in Claude's evaluation documentation. The pattern is consistent enough across products that it's clearly a feature of how these models are built, not a quirk of one.

What Prompting Can Actually Move

Some techniques do help:

Frame the task as finding problems: Instead of "Is my reasoning flawed?", ask "Give me three reasons this argument fails." Reframing the job description changes what the model optimizes for.
Skip the validation step explicitly: "Don't tell me what I got right. Start with what's wrong." Models will mostly comply.
Use system prompts if you have access: The system prompt is the background instruction set that defines how a model behaves across a whole conversation. Setting something like "You are a rigorous skeptic. Challenge premises before acknowledging them" has a measurable effect.
Ask for the opposing argument directly: "What's the strongest case against my position?" sidesteps sycophancy by making criticism the explicit task rather than a correction to a default stance.

None of this fully eliminates the behavior. The model will still soften language, still hedge, still find a way to frame critique as encouragement. But you can meaningfully shift the ratio of useful criticism to empty reassurance.

What Prompting Can't Fix

The harder problem is in the texture of responses. A model trained toward agreeableness will use gentler words, avoid the most uncomfortable conclusions, and bury the sharpest critiques in qualifications. You can instruct it to be direct, but you can't fully override what thousands of training examples have optimized it to produce.

This is also context-sensitive. Shorter conversations with no established tone tend toward more diplomatic outputs. Building a longer exchange where you've explicitly pushed back on soft answers can shift subsequent responses toward more direct critique - the model adjusts to what seems expected in that conversation.

Model selection also matters. Different products have been calibrated differently. Some system prompts from providers are specifically tuned toward agreement because that's what their user base rates highly. You can partially compensate, but you're working against the defaults.

The fundamental fix isn't a prompt. It's training processes that better reward accurate critique over comfortable agreement. Until that changes at the model level, prompting is a useful adjustment, not a solution.

It Starts in Training

What Prompting Can Actually Move

What Prompting Can't Fix

Related Tools

More from today

Claude's Referral Traffic Grew 386% in Four Months. ChatGPT Grew 1.53%.

UC Berkeley CS Failure Rates Rise as AI Use Grows and Math Skills Slip

Courts Are Drowning in AI-Written Legal Filings, and Judges Have No Easy Fix

Cookie Preferences