
Cognitive Science Explains Why Your AI Prompts Stop at "Good Enough"

March 19, 2026

"Good enough" is the enemy of great when it comes to AI output. That is the central finding from researcher Graham, who applied cognitive science frameworks to LLM prompting and found that competent AI results actively prevent users from discovering superior alternatives, because users who already have an acceptable answer never think to look for a better one.

The core idea is simple but has real teeth: large language models trained on human text absorbed the statistical patterns of human cognition, including our cognitive biases. When your prompt mixes incompatible thinking modes - say, asking the model to both investigate a problem and evaluate options in the same breath - the model gets stuck in one gear and stays there. Cognitive scientists call this "task-set inertia," and it turns out LLMs suffer from a version of it too.

The Three-Tier Experiment

Graham tested three approaches across six professional domains (legal contracts, marketing, HR, design research, engineering debugging, and security operations):

Tier 1 - Baseline: Standard prompting with no structural consideration
Tier 2 - Optimized single prompt: Removed numeric anchors, replaced example templates with analytical lenses, and separated different thinking modes within the prompt
Tier 3 - Multi-agent pipeline: Full decomposition into specialized sub-tasks handled by separate prompts
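
To make the Tier 3 idea concrete, here is a minimal sketch of a pipeline in which each sub-task gets its own prompt and the output of one stage feeds the next. It is an illustration under stated assumptions, not the setup used in the research: `run_pipeline`, the `{context}` placeholder, and the caller-supplied `call_model` function are all hypothetical names, and no particular LLM API is implied.

```python
from typing import Callable


def run_pipeline(
    input_text: str,
    stage_prompts: list[str],
    call_model: Callable[[str], str],
) -> list[str]:
    """Run each stage as a separate prompt, passing results forward.

    `call_model` is a hypothetical stand-in for whatever function sends a
    prompt to a model and returns its text reply. Each template must
    contain a `{context}` placeholder.
    """
    outputs: list[str] = []
    context = input_text
    for template in stage_prompts:
        # Each stage sees only the previous stage's output, so the
        # investigative and evaluative modes never share a prompt.
        result = call_model(template.format(context=context))
        outputs.append(result)
        context = result
    return outputs
```

A caller would pass something like `["Investigate only: {context}", "Evaluate these findings: {context}"]` as `stage_prompts`. Whether two stages or more are appropriate depends on the task.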

Tier 2 improved results across every domain tested. Tier 3 - the more complex pipeline approach - won decisively on investigation-heavy tasks but actually made things worse on recognition-based tasks, where an expert would pattern-match from experience rather than work through a checklist.

That last finding matters. The instinct to throw more structure at AI problems (more steps, more agents, more scaffolding) can backfire. Cognitive science has a name for this too: the "expertise reversal effect," where adding scaffolding designed for novices actively degrades expert-level performance.

What Actually Works

Three specific prompt fixes produced the most consistent improvements:

Remove numeric anchors. Giving the model a scale ("rate this 1-10") or example numbers biases it toward those figures. Cut them.
Replace seeded examples with analytical lenses. Instead of showing the model what a good answer looks like (which narrows its pattern-matching), tell it what dimensions to analyze along.
Establish scope boundaries between thinking modes. Do not ask the model to brainstorm and then critique in the same prompt. Separate the investigative phase from the evaluative phase.
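
The three fixes can be sketched in a few lines of Python. This is a rough illustration, assuming a simple regular expression is an acceptable first pass for spotting rating scales; the helper names `has_numeric_anchor` and `build_phase_prompts`, the prompt wording, and the `{findings}` placeholder are all hypothetical and do not come from the research.

```python
import re

# Heuristic only: flags phrasing such as "rate this 1-10" or "score from 1 to 5".
NUMERIC_ANCHOR = re.compile(
    r"\b(?:rate|score|rank)\b[^.?!]*\b\d+\s*(?:-|to)\s*\d+\b",
    re.IGNORECASE,
)


def has_numeric_anchor(prompt: str) -> bool:
    """Return True if the prompt appears to request a numeric scale."""
    return NUMERIC_ANCHOR.search(prompt) is not None


def build_phase_prompts(task: str, lenses: list[str]) -> tuple[str, str]:
    """Build separate investigative and evaluative prompts.

    Lenses name dimensions to analyze along, in place of a sample answer.
    The second prompt keeps a literal `{findings}` placeholder to be filled
    with the first phase's output via `str.format`.
    """
    lens_lines = "\n".join(f"- {lens}" for lens in lenses)
    investigate = (
        f"{task}\n\n"
        "Investigate only. Describe what you observe along these dimensions:\n"
        f"{lens_lines}\n"
        "Do not recommend or judge anything yet."
    )
    evaluate = (
        "Using only the findings below, evaluate the options and explain "
        "the trade-offs.\n\n"
        "Findings:\n{findings}"
    )
    return investigate, evaluate
```

The evaluative prompt is filled in only after the investigative reply arrives, for example `evaluate.format(findings=reply)`, which keeps the two thinking modes in separate requests.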

The practical test for whether to go further and build a full pipeline: ask whether the correct analysis could exist without the specific input data. If the answer lives in the data you are feeding in, not in the model's training knowledge, decomposition helps. If the model should already know how to handle this type of problem, keep it simple.
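
That test reduces to a single yes-or-no question, which can be written down as a small decision helper. The sketch below only encodes the rule as stated above; the `Strategy` enum and `choose_strategy` function are hypothetical names, and the judgment about where the answer lives still has to come from a person.

```python
from enum import Enum


class Strategy(Enum):
    SINGLE_PROMPT = "optimized single prompt"
    PIPELINE = "multi-agent pipeline"


def choose_strategy(analysis_possible_without_input: bool) -> Strategy:
    """Apply the rule of thumb for when decomposition is worth it.

    If the correct analysis could exist without the specific input data,
    the model should already know how to handle the problem, so keep it
    simple. Otherwise the answer lives in the data and decomposition helps.
    """
    if analysis_possible_without_input:
        return Strategy.SINGLE_PROMPT
    return Strategy.PIPELINE
```

For instance, debugging from a specific log file would be a `False` case (the answer is in the log), while spotting a familiar pattern an expert would recognize on sight would be a `True` case.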

The Side-by-Side Problem

The most uncomfortable part of this research is the implication for daily AI users: you cannot tell your output is mediocre unless you compare it against an optimized version. The "good" result reads fine. It answers the question. It looks professional. You move on. But placed next to output from a cognitively-informed prompt, the gaps become obvious.
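
One low-effort way to make that comparison routine is to print the two outputs in adjacent columns. The helper below is a minimal sketch using only the Python standard library; `side_by_side` is a hypothetical name, and the column width is an arbitrary default.

```python
import textwrap
from itertools import zip_longest


def side_by_side(left: str, right: str, width: int = 38) -> str:
    """Wrap two texts to `width` columns and join them row by row."""
    left_lines = textwrap.wrap(left, width)
    right_lines = textwrap.wrap(right, width)
    rows = [
        # Pad the left column so the separator lines up on every row.
        f"{l:<{width}} | {r}"
        for l, r in zip_longest(left_lines, right_lines, fillvalue="")
    ]
    return "\n".join(rows)
```

Calling `print(side_by_side(baseline_output, optimized_output))` puts the baseline draft on the left and the restructured prompt's output on the right, where `baseline_output` and `optimized_output` are whatever strings the two prompts returned.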

This is not about prompt engineering tricks or magic phrases. It is about understanding that LLMs carry the cognitive fingerprints of their training data, and structuring your requests to work with those patterns rather than against them. The difference between a decent AI draft and a genuinely strong one often comes down to whether you asked the model to think in one mode at a time.
