
When LLMs stop making things up: the conditions that actually reduce hallucinations

April 8, 2026 2 min read

It's a question practitioners have been asking since AI tools went mainstream: under what conditions do large language models - the AI systems underlying ChatGPT, Claude, and similar tools - actually stop fabricating information?

The answer emerging from both research and real-world use is specific enough to be useful. Models hallucinate (the technical term for confidently stating false information) far less often when they have direct access to source material during a response. The conditions that reduce hallucinations aren't mysterious - they're reproducible, and knowing them changes how you should design any AI workflow where accuracy matters.

Source Text Changes the Equation

The biggest factor in hallucination rates is whether the model is reading text you've placed in front of it, or retrieving facts from its training data - which is effectively its memory of billions of web pages absorbed during training.

When you give a model a specific document and ask questions whose answers are in that text, accuracy climbs significantly, because the model is reading and reasoning rather than recalling. This is the foundation of RAG (retrieval-augmented generation) - a technique in which an application automatically pulls relevant documents before sending a query to the model. Instead of asking "what does the GDPR say about data retention?" with no supporting text, a RAG system first retrieves the relevant GDPR clauses, then asks the model to reason from those specific passages.
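The retrieve-then-ask pattern can be sketched in a few lines of Python. This is a minimal illustration, not how any particular RAG product works: the function names (`retrieve`, `build_grounded_prompt`) are hypothetical, the word-overlap ranking stands in for a real search index or embedding lookup, and the passages are placeholders rather than actual legal text.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, top_k=2):
    """Rank passages by word overlap with the query.

    Assumption: simple overlap is a stand-in for a real retriever.
    Passages sharing no words with the query are dropped.
    """
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]


def build_grounded_prompt(query, passages):
    """Place the retrieved passages in front of the question."""
    sources = "\n\n".join(
        f"[Source {i + 1}]\n{p}" for i, p in enumerate(passages)
    )
    return (
        "Answer using only the sources below. "
        "If the answer is not in them, say so.\n\n"
        f"{sources}\n\nQuestion: {query}"
    )


# Placeholder passages for illustration only.
passages = [
    "Placeholder clause on data retention periods.",
    "Placeholder clause on cookie consent.",
]
question = "How long is the data retention period?"
prompt = build_grounded_prompt(question, retrieve(question, passages))
```

The resulting `prompt` string is what would be sent to the model, so the model reads the clause instead of recalling it.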

The practical implication: don't ask models to recall facts from training data when you can give them the source. Paste in the contract. Upload the document. Include the data directly in your message. That extra step is worth it when the output matters.

Where Fabrication Still Happens

Source access isn't a complete solution. Hallucinations still occur when:

The question requires synthesizing information across many documents, not all of which are present in the conversation
The context window (the amount of text a model can read at once - GPT-4o's is 128,000 tokens, roughly 300 pages) isn't large enough to hold all relevant material
The task asks for precise citations with page numbers, which models routinely invent even when given source text
The question touches events after the model's training cutoff date
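For the context-window case, a rough pre-flight check can tell you whether the material is likely to fit before you send it. The sketch below assumes about four characters per token, a common rule of thumb for English text rather than an exact count; a real tokenizer will give different numbers. The function names and the `reserved_tokens` allowance for the question and the answer are illustrative choices.

```python
CHARS_PER_TOKEN = 4  # assumption: rough average for English text


def estimate_tokens(text):
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents, window_tokens=128_000, reserved_tokens=4_000):
    """Return (fits, estimated_tokens) for a list of document strings.

    reserved_tokens leaves room for the question and the model's answer.
    """
    budget = window_tokens - reserved_tokens
    needed = sum(estimate_tokens(doc) for doc in documents)
    return needed <= budget, needed


fits, needed = fits_in_context(["example contract text " * 1000])
```

If `fits` is false, the material has to be split, summarized, or narrowed by retrieval, and any answer that depends on the omitted parts deserves extra scrutiny.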

Hallucination rates also vary significantly by model. Smaller, cheaper models tend to fabricate more often than larger ones, which matters when you choose a model for accuracy-sensitive tasks.

Building More Reliable Workflows

For practitioners, this comes down to one workflow principle: treat AI as a reading and reasoning tool rather than a memory retrieval tool. When accuracy matters - in research, legal work, financial analysis, customer communications - provide the source material rather than asking the model to remember it.

When you can't provide sources, build in verification steps. The model's expressed confidence is not a reliable signal of accuracy. A model can state a fabricated statistic in exactly the same tone it uses to state a verified one. High certainty and complete fabrication are not mutually exclusive.
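One cheap verification step, whenever a reference text is available at review time, is to ignore the answer's tone entirely and check whether its quotes and figures actually appear in that reference. The sketch below is a minimal version of that idea with hypothetical function names. It only catches quotes and numbers that are missing verbatim from the reference, so a clean result does not prove the answer is correct.

```python
import re

NUMBER_PATTERN = r"\d+(?:\.\d+)?%?"


def normalize(text):
    """Lowercase and collapse whitespace so line breaks don't hide a match."""
    return " ".join(text.lower().split())


def unsupported_quotes(answer, source):
    """Return double-quoted passages in the answer that are absent from the source."""
    source_norm = normalize(source)
    quotes = re.findall(r'"([^"]+)"', answer)
    return [q for q in quotes if normalize(q) not in source_norm]


def unsupported_numbers(answer, source):
    """Return numbers in the answer that never appear in the source."""
    source_numbers = set(re.findall(NUMBER_PATTERN, source))
    return [n for n in re.findall(NUMBER_PATTERN, answer)
            if n not in source_numbers]


# Made-up example text for illustration.
source = "Revenue grew 12% in 2024. The board approved the plan."
answer = 'The report says "revenue grew 12% in 2024" and "profits doubled", with 45% margins.'

flagged_quotes = unsupported_quotes(answer, source)    # ['profits doubled']
flagged_numbers = unsupported_numbers(answer, source)  # ['45%']
```

Anything flagged goes to a human reviewer. The fabricated quote and the fabricated margin read just as confidently as the supported claim, which is why the check looks at the text and not the tone.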
