Research

Why LLMs Can't Verify What They Know - And Why That Matters for Security

March 5, 2026 2 min read

What Happened

Chris McCormick published an analysis on March 4, 2026, examining how LLMs process and "know" information compared to humans. The core argument: LLMs have a fundamentally different epistemic architecture that makes them inherently vulnerable to deception.

McCormick uses a metaphor of a person sitting in a dark room, receiving information only through typed messages on a tickertape. That's roughly how LLMs experience the world - as a stream of tokens where everything arrives through the same channel with no way to cross-reference against other senses or direct experience.

Humans verify information through multiple channels. You can read that a stove is hot, but you can also see the red element, feel the radiant heat, and smell burning if you get too close. LLMs get none of that. Every piece of information - whether it comes from a trusted API response or a malicious prompt injection buried in a webpage - looks the same to the model.

McCormick suggests that encoding trustworthiness signals directly into software architecture, using Bayesian priors with high confidence, could eventually make LLMs more reliable. But that's speculative, and we're not there yet.

Why It Matters

If you use AI tools for anything that touches real decisions - drafting contracts, analyzing data, summarizing research - you need to understand this limitation. LLMs aren't just occasionally wrong. They lack the basic perceptual infrastructure to distinguish reliable information from unreliable information.

This has direct implications for agentic AI workflows. When you give Claude or ChatGPT access to browse the web, read emails, or process documents, every piece of input is a potential attack vector. The model literally cannot tell the difference between your legitimate instructions and a prompt injection hidden in a PDF attachment.

For teams building AI-powered automations, this isn't a theoretical concern. Prompt injection attacks exploit exactly this epistemic gap.

Our Take

This is a useful framing for something practitioners already feel intuitively: you can't fully trust LLM output, and adding more tools and capabilities to models also increases their attack surface.

The "dark room with a tickertape" metaphor is the clearest explanation I've seen of why prompt injection is so hard to fix. It's not a bug that can be patched. It's a structural limitation of how these models receive information.

The practical takeaway: keep humans in the loop for anything consequential, and don't pipe untrusted content directly into LLM prompts without sanitization. The models are useful, but they're working with a severe handicap when it comes to knowing what to trust.

What Happened

Why It Matters

Our Take

Related Tools

More from today

LLMs Can Identify Anonymous Users for $2 Each, ETH Zurich Study Finds

Study Shows AI Agents Can Link Anonymous Online Accounts to Real Identities

OpenAI's CoT-Control Study: Reasoning Models Can't Hide Their Thinking

Cookie Preferences