
Six Places AI Builds Break in Production (And It's Rarely the Model)

June 5, 2026 3 min read

Two years ago, most teams building their first AI features expected the model to be the bottleneck: hallucinations, slow inference, cost per call. After running AI in production for those two years, the honest answer looks different. The models are usually fine. The infrastructure around them isn't.

One team with two years of production AI deployments documented six places their builds consistently broke. The first, and most expensive to diagnose: the context layer didn't exist.

Context Gaps That Prompt Engineering Can't Fix

The scenario goes like this. You build a system that makes customer-facing recommendations. The model is capable - good at reasoning, clear in its outputs. Your prompts are carefully written. But the outputs are subtly wrong in ways that are hard to pin down: not grounded in how your specific business works, missing knowledge that any experienced employee would have. You spend weeks refining prompts to make the problem less visible.

What's actually missing is a context layer - the connection between the AI and your organization's specific knowledge. Who your customers are. How your products are configured. What exceptions your team makes and why. Prompting can't substitute for this. Prompting is about instruction and format; it doesn't give the model factual grounding it doesn't have.

The technical solution is often called RAG (retrieval-augmented generation) - a setup where the AI looks up relevant information from your own documents or databases before responding, rather than relying purely on what it learned during training. But the more important insight is recognizing when you're trying to solve a missing-context problem by improving prompts. The symptoms look similar: outputs that are fluent but off-target. The fix is completely different.
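To make the retrieval step concrete, here is a minimal sketch in Python, under stated assumptions. The names (`Document`, `retrieve`, `build_prompt`, `KNOWLEDGE_BASE`) and the sample records are hypothetical and not taken from the team's system. The word-overlap scoring is a deliberately simple stand-in for whatever search a real deployment would use, such as embeddings or a search index. The sketch stops at building the prompt; the model call itself is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


# Hypothetical business knowledge the model would not have from training.
KNOWLEDGE_BASE = [
    Document("pricing-exceptions",
             "Enterprise customers get a 15 percent discount on renewal "
             "if they have been with us over two years."),
    Document("product-config",
             "The starter plan includes three seats and does not support "
             "single sign-on."),
    Document("support-hours",
             "Support is available on weekdays from 9 to 5 Eastern."),
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: list[Document], k: int = 2) -> list[Document]:
    """Return up to k documents sharing the most words with the query."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(d.text)), d) for d in docs]
    # Drop documents with no overlap, then rank the rest by overlap count.
    relevant = [(score, d) for score, d in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in relevant[:k]]


def build_prompt(query: str, context: list[Document]) -> str:
    """Place the retrieved text ahead of the question in the prompt."""
    if not context:
        # Make the missing-context case visible instead of hiding it.
        context_block = "(no matching company context found)"
    else:
        context_block = "\n".join(f"[{d.doc_id}] {d.text}" for d in context)
    return (
        "Answer using only the company context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "Which discount applies to enterprise customers at renewal?"
    print(build_prompt(question, retrieve(question, KNOWLEDGE_BASE)))
```

The point of the design is that the business facts arrive through `retrieve`, not through the instructions. Rewording the instruction text in `build_prompt` changes format and tone, but it cannot supply the renewal discount if no document containing it was retrieved.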

The team described the pattern directly: prompting didn't fill the gap, it just made it less obvious. A model giving recommendations based on generic knowledge looks almost identical to a model using your specific business context - until a customer catches it.

What the Two-Year View Reveals

The two-year timeframe matters. The first year of AI deployment is typically a feature race: get it working well enough to ship. The second year is when the real patterns surface - what costs more than expected, what breaks in ways testing didn't catch, where the maintenance burden actually lives.

Teams with two or more years in production consistently report that model quality is rarely the core issue for most business applications. The models improved dramatically between 2024 and 2026, and the major providers are now reliable enough that raw capability gaps seldom hold deployments back.

The context layer problem bites new teams first because it's invisible at build time. In a controlled demo, you feed the model exactly the right context yourself. In production, the question of where that context comes from - reliably, with fresh data, at the right level of detail - is a separate engineering challenge that needs to be designed before the AI layer, not after it.
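One way to treat context supply as its own engineering concern is to make freshness explicit before anything reaches the model. The sketch below is illustrative only: `ContextRecord`, `split_by_freshness`, and the 24-hour threshold are assumptions chosen for the example, not details from the team's account.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ContextRecord:
    source: str
    text: str
    fetched_at: datetime  # when this record was last pulled from its source


def split_by_freshness(
    records: list[ContextRecord],
    now: datetime,
    max_age: timedelta,
) -> tuple[list[ContextRecord], list[ContextRecord]]:
    """Separate records into (fresh, stale) based on their age at `now`."""
    fresh = [r for r in records if now - r.fetched_at <= max_age]
    stale = [r for r in records if now - r.fetched_at > max_age]
    return fresh, stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        ContextRecord("crm", "Customer is on the enterprise plan.",
                      now - timedelta(hours=2)),
        ContextRecord("pricing-sheet", "Renewal discount is 15 percent.",
                      now - timedelta(days=3)),
    ]
    fresh, stale = split_by_freshness(records, now, timedelta(hours=24))
    print("send to model:", [r.source for r in fresh])
    print("refresh first:", [r.source for r in stale])
```

In a demo, every record is fresh because someone pasted it in by hand. Returning the stale records separately, instead of silently dropping them, gives the surrounding system something to log or refresh.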

If you're early in an AI build and the outputs feel slightly off but you can't say exactly why, the first thing to check isn't the prompt. It's what the model actually knows about your specific situation.
