
The AI Risk Nobody Talks About: Systems That Optimize Bad Assumptions

May 13, 2026

What happens when an AI system is very good at its job, but its job is based on a flawed picture of reality?

Most serious AI risk discussions focus on the wrong timeline. The concern gets framed as: AI becomes smarter than humans, we lose control, things go badly. That's a real research area, and smart people are working on it. But there's a more immediate version of AI risk that's already happening and gets far less attention.

AI systems are becoming very effective at optimizing for metrics. The problem is that metrics are always simplified stand-ins for what we actually care about. When those simplifications are wrong, getting better at hitting the metric makes the underlying problem worse.

The Gap Between What You Measure and What You Want

A hiring tool trained to predict "successful employees" learns from historical data about who the company previously hired and promoted. If that history reflects bias (if certain kinds of people were systematically excluded or passed over), the tool encodes that bias as signal. It doesn't "understand" human potential. It optimizes a compressed representation of what success looked like before, then applies that pattern efficiently to every new applicant.

The same dynamic shows up in content recommendation. An algorithm that maximizes "time on platform" discovers, entirely through optimization rather than intent, that outrage and anxiety keep users engaged longer than calm, informative content. It doesn't have a goal of making users anxious. It has a goal of time-on-platform, and anxious users deliver that metric.
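To make this dynamic concrete, here is a toy simulation. It is not a model of any real platform. It uses an epsilon-greedy recommender whose only reward is minutes of engagement. The three content categories, their average engagement times, and the "wellbeing" scores are invented numbers chosen for illustration, and the function name `simulate_feed` is hypothetical. The optimizer never sees the wellbeing column. It is tracked only so we can watch what the optimization does to it.

```python
import random

# Hypothetical content categories. The numbers are illustrative assumptions,
# not measurements: mean minutes of engagement per item, and an effect on
# user wellbeing that the optimizer never observes.
CATEGORIES = {
    "calm_explainer": {"mean_minutes": 4.0, "wellbeing": +1.0},
    "neutral_news": {"mean_minutes": 5.0, "wellbeing": 0.0},
    "outrage_bait": {"mean_minutes": 7.0, "wellbeing": -1.0},
}


def simulate_feed(rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy recommender that maximizes time-on-platform only."""
    rng = random.Random(seed)
    names = list(CATEGORIES)
    counts = {name: 0 for name in names}
    estimates = {name: 0.0 for name in names}  # running mean of observed minutes
    total_wellbeing = 0.0

    for t in range(rounds):
        if t < len(names):
            choice = names[t]  # try every category once to seed the estimates
        elif rng.random() < epsilon:
            choice = rng.choice(names)  # occasional random exploration
        else:
            choice = max(names, key=lambda n: estimates[n])  # exploit best proxy

        # The only feedback signal: noisy minutes spent on the served item.
        minutes = rng.gauss(CATEGORIES[choice]["mean_minutes"], 1.0)
        counts[choice] += 1
        estimates[choice] += (minutes - estimates[choice]) / counts[choice]

        # Tracked for the reader; invisible to the optimizer.
        total_wellbeing += CATEGORIES[choice]["wellbeing"]

    return counts, total_wellbeing / rounds


if __name__ == "__main__":
    counts, avg_wellbeing = simulate_feed()
    print(counts)
    print(f"average wellbeing effect per item: {avg_wellbeing:+.2f}")
```

With these assumed numbers, the recommender serves `outrage_bait` on the large majority of rounds and the average wellbeing effect turns negative. Nothing in the code names outrage as a goal: the only objective is engagement minutes.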

Credit scoring, medical triage, predictive policing, school district resource allocation: anywhere you put an AI system in a decision loop, you're asking it to optimize a proxy metric in place of a human judgment call. The proxy is never perfect. The more efficient the optimization, the more the gaps between proxy and reality get exploited.
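One simple form of this effect, often discussed under the name Goodhart's law, can be sketched numerically. The sketch below assumes, purely for illustration, that every candidate has a true value drawn from a standard normal distribution. It also assumes that the proxy we can measure is that value plus independent measurement error of the same size. Keeping fewer, higher-scoring candidates stands in for "more efficient optimization." The function name `selection_gap` and its parameters are hypothetical.

```python
import random
import statistics


def selection_gap(n_candidates=100_000, top_k=1_000, noise_sd=1.0, seed=0):
    """Pick the top_k candidates by a noisy proxy and compare proxy vs. true value.

    Assumptions (illustrative only): each candidate's true value is drawn from
    N(0, 1), and the proxy is that value plus independent N(0, noise_sd**2) error.
    """
    rng = random.Random(seed)  # same seed -> same candidate pool for every top_k
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n_candidates)]
    proxies = [v + rng.gauss(0.0, noise_sd) for v in true_values]

    # "Optimization" here is simply ranking on the proxy and keeping the best.
    ranked = sorted(range(n_candidates), key=proxies.__getitem__, reverse=True)
    chosen = ranked[:top_k]

    mean_proxy = statistics.fmean(proxies[i] for i in chosen)
    mean_true = statistics.fmean(true_values[i] for i in chosen)
    return mean_proxy, mean_true


if __name__ == "__main__":
    # Smaller top_k means harder optimization against the proxy.
    for k in (10_000, 1_000, 100, 10):
        proxy, true = selection_gap(top_k=k)
        print(f"top {k:>6}: proxy {proxy:5.2f}  true {true:5.2f}  gap {proxy - true:5.2f}")
```

Under these assumptions the selected candidates are still better than average. However, the gap between their proxy scores and their true values widens as selection gets stricter: the harder you optimize, the more of what you select on is measurement error rather than the thing you wanted. Real proxies can fail in harsher ways than this Gaussian case, for instance when the easiest way to raise the score works against the actual goal.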

Better Models Don't Automatically Fix This

The troubling part is that this problem doesn't automatically improve as AI systems get more capable. A smarter hiring tool is better at finding patterns in flawed data. A better recommendation algorithm is better at exploiting psychological vulnerabilities it doesn't know it's targeting. More capability applied to a flawed objective produces flawed outcomes more efficiently.

You don't need superintelligence for AI systems to cause significant harm. You just need systems that are good enough at optimization, deployed in consequential decisions, and measured only on whether they hit their proxy metric.

The corrective isn't simple. It requires ongoing human oversight of what metrics AI systems are actually being rewarded for, real measurement of outcomes beyond the proxy, and genuine accountability when the gap between metric and reality causes harm. Those are harder to ship than a model update.
