Three years ago, the pitch for large language models was simple: ask a question, get a smart answer tailored to your context. No more wading through ten blue links and SEO spam. Just knowledge, delivered cleanly.
Today, every major lab is sprinting in another direction. The 2026 roadmap at OpenAI, Anthropic, Google, and Meta is dominated by agents - AI that can write code, browse the web, operate your computer, and chain together multi-step tasks. Coding benchmarks have replaced general-knowledge benchmarks as the metric everyone optimizes for. The leaderboards reward tool use and instruction following, not depth of understanding.
What Gets Optimized Gets Better (Everything Else Doesn't)
This is a real tradeoff, not just a marketing shift. Model parameters (the numerical weights that store everything a model "knows") are finite. Training compute spent teaching a model to call APIs, write Python, and recover from errors is compute not spent on making it more knowledgeable about medicine, law, history, or your specific industry.
The results show up in practice. Ask a current frontier model a nuanced question about tax law, structural engineering, or pharmaceutical interactions, and you will frequently get confident-sounding answers that are subtly wrong. The same model can write a working React component in seconds. The optimization target is visible in the output.
This is not an argument against agents - they are genuinely useful. But the balance has tilted hard. RAG systems (retrieval-augmented generation, where a model searches a knowledge base before answering) were supposed to fill the knowledge gap, but they add latency, complexity, and their own failure modes. They are a patch, not a fix.
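To make the retrieve-then-answer pattern concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three-sentence knowledge base, the word-overlap scoring, and the function names are hypothetical, and a real RAG system would more likely use embeddings and a vector store. It shows only the shape of the pipeline, and where the extra moving parts come from.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base; real systems index far larger corpora.
KNOWLEDGE_BASE = [
    "Warfarin and aspirin taken together raise the risk of bleeding.",
    "A React component is a function that returns UI elements.",
    "Steel beams are sized by checking bending moment against section capacity.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def score(query, passage):
    """Count the words shared by query and passage (a crude relevance signal)."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)

def retrieve(query, passages, k=1):
    """Return the k passages with the highest word overlap."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble the text that would be sent to the model after retrieval."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

question = "Does aspirin interact with warfarin?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
```

Even in this toy version the failure mode is easy to see: if the retrieval step ranks the wrong passage first, the model is handed irrelevant context and the answer degrades no matter how capable the model is.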
Who Feels This Most
Professionals who came to AI for domain expertise - lawyers checking precedent, doctors reviewing drug interactions, analysts digging into niche markets - are the ones most underserved by the current trajectory. These users do not need their AI to book calendar meetings or commit code to GitHub. They need it to be deeply, reliably right about specific subjects.
Small language models tuned for specific domains (legal, medical, financial) do exist, but they lack the general reasoning ability that makes frontier models useful. You end up choosing between a model that knows your field but reasons poorly and one that reasons well but knows too little about it.
Where the Gap Creates Opportunity
The labs chasing agent capabilities are leaving a real opening. A model that prioritized knowledge density and factual reliability over tool use would find a hungry audience among professionals, researchers, and anyone whose primary use case is "give me a trustworthy answer." Perplexity has carved out a niche here by wrapping search retrieval around model outputs, but even that approach inherits the underlying model's knowledge limitations.
The most likely path forward is specialization: general-purpose agent models for task execution, paired with domain-specific models or knowledge layers for accuracy-critical work. But right now, the industry is building hammers, and the people who need scalpels are waiting.
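As a rough illustration of what that pairing could look like, the sketch below routes a request either to a domain model or to a general agent. The keyword lists, the route labels, and the `route` function are all hypothetical; a production router would more plausibly use a trained classifier than keyword matching. The one design choice it is meant to show is that accuracy-critical domains are checked first.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
DOMAIN_KEYWORDS = {
    "legal": {"precedent", "statute", "liability"},
    "medical": {"dosage", "interaction", "contraindication"},
}
TASK_VERBS = {"book", "commit", "deploy", "schedule", "browse"}

def route(request):
    """Pick a backend label for a request.

    Domain matches win over task verbs, so a request that mentions both
    is sent to the accuracy-critical path.
    """
    words = set(re.findall(r"[a-z]+", request.lower()))
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if words & keywords:
            return f"domain:{domain}"
    if words & TASK_VERBS:
        return "agent"
    return "general"
```

Under these assumptions, a question about precedent goes to the legal model, a request to commit and deploy code goes to the agent, and a request that mixes a drug interaction with booking a follow-up is treated as medical first.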