For five years, Nvidia's story has been simple: buy more GPUs. The company's graphics processors became the default hardware for training and running AI models, and revenue followed. But at GTC 2026, kicking off March 16 in San Jose, Nvidia is pushing a different chip into the spotlight: the CPU.
The reason comes down to how AI is actually being used right now. Agentic AI systems - where an AI model breaks a task into steps, spawns multiple sub-agents, and coordinates their work - generate far more compute demand than a single chatbot query. And that demand doesn't all land on the GPU.
The Bottleneck Nobody Talked About
"CPUs are becoming the bottleneck in terms of growing out this AI and agentic workflow," said Dion Harris, Nvidia's head of AI infrastructure. The logic is straightforward: while GPUs handle the heavy math of running AI models, CPUs manage the orchestration layer - routing requests, managing memory, coordinating the dozens of agents that a single user query might spin up.
Jensen Huang described the scale of the problem on Nvidia's most recent earnings call: "These agentic systems are spawning off different agents working as a team. The number of tokens that are being generated has really, really gone exponential, and so we need to inference at a much higher speed."
Nvidia announced its first data center CPU, Grace, in 2021. The next generation, called Vera, is now in production. At GTC, the company is expected to display a CPU-only rack on the show floor - a visual statement about where it sees the market heading.
Vera Rubin: The GPU Side Isn't Standing Still
The CPU story doesn't mean Nvidia is backing off GPUs. The conference's hardware centerpiece is the Vera Rubin platform, successor to the Blackwell architecture that's been selling as fast as Nvidia can manufacture it. Vera Rubin reportedly packs 336 billion transistors, moves to HBM4 memory (the latest high-bandwidth memory standard), and is projected to deliver 3.3x to 5x better performance than Blackwell for the "mixture of experts" model architecture that most large AI companies now use.
Further out, Nvidia is expected to give the first architectural details on Feynman, a future chip designed specifically for the reasoning and long-term memory demands of AI agents.
What This Means for the Tools You Actually Use
If you use ChatGPT, Claude, Perplexity, or any AI tool that chains together multiple steps to answer a question, you're already using agentic AI. The CPU bottleneck that Nvidia is addressing directly affects how fast and how cheaply those tools can run.
More efficient inference hardware (chips that run trained models rather than train them) typically translates to lower API costs and faster response times for end users. When Dion Harris describes a "platform agnostic" CPU strategy - meaning Nvidia's GPUs will also work alongside CPUs from other manufacturers - that signals a competitive infrastructure market, and competition has historically pushed prices down.
GTC 2026 runs March 16-19, with 30,000 attendees from 190 countries and more than 700 sessions. The keynote speaker lineup includes CEOs from Dell, Perplexity, Cohere, Mistral AI, CoreWeave, and Palantir. Jensen Huang delivers the opening keynote Monday at 11 a.m. PT.