
Meta Has Built Four Custom AI Chips in Two Years to Cut Its GPU Dependency

Four MTIA Chips in Two Years: Scaling AI Experiences for Billions
Image: Meta

Hundreds of thousands of Meta's custom AI chips are already running inside its data centers, and the company has laid out four chip generations across a roughly two-year span. That pace tells you how seriously Meta is trying to reduce its dependence on Nvidia for the AI workloads behind Facebook, Instagram, and WhatsApp.

The chip family is called MTIA (Meta Training and Inference Accelerator), and Meta published a detailed roadmap this week covering generations 300 through 500. The numbers paint a clear picture of where the company is headed.

From Recommendations to Running Llama

MTIA started as a chip purpose-built for ranking and recommendation models, the algorithms that decide what shows up in your feed. MTIA 300, the first generation in production, handles training for those workloads today. It uses a chiplet design (multiple smaller chips packaged together) with dedicated networking built in.

MTIA 400 is where things shift. Meta redesigned it to also run generative AI models, including large language models like Llama. The company claims it delivers "competitive" performance against leading commercial products (read: Nvidia GPUs). The gains over the 300 are large on paper: 400% more FP8 compute (FP8 is an 8-bit floating-point format widely used for AI inference) and 51% more memory bandwidth.
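To make "low precision" concrete, here is a minimal sketch of rounding numbers to FP8. Meta's roadmap does not say which FP8 variant MTIA uses; this assumes the common OCP "E4M3" layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities). The function names are hypothetical and the code illustrates the number format, not MTIA hardware.

```python
# Sketch of FP8 E4M3 rounding (assumed OCP E4M3 layout; illustrative only).

def e4m3_grid() -> list[float]:
    """Return every non-negative finite E4M3 value in ascending order."""
    values = []
    # Exponent field 0: zero and subnormals, 2**-6 * (m / 8).
    for m in range(8):
        values.append(2.0 ** -6 * (m / 8))
    # Exponent fields 1..15: normals, 2**(e - 7) * (1 + m / 8).
    for e in range(1, 16):
        for m in range(8):
            if e == 15 and m == 7:
                continue  # this bit pattern encodes NaN in E4M3
            values.append(2.0 ** (e - 7) * (1 + m / 8))
    return values


GRID = e4m3_grid()
E4M3_MAX = GRID[-1]  # largest finite value: 2**8 * 1.75 = 448.0


def to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value, saturating at +/-448."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # On an exact tie, min() keeps the smaller candidate; real hardware
    # typically uses round-to-nearest-even instead.
    nearest = min(GRID, key=lambda v: abs(v - mag))
    return sign * nearest


print(len(GRID), E4M3_MAX)  # 127 448.0
print(to_e4m3(0.1))         # 0.1015625
print(to_e4m3(-1000.0))     # -448.0
```

Only 127 non-negative values exist, so 0.1 snaps to 0.1015625 and anything beyond 448 saturates. That coarseness is the trade-off hardware makes for packing more multiply units and moving fewer bytes per value.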

MTIA 450 and 500, both scheduled for 2027, go all-in on generative AI inference, meaning running trained models at scale rather than training new ones. The 450 doubles memory bandwidth over the 400 and delivers 6x more compute for ultra-low-precision math in MX4, a 4-bit microscaling format. The 500 pushes bandwidth another 50% higher and offers up to 80% more memory capacity.
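Microscaling formats get away with 4-bit values by letting a small block of numbers share one scale factor. The roadmap only names "MX4", so this sketch assumes something like the OCP Microscaling MXFP4 format: blocks of 32 values, one shared power-of-two scale per block, and each element stored as a 4-bit E2M1 float. MTIA's actual format may differ, the function names are hypothetical, and the sketch skips the specification's scale-range clamping and NaN handling.

```python
import math

# E2M1 (FP4) magnitudes: 1 sign bit, 2 exponent bits, 1 mantissa bit.
FP4_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
EMAX_ELEM = 2    # exponent of the largest E2M1 value: 6 = 1.5 * 2**2
BLOCK_SIZE = 32  # assumed MX block size: 32 elements share one scale


def quantize_block(block):
    """Quantize one block to (shared_exp, elements), MXFP4-style."""
    amax = max(abs(v) for v in block)
    if amax == 0:
        return 0, [0.0] * len(block)
    # One power-of-two scale for the whole block, set by its largest value.
    shared_exp = math.floor(math.log2(amax)) - EMAX_ELEM
    scale = 2.0 ** shared_exp
    elems = []
    for v in block:
        mag = min(abs(v) / scale, FP4_E2M1[-1])  # saturate at 6
        # Nearest FP4 value; ties resolve to the smaller value in this sketch.
        q = min(FP4_E2M1, key=lambda g: abs(g - mag))
        elems.append(math.copysign(q, v))
    return shared_exp, elems


def dequantize_block(shared_exp, elems):
    """Recover approximate values: element * 2**shared_exp."""
    return [e * 2.0 ** shared_exp for e in elems]


def quantize(values, block_size=BLOCK_SIZE):
    """Split values into blocks and quantize each block independently."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


# A 4-element block for readability (a real MX block holds 32 values).
shared, elems = quantize_block([0.1, -0.75, 3.2, 12.0])
print(shared, elems)                    # 1 [0.0, -0.5, 1.5, 6.0]
print(dequantize_block(shared, elems))  # [0.0, -1.0, 3.0, 12.0]
```

The 0.1 collapses to zero because the block's scale is set by the 12.0 outlier. Keeping blocks small limits how far one large value can drag down the precision of its neighbors.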

From the 300 to the 500, total memory bandwidth grows 4.5x and raw compute jumps 25x.
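The 4.5x bandwidth figure lines up with the per-generation numbers, assuming each percentage is relative to the immediately preceding chip (the roadmap summary does not say so explicitly). Generational gains compound by multiplication, not addition:

```python
# Generation-over-generation memory-bandwidth multipliers quoted in this article.
# Assumption: each figure is relative to the immediately preceding chip.
steps = [
    ("MTIA 400 vs 300", 1.51),  # "51% more memory bandwidth"
    ("MTIA 450 vs 400", 2.00),  # "doubles memory bandwidth"
    ("MTIA 500 vs 450", 1.50),  # "another 50% higher"
]

cumulative = 1.0
for label, factor in steps:
    cumulative *= factor  # compound growth: multipliers multiply
    print(f"{label}: x{factor:.2f}, cumulative x{cumulative:.2f} vs MTIA 300")
```

The product lands at about 4.53x, matching the quoted 4.5x. The 25x compute figure cannot be checked the same way, because the per-chip compute gains are quoted in different precisions (FP8 for the 400, MX4 for the 450).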

One New Chip Every Six Months

The real story here is velocity. Meta is shipping a new chip generation roughly every six months. For context, Nvidia's data center GPU cadence is roughly 18-24 months between major generations. Meta can move faster because these are narrower chips - they do not need to be general-purpose GPUs that handle gaming, scientific computing, and AI. They just need to run Meta's models on Meta's infrastructure.

The software side matters too. MTIA plugs natively into PyTorch (the machine learning framework Meta originally created), supports vLLM (a popular open-source engine for serving language models efficiently), and uses Triton, a Python-based language for writing custom compute kernels, for custom operations. The pitch is that engineers do not need to rewrite their code to run it on MTIA instead of a GPU. Frictionless adoption is the explicit goal.

What This Means for the GPU Market

Meta is one of Nvidia's biggest customers. Every workload Meta moves to MTIA is revenue Nvidia does not get. Google has been doing this with its TPU chips for years, and Amazon has its Trainium and Inferentia lines. Meta joining the custom silicon race at this pace - with chips already deployed at "hundreds of thousands" scale - adds real pressure.

But this is not a full GPU replacement. Meta still relies heavily on Nvidia hardware to train its largest models. The newer MTIA generations focus on inference, which is where the volume is. Training a model is a largely one-time cost; serving it to billions of users can mean millions of requests per second. That is where custom chips save the most money.

For anyone building on top of Meta's platforms or using Llama models, the practical impact is indirect but real: lower infrastructure costs for Meta mean more resources for model development and potentially cheaper API access down the line.