671 billion to 26 billion. That's how far AI model sizes have dropped in a single year while delivering broadly comparable results.
In January 2025, DeepSeek R1 launched with 671 billion parameters using a Mixture of Experts (MoE) architecture. Instead of activating all of a model's parameters for every query, an MoE model routes each token through only a few specialist sub-networks, the "experts". The practical effect: a model that is technically large but uses only a fraction of its parameters at once (about 37 billion per token in R1's case), making it cheaper to run than a comparably sized dense model.
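To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is a toy illustration, not DeepSeek's or Google's implementation: the experts are stand-in functions, the gate scores are random numbers (in a real model they come from a learned layer applied to each token), and the names `route_top_k` and `moe_forward` are hypothetical.

```python
import math
import random

def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts' weights sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and blend their outputs by gate weight."""
    chosen = route_top_k(gate_scores, k)
    output = sum(w * experts[i](x) for i, w in chosen)
    return output, [i for i, _ in chosen]

# Eight toy "experts": each just scales its input by a different factor.
experts = [lambda x, s=s: x * s for s in range(1, 9)]

# Hypothetical router output; a real model computes this from the token itself.
rng = random.Random(0)
gate_scores = [rng.uniform(-1, 1) for _ in experts]

output, used = moe_forward(3.0, experts, gate_scores, k=2)
print(f"experts used: {sorted(used)} of {len(experts)}")
```

Only two of the eight experts run for this input, which is where the compute saving comes from. All eight still have to exist in memory, because a different input may select a different pair.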
Google's Gemma 4, released this week, uses the same MoE approach at just 26 billion total parameters. Early testers in the developer community are describing it as genuinely impressive - a phrase that was not being applied to 26B models a year ago.
What Running a 26B Model Actually Requires
DeepSeek R1 at 671B needed serious infrastructure. Multi-GPU server rigs, cloud compute, or accepting the quality trade-offs of the distilled smaller versions (the 8B and 14B variants most people actually ran locally). The full model was out of reach for individuals.
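A 26B MoE model fits in a different category. An NVIDIA RTX 4090 with 24GB of video memory can handle it, provided the weights are quantized: at 16-bit precision, 26 billion parameters take roughly 52GB, while a 4-bit version needs about 13GB. MoE routing cuts the compute per token, not the memory footprint, because every expert still has to be loaded. Developers with a recent high-end workstation can run Gemma 4 locally without cloud access and without any data leaving their machine.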
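The back-of-the-envelope arithmetic behind "runs on my desk" can be sketched as follows. This is a rough estimate under stated assumptions: it counts weight storage only (no KV cache, activations, or runtime overhead), assumes all experts are held in memory, and treats 1GB as 10^9 bytes. The helper name `weight_memory_gb` and the choice of 16-, 8-, and 4-bit precisions are illustrative.

```python
def weight_memory_gb(total_params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = total_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

VRAM_GB = 24  # RTX 4090 video memory, as cited in the text

for name, params_b in [("DeepSeek R1", 671), ("Gemma 4", 26)]:
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params_b, bits)
        # Weights-only check: real usage needs extra headroom on top of this.
        verdict = "fits" if gb < VRAM_GB else "does not fit"
        print(f"{name} {params_b}B @ {bits}-bit: {gb:.1f} GB "
              f"({verdict} in {VRAM_GB} GB)")
```

By this estimate the 671B model exceeds a single consumer card at every precision shown, while the 26B model fits only once it is quantized to around 4 bits. Actual headroom will be smaller than the raw numbers suggest, since context length and the inference runtime add their own memory cost.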
For teams handling sensitive documents, or individuals who simply don't want their prompts routed through someone else's servers, the distinction between "requires a data center" and "runs on my desk" is not academic.
The Quality Gap Is Closing Faster Than Expected
A year ago, the unspoken trade-off in local AI was real: you could run it yourself, but you'd notice where it fell short. Frontier models were measurably better at complex reasoning, long documents, and consistent instruction-following.
That gap hasn't closed entirely. Gemma 4 at 26B is not GPT-4.5. But the degree of compromise required to run local AI has dropped significantly, and the tasks where a 26B model falls short are narrower than they were 12 months ago.
Whether Gemma 4 holds up on what practitioners actually care about - multi-step coding, document analysis, reliable instruction-following across a conversation - will be clearer within weeks as real-world testing accumulates. Parameter counts alone don't tell the full story. But going from 671B to 26B in one year while the quality conversation stays competitive is a trajectory worth paying attention to.