Google just dropped Gemma 4, a family of four open-weight models that punch well above their size class on public benchmarks.
The lineup spans a wide range of hardware targets:
- E2B (2B parameters) - Built for phones, Raspberry Pi, and IoT devices
- E4B (4B parameters) - Lightweight mobile deployment
- 26B (Mixture of Experts) - A mid-range option built on an MoE architecture: only a subset of the model's experts activates for each token, so inference stays fast despite the total parameter count
- 31B (Dense) - The flagship, where every parameter fires on every request
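To make the MoE versus dense distinction concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is not Gemma 4's actual implementation: the number of experts, the `top_k` value, the router weights, and the helper names (`make_expert`, `moe_layer`, `dense_layer`) are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a short list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    # Hypothetical stand-in for an expert feed-forward block:
    # it simply scales its input vector.
    return lambda x: [scale * v for v in x]

def moe_layer(x, router_weights, experts, top_k=2):
    # The router produces one score per expert (a dot product with x).
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Only the top_k highest-scoring experts run for this input.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen

def dense_layer(x, experts):
    # Dense analogue: every block runs on every input, with equal weight.
    out = [0.0] * len(x)
    for expert in experts:
        y = expert(x)
        out = [o + v / len(experts) for o, v in zip(out, y)]
    return out

# Toy setup (assumed values): 4 experts, 2 of which run per token.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
x = [2.0, 1.0]

moe_out, chosen = moe_layer(x, router_weights, experts, top_k=2)
dense_out = dense_layer(x, experts)
print("experts used by MoE layer:", chosen)
print("MoE output:", moe_out)
print("dense output:", dense_out)
```

In this toy, the MoE layer evaluates two of the four experts for the input, while the dense layer evaluates all four. That is the trade the bullets describe: more total parameters, but less compute per token.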
The headline numbers: the 31B and 26B models ranked 3rd and 6th respectively on Arena AI's text leaderboard, beating models with 20 times as many parameters. Google claims up to 4x speed improvements and 60% lower battery consumption compared to previous Gemma versions.
What's Actually New
All four models handle text, images, video, and audio natively. Gemma 3 added image understanding, but Gemma 4 extends this to full multimodal input including video and audio processing. The models also support structured outputs and function calling out of the box, making them practical for agent-style applications where the model needs to interact with external tools.
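As a rough illustration of what function calling means on the application side, the sketch below parses a structured tool call and dispatches it to a local function. The JSON shape, the `get_weather` tool, and the hardcoded `model_output` string are assumptions made for the example. They do not reflect Gemma 4's actual output format, which depends on the serving stack and prompt template.

```python
import json

def get_weather(city: str) -> dict:
    # Hypothetical tool; a real one would call an external service.
    return {"city": city, "forecast": "sunny"}

# Registry of tools the application is willing to expose to the model.
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(model_output: str) -> dict:
    # Parse the model's structured output and run the requested tool.
    call = json.loads(model_output)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        # Never execute anything the application did not register.
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)

# Assumed example of a structured tool call emitted by a model.
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
result = dispatch_tool_call(model_output)
print(result)
```

In an agent loop, the result would typically be serialized and passed back to the model as the next turn so it can compose a final answer.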
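Gemma 4 ships under the Apache 2.0 license, a change from earlier Gemma releases, which used Google's custom Gemma Terms of Use. Apache 2.0 is a permissive license - full commercial use allowed, no field-of-use restrictions, no vendor lock-in. The models are available on Hugging Face, Kaggle, and Google AI Studio, with support for PyTorch and JAX.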
One detail for Android developers: Gemma 4 models form the foundation for Gemini Nano 4, so code targeting Gemma 4 will work on Gemini Nano 4-enabled devices arriving later this year.
For anyone running local models, the E2B and E4B variants are the most interesting part. A genuinely capable multimodal model that runs on a phone, with no network round trip, opens up use cases that cloud-only models simply cannot serve - offline processing, privacy-sensitive applications, and real-time on-device inference without per-token API costs.