
Google Releases Gemma 4 Open Models That Beat Systems 20x Their Size

April 3, 2026 2 min read


Google just dropped Gemma 4, a family of four open-weight models that punch well above their size class on public benchmarks.

The lineup spans a wide range of hardware targets:

E2B (2B parameters) - Built for phones, Raspberry Pi, and IoT devices
E4B (4B parameters) - Lightweight mobile deployment
26B (Mixture of Experts) - A mid-range option built on an MoE architecture: only a subset of the model's parameters activates for any given input, which keeps inference fast despite the total parameter count
31B (Dense) - The flagship, where every parameter fires on every request
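
To make the MoE versus dense distinction concrete, here is a toy sketch of top-k expert routing. It is not Gemma 4's actual architecture: the expert count, the `top_k` value, the router weights, and the "experts" themselves (simple scaling functions) are all made up for illustration. It only shows how a router can score every expert yet run just a few of them per input.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a short list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Toy "expert": multiplies its input vector by a fixed scale.
    return lambda x: [scale * v for v in x]


def moe_forward(x, router_weights, experts, top_k=2):
    # The router produces one score per expert (dot product with the input).
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Only the top_k highest-scoring experts are run; the rest stay idle.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


random.seed(0)
NUM_EXPERTS, DIM = 8, 4  # hypothetical sizes, chosen for readability
experts = [make_expert(scale=i + 1) for i in range(NUM_EXPERTS)]
router_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

output, active = moe_forward([0.5, -1.0, 0.25, 2.0], router_weights, experts, top_k=2)
print(f"ran {len(active)} of {NUM_EXPERTS} experts: {sorted(active)}")
```

A dense model corresponds to running every expert on every input, which is why its compute cost scales with the full parameter count.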

The headline numbers: the 31B and 26B models ranked 3rd and 6th respectively on Arena AI's text leaderboard, beating models with 20 times as many parameters. Google claims up to 4x speed improvements and 60% lower battery consumption compared to previous Gemma versions.

What's Actually New

All four models handle text, images, video, and audio natively. Gemma 3 added image understanding, but Gemma 4 extends this to full multimodal input including video and audio processing. The models also support structured outputs and function calling out of the box, making them practical for agent-style applications where the model needs to interact with external tools.
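
For a rough idea of what function calling looks like on the application side, the sketch below parses a JSON tool call and dispatches it to a local Python function. The JSON shape, the `get_weather` tool, and the `dispatch` helper are hypothetical, and the model output is a hard-coded stand-in; the exact format Gemma 4 emits may well differ.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would query a weather service.
    return f"Sunny in {city}"


# Registry mapping tool names the model may request to local functions.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a JSON function call and run the matching registered tool."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for structured model output; the format is an assumption.
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
result = dispatch(model_output)
print(result)
```

In an agent loop, the tool result would be fed back to the model as the next message so it can compose a final answer.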

The Apache 2.0 license carries the same permissive terms as Gemma 3 - no usage restrictions, no vendor lock-in, full commercial use allowed. Models are available on Hugging Face, Kaggle, Google AI Studio, and through PyTorch and JAX.

One detail for Android developers: Gemma 4 models form the foundation for Gemini Nano 4, so code targeting Gemma 4 will work on Gemini Nano 4-enabled devices arriving later this year.

For anyone running local models, the E2B and E4B variants are the most interesting part. A genuinely capable multimodal model that runs on a phone with near-zero latency opens up use cases that cloud-only models simply cannot serve - offline processing, privacy-sensitive applications, and real-time on-device inference without per-token API costs.
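
Whether a given variant fits on a device comes down largely to memory. The back-of-the-envelope sketch below estimates the size of the weights alone from the parameter counts quoted above, at a few common precisions. The bit widths are assumptions about how the models might be quantized, and the figures ignore activations, the KV cache, and runtime overhead, so real requirements will be higher.

```python
# Total parameter counts as quoted in the article, in billions.
MODELS = {"E2B": 2, "E4B": 4, "26B": 26, "31B": 31}


def weight_gib(params_billions: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


for name, params in MODELS.items():
    sizes = ", ".join(
        f"{bits}-bit: {weight_gib(params, bits):.1f} GiB" for bits in (16, 8, 4)
    )
    print(f"{name}: {sizes}")
```

Note that an MoE model generally still has to hold all of its experts in memory, so the 26B variant saves compute per request rather than storage.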
