Google Releases Gemma 4: Four Open Models Under Apache 2.0 License

Google dropped four new open-weight AI models yesterday under the Gemma 4 family, and the biggest change has nothing to do with performance: they're all Apache 2.0 licensed now. That means any developer or company can use them commercially without the restrictions that limited previous Gemma releases.

The Four Models

Gemma 4 ships in four sizes, each targeting a different use case:

  • 31B Dense - The flagship. Runs unquantized on a single 80GB H100 GPU, or at 4-bit precision on consumer cards like the RTX 4090. Ranked #3 among open models on the Arena AI text leaderboard. 256K token context window (roughly 600 pages of text).
  • 26B Mixture of Experts (MoE) - Uses 128 specialized sub-networks but only activates 3.8 billion parameters per token, making it significantly faster than the 31B while still ranking #6 on Arena. Also 256K context.
  • E4B (Effective 4B) - Actually 8 billion parameters compressed to behave like 4 billion. Built for phones and edge devices with a 128K context window.
  • E2B (Effective 2B) - 5.1 billion parameters compressed down, three times faster than the E4B. Designed for on-device use with up to 60% less battery drain than previous Gemma versions.
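
The hardware claims for the 31B model follow from simple arithmetic, sketched below. The sketch assumes "unquantized" means 16-bit weights and counts only the weights themselves; the KV cache, activations, and framework overhead need extra memory on top, so treat the numbers as a lower bound. The function name is ours, not Google's.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Assumption: unquantized weights are stored at 16 bits each (bf16/fp16).
full_precision_gb = weight_memory_gb(31, 16)
four_bit_gb = weight_memory_gb(31, 4)

print(f"16-bit weights: {full_precision_gb:.1f} GB (H100 has 80 GB)")
print(f"4-bit weights:  {four_bit_gb:.1f} GB (RTX 4090 has 24 GB)")
```

At 16 bits the weights come to about 62 GB, which leaves headroom on an 80GB H100. At 4 bits they shrink to about 15.5 GB, which is why a 24GB RTX 4090 becomes viable.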

The MoE architecture deserves a quick explanation: instead of running every input through the entire model, it routes each piece of text to a small group of specialized "experts" within the network. You get near-big-model quality at a fraction of the compute cost.
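
For readers who want to see the routing idea in code, here is a toy sketch of top-k expert selection. It is a generic illustration, not Gemma 4's implementation: the 8 experts, the choice of 2 active experts per token, and every function name are assumptions made for the example, and each "expert" is just a function that scales a number.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_scores, top_k):
    """Pick the top_k experts and renormalize their weights to sum to 1."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router_scores, top_k=2):
    """Run only the selected experts and blend their outputs by weight."""
    routes = route_token(router_scores, top_k)
    output = sum(weight * experts[i](token) for i, weight in routes)
    return output, routes


# Toy setup: 8 experts, each multiplying its input by a different factor.
experts = [lambda x, scale=scale: x * scale for scale in range(1, 9)]
random.seed(0)
router_scores = [random.gauss(0, 1) for _ in experts]  # stand-in for a learned router

output, routes = moe_layer(1.0, experts, router_scores, top_k=2)
used = [i for i, _ in routes]
print(f"ran {len(used)} of {len(experts)} experts: {used}")
```

Only the selected experts are ever called, which is where the compute savings come from: the other six sit idle for this token.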

What's Actually New

Beyond the licensing shift, Gemma 4 adds native function calling (the model can trigger external tools and APIs on its own), structured JSON output, and multimodal input support for video and audio alongside text. All four models support 140+ languages.

The function calling piece matters most for anyone building AI agents or automated workflows. Previous Gemma models needed extra engineering to connect with outside tools. Now it's built in.
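
To make that concrete, here is a minimal sketch of the application side of function calling: the model emits a structured JSON request, and your code parses it and runs the matching function. The JSON shape, the `get_weather` tool, and the `dispatch` helper are all hypothetical, chosen for illustration; the actual format Gemma 4 emits may differ, so check the model documentation before relying on it.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical tool. A real version would call a weather API."""
    return {"city": city, "forecast": "sunny"}


# Registry of functions the model is allowed to call.
TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> dict:
    """Parse a JSON tool call and run the matching local function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Stand-in for text a model might emit; not captured from Gemma 4.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(model_output)
print(json.dumps(result))
```

In an agent loop, `result` would be sent back to the model so it can write its final answer. Keeping an explicit registry means the model can only trigger functions you have chosen to expose.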

Google is clearly positioning these against Chinese open models such as Alibaba's Qwen, offering a U.S.-based alternative for enterprises worried about data collection. The Apache 2.0 licensing move puts Gemma on equal footing with Meta's Llama on the legal side, removing one of the biggest objections enterprise buyers had.

Where to Get Them

All four models are available now on Google AI Studio, Hugging Face, Kaggle, and Ollama, with day-one support for popular inference frameworks including vLLM, SGLang, llama.cpp, and MLX. The edge models also run through Google's AI Edge Gallery for mobile developers.

Running the 31B model on a consumer GPU is the most interesting option for individual developers. At 4-bit quantization on a $1,600 RTX 4090, you're getting a top-3 open model running locally with no API costs and no data leaving your machine. That's a meaningful shift from even six months ago.