
Google's Gemma 4 Puts Agent-Capable AI on Your Laptop

Four models, all open, all capable of running AI agents on hardware you already own. That's the pitch behind Google's Gemma 4 family, released today.

The lineup covers a wide range of devices:

  • E2B and E4B - Two small dense models built for phones and laptops. Both handle text, image, and audio input natively, which is unusual for models this size. Each has a 128K-token context window (roughly 300 pages of text).
  • A4B - A 26-billion-parameter mixture-of-experts model (a design where only a portion of the model activates per query, keeping it fast) sized for consumer GPUs. It has a 256K-token context window.
  • 31B - A dense model and the largest in the family, aimed at workstations and servers. It also has a 256K-token context window.
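
To make the mixture-of-experts idea concrete, here is a toy sketch of top-k routing in Python. It is not Gemma 4's architecture or code: the experts, the router scores, and the moe_forward name are all made up for illustration. The point is only that the router picks a few experts per input and the rest never run.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, router_scores, experts, k=2):
    """Run only the k highest-scoring experts and blend their outputs.

    Hypothetical helper: real models route per token inside each layer,
    using learned router weights rather than hand-picked scores.
    """
    # Indices of the k experts the router scored highest.
    top = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    # Normalize the chosen experts' scores into mixing weights.
    weights = softmax([router_scores[i] for i in top])
    # Only the selected experts are evaluated; the others stay idle.
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Four toy "experts" (illustrative functions, not neural networks).
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
router_scores = [0.1, 2.0, -1.0, 2.0]  # made-up router output for one input

y, used = moe_forward(3.0, router_scores, experts, k=2)
print(used, y)
```

With k=2, only two of the four experts do any work for this input, which is the source of the speed advantage the A4B description refers to.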

Built for Agents, Not Just Chat

Every model in the Gemma 4 family supports function calling out of the box. Function calling is what lets an AI model actually do things - search the web, write to a database, control an app - rather than just generate text. Most local models need extensive setup to handle this. Gemma 4 ships with it built in.
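
For readers who have not seen function calling up close, the sketch below shows the general pattern in Python: the model emits a structured request naming a tool and its arguments, and the surrounding program runs the matching function. Everything here is an assumption for illustration. The fake_model stub stands in for a real model, the get_weather tool is hypothetical, and the JSON shape is not Gemma 4's actual output format.

```python
import json

def get_weather(city: str) -> str:
    """Stand-in tool; a real one would query a weather service."""
    return f"Sunny in {city}"

# Hypothetical tool registry mapping tool names to Python functions.
TOOLS = {"get_weather": get_weather}

def fake_model(prompt: str) -> str:
    """Stub for a local model that answers with a JSON tool call.

    The JSON layout is assumed for this sketch, not taken from Gemma 4.
    """
    return json.dumps({"name": "get_weather", "arguments": {"city": "Paris"}})

def dispatch(model_output: str) -> str:
    """Parse the model's tool call and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

result = dispatch(fake_model("What's the weather in Paris?"))
print(result)
```

In a full agent loop, the tool's result would be passed back to the model so it can write the final answer. Models with built-in function calling are trained to produce the structured request reliably, which is the part that otherwise takes extra setup.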

Each model also includes a toggleable "thinking mode" for chain-of-thought reasoning (where the model works through a problem step by step before answering). You get structured reasoning without needing a separate, larger model.

What This Means for Local AI

The practical gap between cloud AI and local AI has been shrinking, but agent capabilities have lagged behind. Running a ChatGPT-style agent locally usually meant cobbling together function-calling frameworks, dealing with inconsistent tool use, and accepting worse performance. Gemma 4 closes several of those gaps in a single release.

The E2B model running on a phone with native audio input is particularly interesting. That's a voice-capable AI agent running entirely on-device with no cloud round-trip, no API costs, and no data leaving the phone.

Google isn't the only one pushing local models - Meta's Llama and Microsoft's Phi families are both active - but Gemma 4's combination of multimodal input, native function calling, and a range of sizes from phone to server makes it the most agent-ready local model family available right now.