Google's next generation of open-weight models appears to be close to launch. Gemma 4 has been spotted testing on Arena (the LLM benchmarking platform where models compete head-to-head) under the codename "significant-otter," and the model self-identifies as "Gemma 4, a large language model developed by Google DeepMind" when asked.
Three sizes are reportedly planned: 2B, 4B, and 120B parameters. That's a significant jump from Gemma 3, which topped out at 27B. The 120B variant would put Gemma in direct competition with the largest open-weight models, such as those in Meta's Llama family.
What We Know So Far
The details are thin. No benchmarks, no architecture papers, no official announcement from Google yet. The model's existence was confirmed through Arena testing, where users interacted with it and extracted its self-identification. There's speculation about multimodal capabilities (meaning the model could process images and text, not just text), but nothing confirmed.
The jump to 120B parameters is the headline. Gemma has carved out a niche as Google's "small but capable" open-weight offering, popular with developers running models locally on consumer hardware. The 2B and 4B models fit that story. A 120B model is a different proposition entirely: it requires serious GPU hardware to run, and it positions Google to compete at the frontier of open-weight performance.
For anyone running local AI models, the 2B and 4B variants are the ones to watch. These will likely run on a single GPU or even a laptop, making them candidates for offline coding assistants, local chatbots, or embedded applications. The 120B model is more relevant to companies building products on top of open-weight infrastructure.
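The hardware split between the small and large variants comes down to rough arithmetic: the memory needed to hold a model's weights is approximately the parameter count times the bytes stored per weight. The sketch below is a back-of-the-envelope estimate only. It assumes dense models at the reported (unconfirmed) sizes, counts weights alone, and ignores the KV cache, activations, and runtime overhead. The function name and the chosen precisions are illustrative, not anything Google has published.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes).

    params_billions * 1e9 parameters, each taking bits_per_weight / 8 bytes,
    divided by 1e9 bytes per GB. The two 1e9 factors cancel.
    """
    return params_billions * bits_per_weight / 8


# Reported (unconfirmed) Gemma 4 sizes, in billions of parameters.
REPORTED_SIZES_B = [2, 4, 120]

# Common storage precisions: 16-bit (bf16/fp16), 8-bit, and 4-bit quantization.
PRECISIONS_BITS = [16, 8, 4]

if __name__ == "__main__":
    for size in REPORTED_SIZES_B:
        estimates = ", ".join(
            f"{bits}-bit: {weight_memory_gb(size, bits):.0f} GB"
            for bits in PRECISIONS_BITS
        )
        print(f"{size}B -> {estimates}")
```

Under these assumptions, a 4B model needs about 8 GB for weights at 16-bit precision and about 2 GB at 4-bit, which is within reach of a laptop. A 120B model needs about 240 GB at 16-bit and about 60 GB even at 4-bit, which is more than a single consumer GPU typically offers.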
No release date has been announced, but "imminent" is the word circulating. Given that the model is already being tested on Arena, a public release within days or weeks, rather than months, seems likely.