
Google's Gemma 4 31B Runs Neck and Neck With Claude Sonnet 4.6 in Independent

June 8, 2026


Getting output competitive with frontier commercial APIs used to mean paying per token. The case for paying keeps getting harder to make.

Independent benchmark results show Google's Gemma 4 31B, running in FP8 quantization, matching Claude Sonnet 4.6 on one developer's personal evaluation harness. FP8 - 8-bit floating point precision - stores each of the model's weights in a single byte, roughly half the GPU memory of the standard 16-bit format. That's what makes a 31B-parameter model (roughly 31 billion learned numerical weights the model uses to process text) practical on high-end consumer hardware rather than requiring cloud compute.
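The halving is simple arithmetic, and a rough sketch makes it concrete. The snippet below counts the weights only and assumes "31B" means exactly 31 billion parameters; the KV cache, activations, and runtime overhead all add to the real footprint, and `weight_memory_gb` is an illustrative helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in decimal GB."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "31B" is taken as exactly 31 billion parameters.
NUM_PARAMS = 31e9

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # standard 16-bit format
fp8_gb = weight_memory_gb(NUM_PARAMS, 8)    # FP8 quantization

print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"FP8 weights:    {fp8_gb:.0f} GB")
print(f"FP8 / 16-bit:   {fp8_gb / fp16_gb:.2f}")
```

Under those assumptions the weights come to about 62 GB at 16 bits and about 31 GB at FP8, before any memory for the context window is counted.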

What the Numbers Actually Show

The benchmark is one developer's private harness, not an official leaderboard, so treat it as a signal rather than a verdict. Personal harnesses built for specific use cases often surface differences that standardized benchmarks miss - they test what actually matters for the person running them.

The cost gap is what makes this notable. Gemma 4 31B is open-source and runs locally with no per-token charge, leaving hardware and power as the only costs. Claude Sonnet 4.6 costs $3 per million input tokens through Anthropic's API. For teams running high-volume workflows - document processing, coding assistance, automated content pipelines - that gap compounds quickly.
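To see how quickly the gap compounds, here is a minimal sketch that applies the $3 per million input tokens quoted above to a few daily volumes. The volumes are hypothetical placeholders, not measurements, and the estimate covers input tokens only because that is the only price given here; output-token charges would come on top.

```python
# Price quoted in the article for Claude Sonnet 4.6 input tokens.
INPUT_PRICE_PER_MILLION_USD = 3.00


def monthly_input_cost_usd(
    input_tokens_per_day: int,
    days_per_month: int = 30,
    price_per_million_usd: float = INPUT_PRICE_PER_MILLION_USD,
) -> float:
    """Estimate monthly API spend on input tokens only."""
    tokens_per_month = input_tokens_per_day * days_per_month
    return tokens_per_month / 1_000_000 * price_per_million_usd


# Hypothetical daily volumes, chosen only for illustration.
HYPOTHETICAL_DAILY_VOLUMES = [1_000_000, 50_000_000, 500_000_000]

for volume in HYPOTHETICAL_DAILY_VOLUMES:
    cost = monthly_input_cost_usd(volume)
    print(f"{volume:>13,} input tokens/day -> ${cost:,.2f}/month")
```

Swap in your own token counts to get a figure you can weigh against the hardware and power cost of running locally.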

The Practical Question

The honest framing isn't whether Gemma 4 31B beats Sonnet 4.6 overall - across every task, it almost certainly doesn't. The question is whether it's good enough for your specific workload.

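Google's Gemma series has consistently outperformed expectations relative to its parameter count, and Gemma 4 continues that pattern. At one byte per weight, though, the FP8-quantized 31B version needs about 31GB for its weights alone, more than the 24GB of VRAM on a single RTX 4090-class card. Running it on consumer hardware therefore means splitting it across two such cards, offloading part of the model to system memory, or dropping to a lower-bit quantization.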

For teams with data privacy requirements, high token volumes, or no appetite for API dependencies, results like these are worth testing against your own evaluation set. "Competitive on my harness" is not the same as "better" - but competitive and free is a combination worth running your own numbers on.
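Running your own numbers can start very small. The sketch below shows one possible shape for a personal harness: a list of prompts, each with a pass/fail check, scored against any function that maps a prompt to a response. Everything here is an assumption for illustration: `EvalCase`, `pass_rate`, and the two stub models are hypothetical names, and the stubs return canned strings rather than calling Gemma or Claude.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """One prompt plus a check that decides whether a response passes."""
    prompt: str
    check: Callable[[str], bool]


def pass_rate(model: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Fraction of cases whose response satisfies its check."""
    if not cases:
        raise ValueError("evaluation set is empty")
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)


# Hypothetical stand-ins: replace each with a real call to a local
# model server or a hosted API. Their outputs are canned, not measured.
def model_a_stub(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "I am not sure."


def model_b_stub(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "Paris"


CASES = [
    EvalCase("What is 2 + 2?", lambda out: "4" in out),
    EvalCase("Name the capital of France.", lambda out: "paris" in out.lower()),
]

for name, model in [("model A", model_a_stub), ("model B", model_b_stub)]:
    print(f"{name}: {pass_rate(model, CASES):.0%} of cases passed")
```

The design choice worth keeping is that checks live with the cases, not with the models, so the same evaluation set can be pointed at a local model and an API without changes.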
