Open Source

Gemma 4 GGUF Files Had a Broken Chat Template - Updated Builds Are Out

May 4, 2026 1 min read

If you downloaded Gemma 4 GGUF files in the past few weeks, you were running a broken chat template and may not have noticed. The bug has been fixed - but it requires downloading fresh files.

A quick explainer: GGUF is the file format used to run large language models locally on consumer hardware with tools like llama.cpp and Ollama. The "chat template" is the formatting wrapper that tells the model it's in a back-and-forth conversation rather than just autocompleting text. Get it wrong and the model may ignore your system prompt, respond out of turn, or produce garbled output - problems that look like the model is dumb when the file is actually malformed.

The fix covers all three Gemma 4 quantization variants from bartowski's HuggingFace repositories:

31B instruction-tuned - the full 31-billion parameter version
26B-A4B - a mixture-of-experts model where only 4 billion parameters activate per request (faster without sacrificing much quality)
E4B - the smallest, efficiency-focused variant

If you pulled Gemma 4 through Ollama, check when your local version was downloaded relative to the fix date and re-pull if you're on a pre-fix build. Symptoms of the broken template include the model not following conversation structure or ignoring multi-turn context.

More from today

llama.cpp Adds Multi-Token Prediction in Beta, Targeting Faster Local AI

Developer Builds Rewind-and-Replay Fix for Claude Code's Stale Context Problem

Cerebras Eyes $26.6B IPO Valuation Built on Its OpenAI Partnership

Cookie Preferences