If you downloaded Gemma 4 GGUF files in the past few weeks, you were running a broken chat template and may not have noticed. The bug has been fixed - but it requires downloading fresh files.
A quick explainer: GGUF is the file format used to run large language models locally on consumer hardware with tools like llama.cpp and Ollama. The "chat template" is the formatting wrapper that tells the model it's in a back-and-forth conversation rather than just autocompleting text. Get it wrong and the model may ignore your system prompt, respond out of turn, or produce garbled output - problems that look like the model is dumb when the file is actually malformed.
The fix covers all three Gemma 4 quantization variants from bartowski's HuggingFace repositories:
- 31B instruction-tuned - the full 31-billion parameter version
- 26B-A4B - a mixture-of-experts model where only 4 billion parameters activate per request (faster without sacrificing much quality)
- E4B - the smallest, efficiency-focused variant
If you pulled Gemma 4 through Ollama, check when your local version was downloaded relative to the fix date and re-pull if you're on a pre-fix build. Symptoms of the broken template include the model not following conversation structure or ignoring multi-turn context.