llama.cpp Merges Gemma 4 Tokenizer Fix to Main Branch

If you have been trying to run Google's Gemma 4 models locally through llama.cpp and getting garbled output, the fix just landed.

A tokenizer patch for Gemma 4 support has been merged into the main branch of llama.cpp, the popular open-source tool for running large language models on consumer hardware without needing cloud APIs. The tokenizer is the component that converts text into the numerical tokens a model actually processes. When it is broken for a specific model, outputs range from subtly wrong to completely unusable.

Gemma 4 is Google's latest open-weights model family, and llama.cpp is among the most widely used runtimes for such models on personal machines with CPUs or consumer GPUs. A mismatch between how Gemma 4 encodes its vocabulary and how llama.cpp parsed it meant the models could not run correctly. This merge resolves that.
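To see why a vocabulary mismatch produces garbled text, consider a toy sketch. It is not Gemma 4's tokenizer and does not reproduce the actual bug: the two five-entry vocabularies and the `encode` and `decode` helpers are hypothetical, chosen only to show what happens when the model and the runtime disagree about which string a token ID stands for.

```python
# Toy illustration only: hypothetical vocabularies, not Gemma 4's real one.
MODEL_VOCAB = ["<unk>", "hello", " world", "!", " there"]

# A hypothetical runtime that parsed the vocabulary incorrectly,
# swapping the entries at IDs 2 and 3.
RUNTIME_VOCAB = ["<unk>", "hello", "!", " world", " there"]


def encode(text, vocab):
    """Greedy longest-match tokenization of text into token IDs."""
    ids = []
    pos = 0
    while pos < len(text):
        # Pick the longest vocabulary entry that matches at this position.
        match = max(
            (tok for tok in vocab[1:] if text.startswith(tok, pos)),
            key=len,
            default=None,
        )
        if match is None:
            ids.append(0)  # fall back to <unk> for an unmatched character
            pos += 1
        else:
            ids.append(vocab.index(match))
            pos += len(match)
    return ids


def decode(ids, vocab):
    """Map token IDs back to text using the given vocabulary."""
    return "".join(vocab[i] for i in ids)


text = "hello world!"
ids = encode(text, MODEL_VOCAB)
good = decode(ids, MODEL_VOCAB)    # both sides agree on the vocabulary
bad = decode(ids, RUNTIME_VOCAB)   # runtime misreads the same IDs

print(ids, repr(good), repr(bad))
```

The same three IDs decode to "hello world!" under the matching vocabulary and to "hello! world" under the mis-parsed one. With a real vocabulary of many thousands of entries, the same kind of disagreement can produce anything from slightly off text to nonsense.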

For anyone running local models, this is a pull-and-rebuild fix. Update to the latest main branch, recompile, and Gemma 4 models should tokenize correctly. If you are using a frontend like LM Studio or Ollama that bundles llama.cpp under the hood, expect updated builds to roll out within a few days as those projects pull in the new code.
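For readers who build from source, the update might look roughly like the sketch below. It assumes an existing git checkout in a directory named `llama.cpp` (a hypothetical path), a CMake-based build, and no backend-specific options; the helper names `rebuild_commands` and `update_llama_cpp` are made up for illustration. By default it only prints the commands rather than running them.

```python
import subprocess
from pathlib import Path


def rebuild_commands(build_dir="build"):
    """Return the update-and-rebuild commands, in order."""
    return [
        ["git", "pull"],                                         # fetch the merged fix
        ["cmake", "-B", build_dir],                              # configure; add backend flags as needed
        ["cmake", "--build", build_dir, "--config", "Release"],  # compile
    ]


def update_llama_cpp(repo_dir, dry_run=True):
    """Print each command, and run it inside the checkout unless dry_run is set."""
    for cmd in rebuild_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=Path(repo_dir), check=True)


# Assumed checkout location; dry_run=True only prints the commands.
update_llama_cpp("llama.cpp", dry_run=True)
```

Passing `dry_run=False` would execute the steps in the checkout. Any GPU or backend configuration flags your previous build used would need to be added to the configure step.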