Open Source

Gemma 4 Chat Template Updated With Preserve Thinking Support

June 8, 2026 1 min read

Google's Gemma 4 chat template now supports "preserve thinking" - a change that keeps the model's internal reasoning visible across multiple conversation turns instead of discarding it between messages.

Gemma 4 includes a thinking mode where the model works through extended chain-of-thought reasoning before producing its answer (essentially, an internal scratchpad it uses to reason through problems before giving its final reply). Previously, most chat template implementations stripped those thinking tokens when formatting the next turn, so the model couldn't reference its earlier reasoning when responding to follow-up messages. The updated template fixes this by carrying the full reasoning trace forward in context.

The practical difference shows up on tasks that span multiple exchanges - debugging a problem over several messages, refining an analysis step by step, or iterating on a plan. These all benefit when the model can build on its earlier reasoning rather than treating each reply as a fresh start.

The update applies to the chat template format used by local inference tools like llama.cpp and Ollama. If you're running Gemma 4 on your own hardware, check that your inference framework has pulled in the updated template configuration.

More from today

Active Malware Campaign Targets Claude Code Users via Compromised npm Packages

The More AI Tools You Add, the More Work Falls on You

When AI Handles the Code, You're Left With Harder Problems

Cookie Preferences