Open Source

Qwen 3.5 Overthinking Problems May Be a Settings Issue, Not a Model Flaw

March 22, 2026 2 min read

Image: Alibaba Cloud

A growing number of Qwen 3.5 users have been complaining that the 35B and 27B parameter models get stuck in extended reasoning loops, burning through tokens (the units of text a model processes) without producing useful output. But the problem might not be the model - it might be how people are running it.

Practitioners who've avoided the overthinking issue report that careful configuration makes all the difference. The key settings: keeping temperature low (the randomness dial - lower means more focused, predictable output), using clear system prompts that explicitly tell the model to be concise, and avoiding open-ended instructions that give the reasoning engine too much room to spiral.

This pattern shows up regularly with reasoning models (models specifically trained to "think" before answering, like OpenAI's o1 series or DeepSeek R1). They're designed to spend more compute on harder problems, but without guardrails, that same capability turns into a liability. The model keeps reasoning because nothing told it to stop.

Qwen 3.5, released by Alibaba's cloud division, has been one of the more impressive open-weight model families for running locally. The 35B version fits on consumer GPUs with quantization (a compression technique that reduces the model's memory footprint at a small quality cost), making it popular with developers and hobbyists who want strong reasoning without paying per-token API fees.

The practical fix is straightforward: set a reasoning token budget if your inference software supports it, keep temperature at or below 0.7, and front-load your prompts with specific output constraints. Some users also report that switching from the default chat template to a custom one that includes an explicit "be brief" instruction in the system message eliminates the looping entirely.

For anyone running local models, this is a useful reminder that model quality and deployment quality are two different things. A model that benchmarks well can still perform badly if the serving configuration doesn't match its design assumptions.

Related Tools

More from today

Alibaba Reaffirms Open-Source Commitment for Qwen and Wan Model Lines

MiniMax M2.7, a Frontier-Class Reasoning Model, Is Going Open Weights

Sashiko: AI Code Reviewer Catches 53% of Linux Kernel Bugs Humans Missed

Cookie Preferences