Qwen3.6 35B Leads for Local AI Agent Workflows in Hands-On Testing

Most open-weight models tested for local agentic workflows - tasks where a model calls tools, chains actions, and follows multi-step plans autonomously - break down early in a run. Qwen3.6 35B A3B is the current exception.

Hands-on testing of recent open-weight models put the comparison in blunt terms. The tests used IQ4_NL quantized builds from Unsloth (compressed versions of each model that fit on consumer hardware), run through the Hermes Agent framework. Gemma4 produced broken tool calls. GLM 4.7 Flash REAP couldn't get past two or three message exchanges before entering a loop and becoming useless. Qwen3.6 35B A3B occasionally looped too, but that was its worst failure mode - it otherwise stayed on task.

The architecture helps explain the gap. Qwen3.6 35B is a Mixture of Experts model, a design that activates only a portion of its total parameters for each generated token - roughly 3 billion of its 35 billion, which is what the A3B suffix denotes. That keeps inference fast, and in a 4-bit quantized build the full set of weights still fits on a high-end consumer GPU or Apple Silicon with 32GB+ RAM, while the model draws on a much larger total capacity for complex reasoning tasks.
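To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. It rests on two assumptions that are not from the testing itself: IQ4_NL is taken at a nominal 4.5 bits per weight, and the active-parameter count is read off the "A3B" suffix as about 3 billion. The function names are illustrative, and the estimate covers weights only, not the KV cache or runtime overhead.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes (10**9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters used for a single generated token."""
    return active_params / total_params


TOTAL_PARAMS = 35e9     # total parameter count named in the article
ACTIVE_PARAMS = 3e9     # assumption: "A3B" read as ~3 billion active parameters
BITS_PER_WEIGHT = 4.5   # assumption: nominal bits per weight for IQ4_NL

if __name__ == "__main__":
    size = weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Quantized weights: about {size:.1f} GB")
    print(f"Parameters active per token: about {share:.1%}")
```

Under these assumptions the weights come to a little under 20 GB, which is consistent with the 32GB+ guidance, while less than a tenth of the parameters do work on any single token. All of the weights still have to be resident in memory; the saving from the Mixture of Experts design is in compute per token.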

The looping behavior that trips up Qwen3.6 during long agentic runs is a known weakness across most open-weight models: the model loses track of its current objective and starts repeating earlier steps. The problem can be reduced through careful system prompt engineering, though that adds setup overhead.
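Prompt engineering is the fix named here. As a complementary illustration of what "repeating earlier steps" looks like from the harness side, the following is a minimal sketch of a repetition check, not part of Hermes Agent or any other framework. The class name `LoopGuard`, its parameters, and the example tool names are all hypothetical.

```python
import json
from collections import deque


class LoopGuard:
    """Flags a run when the same tool call repeats within a recent window."""

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        # Fingerprints of the most recent `window` tool calls.
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    @staticmethod
    def fingerprint(tool_name: str, arguments: dict) -> str:
        # Sort keys so that argument order does not change the fingerprint.
        return tool_name + ":" + json.dumps(arguments, sort_keys=True)

    def record(self, tool_name: str, arguments: dict) -> bool:
        """Record one call; return True once it has occurred
        `max_repeats` times within the current window."""
        key = self.fingerprint(tool_name, arguments)
        self.recent.append(key)
        return self.recent.count(key) >= self.max_repeats


if __name__ == "__main__":
    guard = LoopGuard(window=6, max_repeats=3)
    calls = [
        ("read_file", {"path": "a.py"}),   # hypothetical tool names
        ("run_tests", {}),
        ("read_file", {"path": "a.py"}),
        ("read_file", {"path": "a.py"}),
    ]
    for name, args in calls:
        if guard.record(name, args):
            print(f"Possible loop: {name} repeated with identical arguments")
```

When the guard fires, an agent loop could stop the run or inject a reminder of the current objective. It only catches exact repeats of a call, so a model that loops with slightly varied arguments would slip past it.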

For anyone running local coding assistants, automated research pipelines, or custom agents on private hardware, Qwen3.6 35B is the benchmark other models need to beat right now. The main open question is how the next wave of 30-40B parameter open-weight releases will compare as Alibaba and competitors push out new model generations.