
Qwen3-Coder-Next Leads Models on SWE-rebench Pass@5 With Only 3B Active Parameters


What Happened

Alibaba's Qwen team has a model near the top of the SWE-rebench leaderboard that seems to have gone largely unnoticed. Qwen3-Coder-Next, an 80B-parameter Mixture-of-Experts (MoE) model with only 3B active parameters, posts the highest Pass@5 score of any standalone model on SWE-rebench, 64.6%, trailing only the Claude Code agent (70.8%).

Here are selected entries from the leaderboard at the time of writing:

Model               Pass@1        Pass@5
Claude Code         52.9%         70.8%
Claude Opus 4.6     51.7%         58.3%
GPT-5.2 variants    51.0-51.7%    58.3-60.4%
Gemini 3 Flash      46.7%         54.2%
Qwen3-Coder-Next    40.0%         64.6%

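The model was released on February 3, 2026, with open weights, so anyone can download it and run it locally. Because it is a sparse MoE, only about 3B of its 80B parameters are activated for each token, which keeps per-token compute close to that of a small dense model. Memory is a different story: all 80B weights still have to be loaded, so running it locally is feasible on high-end consumer hardware, typically with a quantized build, rather than on an average laptop. On SWE-rebench, the reported cost per problem is just $0.49.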
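A rough sizing sketch helps separate the total and active parameter counts. It assumes weights dominate memory (ignoring the KV cache and runtime overhead) and uses the common rule of thumb of about 2 FLOPs per active parameter per generated token. Only the 80B and 3B figures come from the article; the outputs are estimates, not measurements.

```python
# Back-of-the-envelope sizing for a sparse MoE model.
# Assumptions: 80B total / 3B active parameters (from the article); memory counts
# weights only (no KV cache, activations, or runtime overhead); compute uses the
# ~2 FLOPs per active parameter per token rule of thumb.

TOTAL_PARAMS = 80e9   # every expert must be resident in memory
ACTIVE_PARAMS = 3e9   # parameters actually used for each token


def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GB (1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


def flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost per token: ~2 FLOPs (multiply + add) per active parameter."""
    return 2 * active_params


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
    ratio = flops_per_token(TOTAL_PARAMS) / flops_per_token(ACTIVE_PARAMS)
    print(f"Per-token compute vs. a dense 80B model: ~1/{ratio:.1f}")
```

Under these assumptions the weights alone need roughly 160 GB at 16-bit and 40 GB at 4-bit, while per-token compute is about 1/27 of a dense 80B model. Sparsity buys generation speed far more than it shrinks the memory footprint.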

Why It Matters

Pass@5 measures the share of problems a model solves in at least one of five attempts. This metric matters more than Pass@1 for real-world coding agent workflows, where retry loops are standard practice, provided the agent can tell which attempt succeeded, for example by running the project's tests. Tools like Cursor, Aider, and Claude Code already implement automatic retry logic. If your agent gets five shots at a bug fix and lands it 64.6% of the time, that is a usable tool.
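For reference, one common way to compute pass@k, popularized with OpenAI's HumanEval benchmark, samples n attempts per problem, counts the c that pass, and applies an unbiased estimator. SWE-rebench's exact procedure may differ (it may simply count a task as solved if any of five runs passes), so treat this as a sketch of the general metric. The run data below is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n sampled attempts, c of them correct.

    Returns the probability that a random subset of k attempts (drawn without
    replacement from the n) contains at least one correct attempt.
    """
    if k > n:
        raise ValueError("k cannot exceed the number of sampled attempts n")
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset includes a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical run: 5 attempts on each of 4 problems, with 0, 1, 3 and 5 successes.
runs = [(5, 0), (5, 1), (5, 3), (5, 5)]
print(benchmark_pass_at_k(runs, k=1))  # mean fraction of correct attempts
print(benchmark_pass_at_k(runs, k=5))  # fraction of problems solved at least once
```

When n equals k, the estimator reduces to "solved in at least one attempt", which is the intuition behind the leaderboard's Pass@5 column.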

The 3B active parameter count is the real headline. Running a competitive coding model locally means no per-token API bills, no rate limits, and no proprietary code sent to external servers. For teams with strict data policies or developers who burn through API credits on iterative coding tasks, a local model that performs at this level changes the math entirely.

It also pressures the pricing of proprietary models. If an open-weight model can beat GPT-5.2, Claude Opus 4.6, and Gemini 3 Flash on multi-attempt coding tasks, as Qwen3-Coder-Next does on SWE-rebench Pass@5, the case for paying per token for those APIs narrows significantly.

Our Take

The 24.6-point gap between Pass@1 (40.0%) and Pass@5 (64.6%) tells you something important about this model: it benefits enormously from multiple attempts. It is not the most precise model on its first try (it ranks 16th on Pass@1), but its solutions are diverse enough that one of five attempts succeeds on nearly two-thirds of problems. That is a specific strength, not a general one.
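The retry gain can be computed straight from the leaderboard table. As an extra reference point, this sketch also shows what Pass@5 would be if every attempt were an independent coin flip with the model's Pass@1 success rate. That baseline is an idealization introduced here for comparison, not something SWE-rebench reports.

```python
# Pass@1 and Pass@5 (in %) copied from the leaderboard table; the GPT-5.2 rows
# are omitted because only ranges are reported for them.
SCORES = {
    "Claude Code": (52.9, 70.8),
    "Claude Opus 4.6": (51.7, 58.3),
    "Gemini 3 Flash": (46.7, 54.2),
    "Qwen3-Coder-Next": (40.0, 64.6),
}


def retry_gain(pass1: float, pass5: float) -> float:
    """Percentage points gained by allowing five attempts instead of one."""
    return round(pass5 - pass1, 1)


def independent_pass5(pass1: float) -> float:
    """Pass@5 (in %) if each of five attempts succeeded independently with probability pass1/100.

    An idealized baseline: when some tasks are much harder than others,
    the real Pass@5 falls below this figure.
    """
    p = pass1 / 100
    return 100 * (1 - (1 - p) ** 5)


# Print models from largest to smallest retry gain.
for model, (p1, p5) in sorted(SCORES.items(), key=lambda kv: retry_gain(*kv[1]), reverse=True):
    print(f"{model:<18} +{retry_gain(p1, p5):>4.1f} pts  "
          f"(independent-attempt baseline: {independent_pass5(p1):.1f}%)")
```

Qwen3-Coder-Next gains the most from retries (24.6 points), yet its 64.6% stays well below the roughly 92% that independent attempts at 40% would give. That pattern suggests retries mostly rescue borderline tasks, while hard tasks stay unsolved however often the model tries.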

For agentic coding workflows where the system automatically retries, this is a legitimate contender. For interactive coding assistants where you want the right answer on the first suggestion, Claude Code's 52.9% Pass@1 is still the stronger choice.

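The practical question is whether local inference speed makes up for the accuracy gap. If you can run five attempts locally in the time it takes to make one API call and wait out rate limits, Qwen3-Coder-Next comes out ahead on wall-clock time despite lower per-attempt accuracy: its 64.6% Pass@5 beats Claude Code's 52.9% Pass@1. Give the API model the same five attempts, though, and Claude Code's 70.8% Pass@5 wins.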
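A toy model makes this trade-off concrete: fix a wall-clock budget, count how many sequential attempts each setup fits, and use the best success rate the table reports for that many attempts. The latencies are placeholders invented for illustration (the article gives no timing data), and because only Pass@1 and Pass@5 are published, the sketch does not interpolate between them.

```python
# Hypothetical latencies: placeholders to show the trade-off, not measurements.
LOCAL_SECONDS_PER_ATTEMPT = 60   # assumed local Qwen3-Coder-Next attempt
API_SECONDS_PER_ATTEMPT = 300    # assumed API attempt, including rate-limit waits

# Success rates (%) from the SWE-rebench table; only Pass@1 and Pass@5 are known.
QWEN = {1: 40.0, 5: 64.6}
CLAUDE_CODE = {1: 52.9, 5: 70.8}


def success_within_budget(budget_s: float, seconds_per_attempt: float,
                          pass_at: dict[int, float]) -> float:
    """Best reported success rate (%) reachable within a wall-clock budget.

    Attempts run sequentially. Only the largest reported k that fits is used,
    so more than five attempts is conservatively scored as Pass@5.
    """
    attempts = int(budget_s // seconds_per_attempt)
    usable = [k for k in pass_at if k <= attempts]
    return pass_at[max(usable)] if usable else 0.0


for budget in (300, 1500):
    local = success_within_budget(budget, LOCAL_SECONDS_PER_ATTEMPT, QWEN)
    api = success_within_budget(budget, API_SECONDS_PER_ATTEMPT, CLAUDE_CODE)
    print(f"{budget:>5}s budget: local Qwen {local:.1f}% vs. API Claude Code {api:.1f}%")
```

With these placeholder latencies, a 300-second budget gives the local model five attempts (64.6%) against one API attempt (52.9%), while a 1,500-second budget lets the API side reach Pass@5 as well (70.8%) and the advantage flips.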

Watch for this model to show up as a backend option in open-source coding tools over the next few weeks. It is a natural fit for developers who want to keep their code local while still getting competitive agent-level performance.