
Xiaomi's MiMo-V2 Matches Top AI Models on Coding Benchmarks at 3.5% of the Price

March 23, 2026

$0.10 per million input tokens. That is what Xiaomi is charging for MiMo-V2-Flash, an open-source coding model that just hit 73.4% on SWE-Bench, a benchmark that tests whether AI models can actually fix real bugs in real codebases. That score makes it the top-performing open-source model on the benchmark, and it costs roughly 3.5% of what you would pay for Claude Sonnet to do similar work.

The bigger model, MiMo-V2-Pro, is even more interesting. It ranks third globally on agent benchmarks (tests that measure how well a model can use tools, browse the web, and complete multi-step tasks), just behind Anthropic's Claude Opus 4.6. It comes with a 1 million token context window, enough to feed it roughly 2,500 pages of text in a single conversation. Pricing: $1 per million input tokens and $3 per million output tokens. Opus charges $5 and $25, respectively.

The DeepSeek Connection

This did not come out of nowhere. MiMo's lead researcher previously worked at DeepSeek, the Chinese lab whose models shook up the industry with strong performance at low cost. Xiaomi, better known for smartphones and consumer electronics, is applying the same playbook: train efficient models, release them cheaply, and let the market react.

The Pro model quietly spent time on OpenRouter's anonymous arena before anyone knew who made it, which means it was being rated blind against models from OpenAI, Anthropic, and Google. The scores held up.

What This Means for Your AI Budget

The practical math here is hard to ignore. If you are running AI-assisted coding workflows, code review pipelines, or any agent-based automation, the cost difference between MiMo-V2-Pro and Claude Opus is roughly 5x on input and 8x on output. For a team processing millions of tokens per day, that is the difference between a $3,000 monthly bill and a $500 one.
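The arithmetic behind these figures can be checked with a short sketch. It uses the per-million-token prices quoted above and assumes a hypothetical workload of 15 million input tokens and 1 million output tokens per day over a 30-day month; that token mix is an illustrative assumption, not a figure from Xiaomi or Anthropic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD, as quoted in the article."""
    input_per_m: float
    output_per_m: float


MIMO_V2_PRO = Pricing(input_per_m=1.00, output_per_m=3.00)
CLAUDE_OPUS = Pricing(input_per_m=5.00, output_per_m=25.00)


def monthly_cost(p: Pricing, input_m_per_day: float,
                 output_m_per_day: float, days: int = 30) -> float:
    """Monthly bill for a workload measured in millions of tokens per day."""
    daily = input_m_per_day * p.input_per_m + output_m_per_day * p.output_per_m
    return daily * days


if __name__ == "__main__":
    # Hypothetical workload (assumption): 15M input and 1M output tokens per day.
    workload = {"input_m_per_day": 15, "output_m_per_day": 1}
    print(f"Input price ratio:  {CLAUDE_OPUS.input_per_m / MIMO_V2_PRO.input_per_m:.1f}x")    # 5.0x
    print(f"Output price ratio: {CLAUDE_OPUS.output_per_m / MIMO_V2_PRO.output_per_m:.1f}x")  # 8.3x
    print(f"Opus monthly:  ${monthly_cost(CLAUDE_OPUS, **workload):,.2f}")  # $3,000.00
    print(f"MiMo monthly:  ${monthly_cost(MIMO_V2_PRO, **workload):,.2f}")  # $540.00
```

Under this mix the blended gap is about 5.6x, between the input and output ratios. Output-heavy workloads push it toward 8x, and input-heavy ones pull it toward 5x, so the exact savings depend on how your traffic splits between prompts and completions.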

There are real caveats. Benchmark scores do not always translate to production reliability. Opus still leads on the hardest agent tasks. And Xiaomi's model ecosystem is young: its tooling, documentation, and community support do not yet match those of OpenAI or Anthropic.

But the trend line is clear. Six months ago, matching frontier model performance required paying frontier model prices. That assumption is breaking down fast, and Chinese labs are leading the shift. Western AI companies built their business models around the idea that top-tier intelligence commands premium pricing. Every model like MiMo-V2 that narrows the performance gap while widening the price gap puts more pressure on that thesis.

For anyone building AI into their products or workflows, the takeaway is simple: benchmark your actual use cases against these cheaper alternatives before auto-renewing expensive API contracts. The performance gap is shrinking faster than the price gap.
