
Xiaomi's mimo-v2.5 Pro Beats Claude Opus 4.5 in Coding, MIT-Licensed

April 29, 2026


Until this week, if you wanted a model that outperformed Claude Opus 4.5 at coding, you were paying for API access. That changed when Xiaomi's mimo-v2.5 Pro landed at #9 on Arena's coding leaderboard, one spot above Opus 4.5 at #10.

Arena ranks models on head-to-head human preference votes: real users compare two model outputs side by side and pick the better one. Because there is no fixed test set for models to be inadvertently trained against, the ranking is harder to game than a static academic benchmark, though it measures what voters prefer rather than independently verified correctness. The specific leaderboard here is the coding category with no style control, meaning the ranking reflects raw votes without adjusting for how responses are formatted or how long they run.
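To make the voting mechanism concrete, here is a minimal sketch of one common way to turn pairwise votes into a ranking: a sequential Elo-style update. This illustrates the general idea only. Arena's actual scoring method is not described in this article and may differ, and the model names, the `k` factor, and the starting rating are all assumptions chosen for the example.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Compute Elo-style ratings from (winner, loser) vote pairs.

    Illustrative only: k and base are assumed values, not Arena's settings.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        # Probability the winner was expected to win, given current ratings.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # An upset (low expected_win) moves ratings more than an expected result.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ratings = elo_ratings(votes)
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking)
```

Each vote shifts rating points from the loser to the winner, so the total stays constant and a model climbs only by winning comparisons against other models.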

The MIT license is the bigger story. Open-weight models publish the actual model weights, the billions of numerical parameters that determine how the model behaves, so anyone can download and run them locally. The MIT license places no restrictions on commercial use beyond preserving the copyright and license notice, meaning companies can build products on top of mimo-v2.5 Pro without paying per-token fees or signing usage agreements.

Claude Opus 4.5 costs $5 per million input tokens and $25 per million output tokens through Anthropic's API. mimo-v2.5 Pro runs on your own hardware at infrastructure cost. The GPU requirements for running a frontier-tier model locally are substantial, but for teams with the capacity and enough volume, swapping a per-token bill for a fixed hardware bill can be a meaningful saving.
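The trade-off between per-token API pricing and fixed infrastructure cost can be sketched as simple arithmetic. The snippet below is a rough illustration, not a pricing tool: every number in it (token volumes, per-million-token prices, GPU count, hourly GPU rate) is a hypothetical placeholder to be replaced with your own figures, and it ignores engineering time, utilization, and other operating costs.

```python
def api_cost(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Monthly API bill in USD for a given token volume and per-million-token prices."""
    return (input_tokens / 1e6) * usd_per_m_input + (output_tokens / 1e6) * usd_per_m_output


def self_hosted_cost(gpu_count, hours, usd_per_gpu_hour):
    """Monthly infrastructure bill in USD for renting or amortizing GPUs."""
    return gpu_count * hours * usd_per_gpu_hour


# Hypothetical placeholder workload and prices (not real quotes).
monthly_api = api_cost(
    input_tokens=2_000_000_000,
    output_tokens=400_000_000,
    usd_per_m_input=10.0,   # placeholder price
    usd_per_m_output=50.0,  # placeholder price
)
monthly_self_hosted = self_hosted_cost(
    gpu_count=8,            # placeholder cluster size
    hours=720,              # roughly one month of continuous uptime
    usd_per_gpu_hour=2.5,   # placeholder rate
)

print(f"API: ${monthly_api:,.0f}  self-hosted: ${monthly_self_hosted:,.0f}")
print("self-hosting is cheaper" if monthly_self_hosted < monthly_api else "API is cheaper")
```

The API bill scales with usage while the self-hosted bill is mostly fixed, so the comparison flips depending on volume: at low usage the API is usually cheaper, and the fixed cost only pays off once token volume is high enough.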

The coding-specific ranking also matters. General leaderboards reward fluent text generation across all kinds of prompts. Arena's coding category counts only votes on coding tasks, so it reflects how users judge the quality and correctness of the code itself, which is the signal developers actually care about.

Open-weight models have been steadily closing the capability gap with closed frontier models for the past year. mimo-v2.5 Pro ranking above Opus 4.5 in coding is a concrete data point in that progression, not an outlier.
