JetBrains Releases Mellum2: Open-Source 12B Model for AI Agent Pipelines

On June 1, JetBrains released Mellum2, an open-source, 12-billion-parameter coding model built for AI agent pipelines rather than standalone chat.

The model uses a Mixture-of-Experts (MoE) architecture - a design where only a fraction of the total parameters is activated for each token. Mellum2 has 12B parameters in total, but only 2.5B are "active" per token (a token being the word-piece unit the model processes). That selective activation is why JetBrains claims inference more than 2x faster than standard (dense) models of the same total size, at lower compute cost per request. For teams running AI pipelines at volume, the per-request savings add up quickly.
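The routing idea behind MoE fits in a few lines. This is a minimal toy, not Mellum2's implementation: JetBrains' figures give only the totals (12B total, 2.5B active), so the number of experts, the top-k value, and the parameter counts below are hypothetical. A router scores every expert, only the top k experts run, and their outputs are blended with softmax weights.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k experts with the highest router scores.

    Returns (expert_index, weight) pairs; the weights are a softmax over
    the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k selected experts' outputs.

    Unselected experts are never called, which is where the compute
    saving comes from.
    """
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


def param_counts(shared, per_expert, num_experts, k):
    """(total, active-per-token) parameter counts for a toy MoE layout."""
    total = shared + num_experts * per_expert
    active = shared + k * per_expert
    return total, active


if __name__ == "__main__":
    # Toy experts (hypothetical): expert i just scales its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(8)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.0]
    print(top_k_route(logits, k=2))  # experts 4 and 1 are chosen
    print(moe_forward(1.0, experts, logits, k=2))
    print(param_counts(shared=100, per_expert=50, num_experts=8, k=2))  # (500, 200)
    # Ratio from the article's published figures only.
    print(f"Mellum2 active fraction: {2.5 / 12:.0%}")  # 21%
```

All parameters still have to sit in memory, but each token pays the compute cost of only the shared weights plus k experts, which is the source of the speed claim.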

JetBrains is explicit about the positioning: this is not a replacement for frontier models like Claude or GPT-4o. It's designed as a cheaper, faster layer for repetitive tasks inside larger AI systems - the steps where you don't need frontier intelligence but need something fast and reliable.

The four use cases JetBrains targets:

  • Routing and orchestration: classifying which tool or agent handles each step in a pipeline
  • RAG pipelines: context compression and retrieval post-processing (RAG, or retrieval-augmented generation, is a technique where the system retrieves relevant documents and adds them to the prompt before the model answers)
  • Sub-agent tasks: planning passes, validation checks, data transformation between agents
  • Private deployment: self-hosted environments where proprietary code can't be sent to external APIs
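For the RAG use case, one simple form of context compression is extractive: score the retrieved chunks, keep the best ones that fit a token budget, and pass only those on. The sketch below assumes the relevance scores come from an earlier step (in the pattern described above, a call to a small model such as Mellum2); the function names, the scores, and the whitespace-based token count are illustrative assumptions, not part of any JetBrains API.

```python
from typing import Callable, List, Sequence


def word_count(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption;
    a real pipeline would use the model's own tokenizer)."""
    return len(text.split())


def compress_context(
    chunks: Sequence[str],
    scores: Sequence[float],
    token_budget: int,
    count_tokens: Callable[[str], int] = word_count,
) -> List[str]:
    """Extractive compression: keep the highest-scoring retrieved chunks
    that fit within token_budget, then restore their original order so
    the downstream prompt reads in document order.

    `scores` stands in for a relevance judgment from a small model.
    """
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    kept, used = set(), 0
    for i in order:
        cost = count_tokens(chunks[i])
        if used + cost <= token_budget:  # greedy: skip chunks that overflow
            kept.add(i)
            used += cost
    return [chunks[i] for i in sorted(kept)]


if __name__ == "__main__":
    chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota", "kappa"]
    scores = [0.9, 0.2, 0.7, 0.5]  # hypothetical relevance scores
    print(compress_context(chunks, scores, token_budget=5))
    # ['alpha beta gamma', 'kappa']
```

The greedy pass is deliberately simple; the expensive judgment (which chunks matter) is the part a small, fast model would handle, and the larger model only ever sees the trimmed context.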

The private deployment case is where this matters most in practice. Running Mellum2 locally for routing and orchestration (it still needs memory for all 12B parameters, but computes with only 2.5B per token) while reserving API calls to GPT-4o or Claude for complex reasoning reduces both cost and data exposure. More engineering teams are adopting this two-tier pattern as agentic workflows scale beyond prototypes.
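At its core, the two-tier pattern is a dispatcher sitting in front of two model backends. In this sketch, `local_model` and `frontier_model` are hypothetical callables standing in for a self-hosted Mellum2 endpoint and a hosted API; the step kinds mirror the use cases listed above, and the rule that steps flagged as private always stay local is an assumed policy, not something JetBrains prescribes.

```python
from dataclasses import dataclass
from typing import Callable

# Step kinds the article lists as good fits for a small model (assumed labels).
LOCAL_STEP_KINDS = {"route", "compress", "plan", "validate", "transform"}


@dataclass
class Step:
    kind: str              # e.g. "route", "validate", "reason"
    payload: str           # prompt or data for this step
    private: bool = False  # True if the payload contains proprietary code


def dispatch(step: Step,
             local_model: Callable[[str], str],
             frontier_model: Callable[[str], str]) -> str:
    """Send routine or private steps to the self-hosted tier, the rest to the frontier API."""
    # Assumed policy: private payloads never leave the self-hosted tier.
    if step.private or step.kind in LOCAL_STEP_KINDS:
        return local_model(step.payload)
    return frontier_model(step.payload)


if __name__ == "__main__":
    # Stub backends for illustration only; no real model is called.
    def local(prompt: str) -> str:
        return f"[local] {prompt}"

    def frontier(prompt: str) -> str:
        return f"[frontier] {prompt}"

    print(dispatch(Step("route", "which agent handles this diff?"), local, frontier))
    print(dispatch(Step("reason", "design a migration plan"), local, frontier))
    print(dispatch(Step("reason", "review proprietary module", private=True), local, frontier))
```

Keeping the policy in one small function makes it easy to audit which steps can leave the network.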

Mellum2 is available on Hugging Face under the Apache 2.0 license, which permits commercial use without royalties. A technical report on arXiv includes benchmark comparisons against similarly sized open models on code generation, reasoning, math, and science tasks.

JetBrains frames the competition narrowly: not "can this outperform GPT-4o on hard coding problems" but "can this outperform other small models on speed while staying accurate enough for orchestration tasks." That's an honest framing, and probably the right one for how teams will actually use it.