StepFun's 3.7 Flash Beats Gemini and DeepSeek on Coding Benchmarks, Runs on 128GB RAM

56.26% on SWE-Bench Pro: that's Step 3.7 Flash's score on a benchmark that tests AI models by having them fix actual bugs from real GitHub repositories. It edges out DeepSeek V4 Flash (55.6%) and Gemini Flash 3.5 (55.1%), and it runs entirely on local hardware.

Step 3.7 Flash comes from StepFun, the Chinese AI lab best known for its video generation work. The model uses a Mixture of Experts (MoE) architecture: it has 196 billion parameters in total but activates only about 11 billion of them for any given token. Think of it like a consulting firm: instead of every specialist working on every problem, only the relevant ones engage. Sparse activation cuts the compute needed per token, which is what keeps inference practical on a single machine. It does not shrink the memory footprint, since all 196 billion parameters still have to be loaded. Fitting them into 128GB of RAM therefore implies weights stored at reduced precision (at roughly 4 bits per parameter, 196 billion parameters come to about 98GB).
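To make the "only the relevant specialists engage" idea concrete, here is a minimal sketch of top-k expert routing, the general technique behind MoE layers. It is a toy illustration under stated assumptions, not StepFun's router: the expert count, the value of `TOP_K`, the random router scores, and the trivial "experts" are all hypothetical. Only the 11-of-196-billion ratio comes from the article.

```python
import math
import random

def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k):
    """Pick the top_k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_layer(token, experts, router_scores, top_k):
    """Weighted sum of the selected experts' outputs; the rest never run."""
    return sum(w * experts[i](token) for i, w in route_token(router_scores, top_k))

# Hypothetical toy setup: 8 experts, 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
calls = []  # records which experts actually executed

def make_expert(idx):
    def expert(x):
        calls.append(idx)
        return (idx + 1) * x  # stand-in for a real feed-forward block
    return expert

experts = [make_expert(i) for i in range(NUM_EXPERTS)]
rng = random.Random(0)
scores = [rng.uniform(-1, 1) for _ in range(NUM_EXPERTS)]

selected = route_token(scores, TOP_K)
output = moe_layer(1.0, experts, scores, TOP_K)
print(f"experts run: {sorted(calls)} of {NUM_EXPERTS}")

# Figures from the article: 11B active out of 196B total parameters.
ACTIVE_FRACTION = 11 / 196
print(f"active share of parameters: {ACTIVE_FRACTION:.1%}")
```

In this sketch only two of the eight toy experts execute for the token, which mirrors how the real model touches roughly 5.6% of its parameters per token while still holding all of them in memory.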

Vision capabilities come from a built-in 1.8-billion-parameter Vision Transformer (a specialized neural network for processing images) that is integrated directly into the model rather than added as a separate system.

How It Compares Against Flash-Tier Models

  • SWE-Bench Pro: 56.26% (vs. DeepSeek V4 Flash 55.6%, Gemini Flash 3.5 55.1%)
  • DeepSearchQA F1: 92.82% (vs. GPT 5.5 at 93.98%)
  • HLE with tools: 47.2% (HLE is a PhD-level question set designed to resist pattern-matching; it measures genuinely hard reasoning, not recall)

The DeepSearchQA result stands out most. A 1.16-point gap behind GPT 5.5 on a search-intensive reasoning benchmark is narrow enough that most document-analysis or research workflows won't notice it.
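For readers who want to check the margins themselves, the short script below recomputes them from the scores quoted in this article. It is only arithmetic on the published figures; the `margins` helper is a hypothetical name introduced for illustration.

```python
# Scores as quoted in the article (percent).
SWE_BENCH_PRO = {
    "Step 3.7 Flash": 56.26,
    "DeepSeek V4 Flash": 55.6,
    "Gemini Flash 3.5": 55.1,
}
DEEPSEARCHQA_F1 = {
    "Step 3.7 Flash": 92.82,
    "GPT 5.5": 93.98,
}

def margins(scores, model):
    """Point difference between `model` and every other entry (positive = ahead)."""
    return {
        other: round(scores[model] - value, 2)
        for other, value in scores.items()
        if other != model
    }

for name, table in (("SWE-Bench Pro", SWE_BENCH_PRO), ("DeepSearchQA F1", DEEPSEARCHQA_F1)):
    for other, diff in margins(table, "Step 3.7 Flash").items():
        print(f"{name}: {diff:+.2f} points vs. {other}")
```

The coding lead is under a point over DeepSeek V4 Flash and just over a point over Gemini Flash 3.5, so both results are better read as "roughly level" than as a decisive win or loss.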

Local Deployment and Data Privacy

Most models at this quality level require sending your requests to a company's API, meaning data leaves your infrastructure on every call. That's a non-starter for legal teams, healthcare applications, or any team with data residency requirements.

A 196B MoE model that runs on 128GB of RAM puts capable multimodal processing (text and images handled by one system) within reach of high-end workstations and Mac Studio configurations, with no cloud dependency. Development teams that have stuck with smaller local models because frontier alternatives couldn't be deployed locally now have an option that, on SWE-Bench Pro at least, scores ahead of the cloud-hosted flash-tier models it was compared against.
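A quick back-of-the-envelope estimate shows what "196B parameters in 128GB" requires. The sketch below assumes weight storage is simply parameters times bits per weight, treats 128GB as decimal gigabytes, and ignores the KV cache, activations, and runtime overhead. The precisions listed are illustrative assumptions, not formats StepFun has confirmed.

```python
TOTAL_PARAMS = 196e9   # total parameter count from the article
RAM_GB = 128           # target machine memory, treated as decimal GB

def weight_gb(params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

def fits(bits_per_weight, ram_gb=RAM_GB):
    """True if the weights alone fit in RAM (no headroom for cache or OS)."""
    return weight_gb(TOTAL_PARAMS, bits_per_weight) <= ram_gb

# Illustrative precisions: 16-bit, 8-bit, and 4-bit weights.
for bits in (16, 8, 4):
    size = weight_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if fits(bits) else "does not fit"
    print(f"{bits:>2}-bit weights: {size:6.1f} GB -> {verdict} in {RAM_GB} GB")
```

Under these assumptions only the 4-bit case (about 98GB) fits, leaving roughly 30GB for everything else, so teams planning a local deployment should budget memory for context length as well as for the weights.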

StepFun hasn't published detailed pricing for local deployment at time of writing.