How One Company Cut LLM Costs by Upgrading to a More Expensive Model

March 20, 2026 2 min read

Spending less money by using a more expensive AI model sounds like a contradiction. Mendral, an AI DevOps platform that automatically diagnoses CI build failures, pulled it off by rethinking how they route work between models.

The company processes roughly 4,000 CI failures per week. Previously, every one of those went through Anthropic's Sonnet 4.0. Now they use a tiered system: a cheap Haiku model acts as a triager, handling the 80% of failures that are duplicates or known issues. Only the remaining 20% get escalated to Opus 4.6, Anthropic's most capable (and most expensive) model.

The math is straightforward. A triager match costs about 25x less than a full investigation. Since 4 out of 5 failures never reach the expensive model, total spending dropped below what they were paying when everything ran on the mid-tier Sonnet.
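
To make that arithmetic concrete, here is a rough sketch using only the two figures quoted above (the 80/20 split and the roughly 25x cost gap). The cost of one full Opus investigation is normalized to 1 unit, and the triager's cost on escalated failures is assumed to be negligible; both are simplifying assumptions, not Mendral's actual numbers.

```python
# Illustrative arithmetic only: costs are normalized units, not real prices.

def blended_cost(full_cost: float, triage_ratio: float, triage_share: float) -> float:
    """Average cost per failure when `triage_share` of failures stop at the triager.

    Assumes a triager match costs full_cost / triage_ratio and that the
    triager's cost on escalated failures is small enough to ignore.
    """
    triage_cost = full_cost / triage_ratio
    return triage_share * triage_cost + (1.0 - triage_share) * full_cost


opus_investigation = 1.0  # assumption: one full Opus investigation = 1 unit
avg = blended_cost(opus_investigation, triage_ratio=25.0, triage_share=0.8)
print(f"average cost per failure: {avg:.3f} units")  # average cost per failure: 0.232 units
```

Under these assumptions, the tiered setup averages about 0.23 units per failure, so it comes out cheaper than the old setup whenever a single Sonnet investigation cost more than roughly 23% of an Opus one.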

The Architecture Worth Stealing

The interesting detail is not the cost savings but the design pattern. Haiku handles 65% of all input tokens but accounts for only 36% of LLM spending. Its input-to-output ratio is 86:1, meaning it reads extensively but returns only focused extracts. The expensive Opus model does the actual thinking on the hard cases.
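
The routing logic itself can be quite small. The sketch below is a hypothetical illustration of the pattern, not Mendral's code: `Failure`, `Diagnosis`, `route`, and both stub functions are invented names, and the stubs stand in for real calls to a cheap and an expensive model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Failure:
    job: str
    log_excerpt: str


@dataclass
class Diagnosis:
    summary: str
    tier: str  # "triage" or "investigation"


# Hypothetical signatures: the triager returns a summary for a known issue,
# or None when it does not recognize the failure.
Triager = Callable[[Failure], Optional[str]]
Investigator = Callable[[Failure], str]


def route(failure: Failure, triage: Triager, investigate: Investigator) -> Diagnosis:
    """Try the cheap tier first; escalate only when it finds no match."""
    match = triage(failure)
    if match is not None:
        return Diagnosis(summary=match, tier="triage")
    return Diagnosis(summary=investigate(failure), tier="investigation")


# Stand-ins for model calls (assumptions for illustration only).
KNOWN_ISSUES = {"ECONNRESET": "Flaky network call to the package registry"}


def stub_triager(failure: Failure) -> Optional[str]:
    for signature, summary in KNOWN_ISSUES.items():
        if signature in failure.log_excerpt:
            return summary
    return None


def stub_investigator(failure: Failure) -> str:
    return f"Needs root-cause analysis: {failure.job}"


known = route(Failure("build", "npm ERR! ECONNRESET"), stub_triager, stub_investigator)
novel = route(Failure("test", "segfault in worker 3"), stub_triager, stub_investigator)
print(known.tier, "|", novel.tier)  # triage | investigation
```

Keeping the triager's return type as "summary or None" makes the escalation decision explicit and easy to log, which helps when measuring what share of traffic each tier actually handles.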

Mendral also stopped stuffing raw logs into prompts. Instead, they gave their agents SQL access to a ClickHouse database, letting the models pull exactly the data they need rather than processing massive log dumps.
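
A rough sketch of what such a query tool might look like follows. It uses Python's built-in `sqlite3` as a stand-in for ClickHouse so the example runs anywhere; the `ci_logs` table, its columns, and the `run_query` helper are assumptions made for illustration, and a production setup would rely on a read-only database user rather than the naive string check shown here.

```python
import sqlite3


def make_log_db() -> sqlite3.Connection:
    """Build a tiny in-memory log table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ci_logs (job_id TEXT, line_no INTEGER, level TEXT, message TEXT)"
    )
    conn.executemany(
        "INSERT INTO ci_logs VALUES (?, ?, ?, ?)",
        [
            ("build-42", 1, "INFO", "Installing dependencies"),
            ("build-42", 2, "INFO", "Running tests"),
            ("build-42", 3, "ERROR", "test_login failed: timeout after 30s"),
            ("build-43", 1, "INFO", "Installing dependencies"),
        ],
    )
    conn.commit()
    return conn


def run_query(conn: sqlite3.Connection, sql: str, max_rows: int = 50) -> list:
    """Tool exposed to the model: run one SELECT and cap the rows returned."""
    # Naive guard for illustration; it is not a substitute for read-only credentials.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchmany(max_rows)


conn = make_log_db()
rows = run_query(
    conn,
    "SELECT line_no, message FROM ci_logs "
    "WHERE job_id = 'build-42' AND level = 'ERROR' ORDER BY line_no",
)
print(rows)  # [(3, 'test_login failed: timeout after 30s')]
```

The model sees one relevant row instead of the whole log, and the row cap bounds how many tokens any single query can add to the prompt.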

This "cheap model reads, expensive model thinks" pattern is worth paying attention to if you're building anything that processes repetitive inputs. Most workflows have a similar distribution, in which the majority of cases are routine. Routing those to a $0.25/million-token model instead of a $15/million-token model adds up fast, especially when the cheap model is good enough to recognize what it has already seen.
