Mozilla just published a detailed look at the AI model selection behind Firefox's "Shake to Summarize" feature on iOS, and the reasoning is worth paying attention to if you care about how companies actually pick AI models for production use.
The feature itself is simple: shake your iPhone while browsing in Firefox, and you get an AI-generated summary of whatever page you're reading. A lightning bolt icon offers the same function without the wrist workout. It shipped in September 2025 and earned an honorable mention on Time Magazine's best inventions list that year.
The interesting part is the model bake-off Mozilla ran before choosing a winner.
$0.10 Per Million Tokens Won the Race
Mozilla tested five models: Mistral Nemo, Mistral-Small, Jamba 1.5 Mini, Gemini 2.0 Flash, and Llama 4 Maverick. They judged summaries on coherence, consistency, relevance, and fluency, using GPT-4o as an automated evaluator.
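To make the evaluation setup concrete, here is a minimal sketch of how an LLM-as-judge rubric over those four criteria could be wired up. It is not Mozilla's actual harness: the 1-to-5 scale, the prompt wording, the JSON reply format, and the function names are all assumptions for illustration, and the call to the judge model itself is left out.

```python
import json
from statistics import mean

# The four criteria named in the article.
CRITERIA = ("coherence", "consistency", "relevance", "fluency")


def build_judge_prompt(source_text: str, summary: str) -> str:
    """Build a rubric prompt for the judge model (wording is hypothetical)."""
    rubric = ", ".join(CRITERIA)
    return (
        f"Rate the summary on {rubric}, each from 1 to 5. "
        "Reply with a JSON object keyed by criterion.\n\n"
        f"SOURCE:\n{source_text}\n\nSUMMARY:\n{summary}"
    )


def parse_scores(judge_reply: str) -> dict[str, float]:
    """Parse the judge's JSON reply and check every criterion is in range."""
    data = json.loads(judge_reply)
    scores = {name: float(data[name]) for name in CRITERIA}
    for name, score in scores.items():
        if not 1 <= score <= 5:  # assumed 1-5 scale
            raise ValueError(f"{name} score out of range: {score}")
    return scores


def overall_score(scores: dict[str, float]) -> float:
    """Collapse the per-criterion scores into one unweighted average."""
    return mean(scores.values())


# Example with a made-up judge reply, not real evaluation data.
reply = '{"coherence": 5, "consistency": 4, "relevance": 4, "fluency": 3}'
example_scores = parse_scores(reply)
example_overall = overall_score(example_scores)
```

An unweighted average is the simplest way to get one number per model. A real evaluation might weight consistency more heavily, since a fluent summary that misstates the page is worse than a clumsy one that gets it right.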
Mistral-Small came out on top, not because it produced the absolute best summaries, but because it hit the sweet spot between quality and cost. At $0.10 per million input tokens, it was dramatically cheaper than some competitors while scoring competitively on summary quality. Gemini 2.0 Flash and Llama 4 Maverick scored well too, but the cost-per-quality ratio favored Mistral-Small.
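One way to express that kind of decision in code is to set a quality floor and then take the cheapest model that clears it. The sketch below is only an illustration of the idea: the candidate names, scores, and prices are placeholders rather than Mozilla's numbers, and the article does not say this is the exact rule Mozilla applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: float          # mean judge score (assumed 1-5 scale)
    usd_per_million: float  # input price in USD per million tokens


def pick_model(candidates: list[Candidate], min_quality: float) -> Candidate:
    """Return the cheapest candidate whose quality clears the floor.

    Ties on price go to the higher-quality candidate.
    """
    eligible = [c for c in candidates if c.quality >= min_quality]
    if not eligible:
        raise ValueError("no candidate meets the quality floor")
    return min(eligible, key=lambda c: (c.usd_per_million, -c.quality))


# Placeholder data: NOT real benchmark scores or vendor prices.
candidates = [
    Candidate("model_a", quality=4.6, usd_per_million=0.50),
    Candidate("model_b", quality=4.4, usd_per_million=0.10),
    Candidate("model_c", quality=3.2, usd_per_million=0.05),
]

best = pick_model(candidates, min_quality=4.0)
```

Here `model_a` scores highest and `model_c` is cheapest, but `model_b` wins because it is the cheapest option that is still good enough.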
The availability of open weights also mattered to Mozilla, which has long positioned itself as the privacy-and-openness alternative in the browser market.
How It Actually Works
The pipeline is straightforward: Firefox grabs the page content, sends it to Mistral-Small for summarization, and returns the result. There's a hard cap at 5,000 tokens (roughly 3,500 words of page text), which keeps response times fast and costs predictable.
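A rough sketch of that flow, with the cap applied before the model call, might look like the following. Several things here are assumptions: the article does not say whether over-long pages are truncated or rejected (this sketch truncates), the word-based estimate stands in for a real tokenizer using the 3,500-words-per-5,000-tokens ratio quoted above, and `call_model` is a hypothetical callable standing in for whatever client sends the prompt to Mistral-Small.

```python
from typing import Callable

MAX_INPUT_TOKENS = 5_000
# 3,500 words per 5,000 tokens, per the article, works out to 700 per 1,000.
WORDS_PER_1000_TOKENS = 700


def truncate_to_budget(text: str, max_tokens: int = MAX_INPUT_TOKENS) -> str:
    """Clip text to an approximate token budget by counting words.

    A production system would count tokens with the model's own tokenizer;
    splitting on whitespace is only a stand-in for illustration.
    """
    max_words = max_tokens * WORDS_PER_1000_TOKENS // 1000
    words = text.split()
    return " ".join(words[:max_words])


def summarize_page(page_text: str, call_model: Callable[[str], str]) -> str:
    """Clip the page, build a prompt, and hand it to the model client."""
    clipped = truncate_to_budget(page_text)
    prompt = "Summarize the following web page:\n\n" + clipped  # hypothetical prompt
    return call_model(prompt)


# Usage with a fake client so the sketch runs without network access.
long_page = " ".join(["word"] * 4000)
clipped_page = truncate_to_budget(long_page)
summary = summarize_page(long_page, call_model=lambda prompt: "stub summary")
```

Passing the client in as a parameter keeps the cap logic testable on its own and makes it easy to swap the model later without touching the rest of the pipeline.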
This is a cloud-based feature, not on-device. Mozilla absorbs the inference costs entirely, meaning users pay nothing. That's a notable choice: running a cloud LLM for free on every shake adds up, even at $0.10 per million tokens. The 5,000-token limit starts to look less like a performance decision and more like a cost-control measure.
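The arithmetic behind that is easy to check. The sketch below uses only the two figures from the article, the $0.10 input price and the 5,000-token cap. The volume of one million summaries is a hypothetical round number, not a reported usage figure, and output-token costs are left out because the article gives no price for them.

```python
from decimal import Decimal

PRICE_PER_MILLION_INPUT_TOKENS = Decimal("0.10")  # USD, from the article
MAX_INPUT_TOKENS = 5_000                          # cap, from the article


def input_cost_usd(tokens: int,
                   price_per_million: Decimal = PRICE_PER_MILLION_INPUT_TOKENS) -> Decimal:
    """Input-side cost in USD for a given number of tokens."""
    return Decimal(tokens) * price_per_million / Decimal(1_000_000)


# Worst case for a single summary: a page that fills the whole cap.
cost_per_summary = input_cost_usd(MAX_INPUT_TOKENS)   # 0.0005 USD

# Hypothetical volume, chosen only to show how the total scales.
HYPOTHETICAL_SUMMARIES = 1_000_000
cost_at_volume = cost_per_summary * HYPOTHETICAL_SUMMARIES  # 500 USD
```

At the cap, each summary costs at most a twentieth of a cent in input tokens, so a million of them would come to about $500 before output tokens. Without the cap, a single very long page could cost many times that, which is why a fixed limit makes the bill predictable.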
The feature is currently limited to Firefox on iOS. No word on Android or desktop availability.
For anyone building products with LLM APIs, Mozilla's selection process is a useful case study. The fastest or highest-quality model didn't win. The one with the best cost-to-quality ratio did. At production scale, that math matters more than benchmark leaderboards.