Related ToolsClaudeClaude Code

The Haiku-Before-Sonnet Pattern Can Cut Claude API Costs by 80%

Claude by Anthropic
Image: Anthropic

Processing thousands of text inputs through Claude Sonnet gets expensive fast. A simple architectural pattern solves most of that: run everything through Haiku first, throw away the junk, and only send the good stuff to Sonnet.

The idea is straightforward. Haiku, Anthropic's smallest and cheapest Claude model, costs roughly 1/10th what Sonnet does per token. Most real-world text pipelines are full of noise - irrelevant comments, spam, off-topic content, duplicates. Haiku is more than capable of making a binary "worth analyzing or not" decision on each input. Only the inputs that pass that filter move on to Sonnet for the expensive, detailed processing.

One developer using this pattern for a tool that classifies thousands of user comments into structured data reports saving around 80% on API costs. The math checks out: if 80-90% of your input is noise (common for scraped comments, support tickets, form submissions, or social media data), you're paying Haiku prices to reject the garbage instead of Sonnet prices.

How to Set It Up

The implementation is a two-stage pipeline:

  1. Stage 1 (Haiku filter): Send each input to claude-haiku-4-5-20251001 with a simple prompt asking whether the text contains substantive, relevant content. You want a yes/no answer, maybe with a confidence score.
  2. Stage 2 (Sonnet processing): Only inputs that pass Stage 1 go to claude-sonnet-4-6 for full analysis, classification, extraction, or whatever your actual task is.

The key tradeoff is false negatives - Haiku occasionally rejecting something Sonnet would have found useful. In practice, for most filtering tasks ("is this comment substantive?" or "does this email contain an action item?"), Haiku's accuracy is high enough that the cost savings far outweigh the occasional miss.

This pattern works best when your input volume is high, your noise ratio is high, and your per-item Sonnet processing is token-heavy. It's less useful if you're sending short, curated inputs where most items need full processing anyway.

Anyone building Claude API pipelines that touch large datasets should consider this before optimizing anything else. It's the lowest-effort, highest-impact cost reduction available.