Model routing: why choosing Haiku, Sonnet, or Opus matters more than your prompt

80% of AI cost reduction comes from sending the right request to the right model — not from prompt engineering. A practical guide to model routing in production.

Founder, Spekir · 7 min read

Most teams that start with AI in production make one decision early: they pick one model and send everything there. It is understandable. It is simple to implement. And it is the direct path to high API bills and unnecessary latency.

The problem is not that Claude Opus is too expensive for everything. The problem is that Claude Haiku is too cheap not to use for simple tasks. And the line between "simple" and "complex" is exactly what most teams have not drawn.

This article is about drawing that line.

What it costs to route incorrectly

Start with numbers. Anthropic's public pricing (as of April 2026) [1] shows roughly this pattern:

Claude Haiku: Cheapest. Fastest. Best for classification, simple extractions, formatting, routing decisions, and anything that does not require deep reasoning.

Claude Sonnet: Middle tier. Balances capability and cost. Good for most reasoning tasks, code explanations, context-aware summaries, and user-facing responses.

Claude Opus: Most expensive. Most capable. Reserved for tasks requiring multi-step reasoning, complex strategy analysis, advanced code generation, and nuanced interpretation of ambiguous input.

The price difference between Haiku and Opus is typically 15-20x per token. That means one Opus call for a task Haiku would handle correctly costs the same as 15-20 Haiku calls. In a system with hundreds of daily AI calls, that is the difference between a manageable API bill and one that surprises the CFO.
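The arithmetic is easy to make concrete. A small sketch, using illustrative per-million-token prices rather than Anthropic's actual rates:

```typescript
// Illustrative prices in USD per million output tokens; real rates differ
// and change over time. Check your provider's current pricing page.
const pricePerMTok = { haiku: 1, sonnet: 5, opus: 18 };

// Cost in USD for a given number of output tokens on a given model.
function callCost(model: keyof typeof pricePerMTok, tokens: number): number {
  return (pricePerMTok[model] * tokens) / 1_000_000;
}

// One 2000-token Opus call vs. the same call on Haiku:
const ratio = callCost("opus", 2000) / callCost("haiku", 2000);
// roughly 18x with these illustrative prices
```

Multiply that ratio by hundreds of calls a day and the CFO-surprising bill writes itself.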

80% of AI cost savings in production come from correct model routing — not from prompt optimisation, caching, or batching. It is the highest-ROI action per engineering hour invested.

Then there is latency. Haiku typically responds in under a second. Opus can take 5-10 seconds on complex tasks. For user-facing features, that difference is noticeable.

Haiku: classification and simple extractions

Haiku is the most underestimated model in most teams' setup. It is built for tasks with a clear definition, structured input, and predictable output.

Concrete use cases Haiku handles well:

Sentiment classification: "Is this customer review positive, negative, or neutral?" A question with three answers. Haiku handles it in under 500 tokens and under a second.

Entity extraction: "Extract the company name, contact person, and invoice amount from this email." Structured output from unstructured text. Haiku is fast and accurate when input patterns are relatively consistent.

Routing decisions: "Should this request go to technical support, sales, or billing?" Classification with a limited set of categories. Haiku as a gateway model reduces Opus usage by 60-80% in systems that need to distinguish between request types.

Formatting and transformation: JSON-to-markdown, markdown-to-HTML, structuring output from another model, normalising date formats.

Simple validation: "Is this a valid tax ID? Is this address formatted correctly?"
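For a task like the sentiment classification above, most of the engineering sits around the model call: constraining the prompt so there are only three valid answers, and validating the reply before anything downstream trusts it. A minimal sketch (the prompt wording and helper names are illustrative, not a specific API):

```typescript
type Sentiment = "positive" | "negative" | "neutral";

// Constrain the prompt so the model has exactly three valid answers.
function buildSentimentPrompt(review: string): string {
  return (
    "Classify the sentiment of this review as exactly one word: " +
    `positive, negative, or neutral.\n\nReview: ${review}`
  );
}

// Validate the model's reply; never pass free-form output downstream.
function parseSentiment(raw: string): Sentiment {
  const label = raw.trim().toLowerCase();
  if (label === "positive" || label === "negative" || label === "neutral") {
    return label;
  }
  throw new Error(`Unexpected label: ${raw}`);
}
```

The strict parser is what makes a cheap model safe to use: a malformed reply fails loudly instead of leaking into your data.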

Haiku is not well-suited for tasks that require holding many conflicting pieces of information in working memory, understanding subtle context, or producing long, nuanced prose. It hallucinates more on ambiguous inputs.

Sonnet: mainstream reasoning

Sonnet is the workhorse. It handles the majority of user-facing AI calls in a typical enterprise system.

Code explanation and review: "What does this function do, and are there potential bugs?" Sonnet handles this effectively up to 2-3 class dependencies deep.

Customer query responses: Support automation with access to company-specific context. Sonnet can maintain thread continuity in a multi-turn conversation and produce responses that sound professional without overthinking simple content.

Context-aware summaries: Summarising documents, meeting notes, reports. Sonnet performs well when output length is defined and independent evaluation of conflicting facts is not required.

Strategic analysis, first pass: First drafts of analyses that are subsequently quality-checked. Sonnet as the draft model, Opus (or humans) as the reviewer.

Content generation: Product descriptions, marketing copy, email drafts. Sonnet is good at following tone-of-voice guidelines and producing professional output quickly.

The critical limitation: Sonnet hits its ceiling on ambiguous, politically complex, or knowledge-intensive tasks. Ask it to weigh conflicting claims across fifteen pages of input and evaluate their implications, and you are in Opus territory.
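The draft-plus-reviewer split mentioned above can be expressed as a small pipeline. In this sketch the model calls are injected as plain functions so the pattern is visible without any SDK; the signatures and the 0.8 threshold are assumptions to tune against your own evaluation:

```typescript
type Generate = (prompt: string) => string;
type Score = (draft: string) => number; // reviewer returns a 0..1 quality score

// Sonnet drafts; Opus (or a human) reviews. Only low-scoring drafts are
// flagged for the expensive second pass.
function draftThenReview(
  prompt: string,
  draftModel: Generate,
  reviewer: Score,
  threshold = 0.8, // assumption: calibrate against your evaluation set
): { draft: string; score: number; needsRework: boolean } {
  const draft = draftModel(prompt);
  const score = reviewer(draft);
  return { draft, score, needsRework: score < threshold };
}
```

Injecting the model callers also means the routing logic itself is unit-testable with stubs, before a single token is billed.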

Opus: complex analysis and nuanced interpretation

Opus is the model you use when it costs something to be wrong, and average output is not good enough.

Strategy parsing: Analysis of a 60-page PowerPoint deck containing a company's strategy and extraction of Playing-to-Win elements, strategic themes, and initiatives — with correct handling of conflicting signals and management-speak.

Complex code generation: Generating architecturally correct code that respects existing patterns and dependencies from a specification with incomplete information.

Legal and compliance analysis: Interpreting regulatory text in a specific context, identifying edge cases, and generating recommendations held against specific criteria.

Complex problem diagnosis: AI-assisted debugging in multi-layer systems with conflicting signals — Opus maintains the reasoning thread better and produces more reliable root-cause analyses.

Nuanced content evaluation: Evaluating whether a piece of content meets many simultaneous criteria (tone, factual accuracy, brand compliance, legal constraints) and producing structured feedback.

Do not use Opus as your default model. Use it as a specialist.
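One way to keep Opus a specialist is confidence-based escalation: accept the Sonnet attempt when it looks solid, and pay for Opus only otherwise. A sketch; the confidence signal and the 0.75 cut-off are assumptions you would calibrate against your own evaluation:

```typescript
type Attempt = { output: string; confidence: number }; // confidence in 0..1

// Accept the cheap attempt when confident; otherwise run the expensive
// model. runOpus is lazy so Opus is only invoked on escalation.
function escalate(
  sonnetAttempt: Attempt,
  runOpus: () => Attempt,
  minConfidence = 0.75, // assumption: set from your eval data
): { output: string; modelUsed: "sonnet" | "opus" } {
  if (sonnetAttempt.confidence >= minConfidence) {
    return { output: sonnetAttempt.output, modelUsed: "sonnet" };
  }
  return { output: runOpus().output, modelUsed: "opus" };
}
```

The lazy `runOpus` callback is the point of the design: the specialist is never even instantiated unless the generalist admits defeat.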

The routeModel() pattern in practice

The technical implementation is simple: a routing function that selects a model based on a task complexity parameter.

type Complexity = "simple" | "standard" | "complex"

// Model identifiers are illustrative; substitute your provider's current IDs.
const models = {
  haiku: "claude-haiku",  // cheapest, fastest
  fast: "claude-sonnet",  // mid tier
  deep: "claude-opus",    // most capable
} as const

function routeModel(complexity: Complexity): string {
  switch (complexity) {
    case "simple":   return models.haiku
    case "standard": return models.fast
    case "complex":  return models.deep
  }
}

The critical step is defining complexity against specific criteria, not intuition. A pragmatic approach:

Simple: Output is classification, extraction, or transformation. Input is structured or semi-structured. Haiku error rate < 2% in evaluation.

Standard: Output requires reasoning or prose generation with context. Input may be unstructured. Task is well-defined but non-trivial.

Complex: Output requires multi-step reasoning, interpretation of conflicting information, or high nuance. Sonnet error rate > 5% in evaluation. The task can tolerate Opus's higher latency.

The most important principle: routing decisions should be based on evaluation, not intuition. Run 50-100 cases through Haiku and Sonnet, measure output quality, and set the threshold where quality degradation becomes noticeable.
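Those criteria translate directly into code. A sketch that picks the cheapest acceptable tier from measured error rates (the 2% and 5% cut-offs are the ones above):

```typescript
type Complexity = "simple" | "standard" | "complex";

// Fraction of eval cases a model got wrong.
function errorRate(results: boolean[]): number {
  const wrong = results.filter((ok) => !ok).length;
  return wrong / results.length;
}

// Cheapest acceptable tier: Haiku if it errs under 2%,
// Opus if Sonnet errs over 5%, otherwise Sonnet.
function tierFromEval(haikuResults: boolean[], sonnetResults: boolean[]): Complexity {
  if (errorRate(haikuResults) < 0.02) return "simple";
  if (errorRate(sonnetResults) > 0.05) return "complex";
  return "standard";
}
```

Feed it the pass/fail results of your 50-100 evaluation cases and the routing decision stops being a matter of opinion.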

What causes routing to fail

Three mistakes are more common than others:

Static routing: All calls to one feature use the same model because "it was simplest to implement." Over time this means either overkilling simple tasks (expensive) or underpowering complex ones (bad output).

Routing based on feature type, not input characteristics: "Strategy analyses always use Opus" is a poor signal. A simple strategic update does not need Opus. A deep parsing of an ambiguous document does. Routing should respond to the concrete input, not the feature category.

No evaluation: Teams implement routing without measuring whether Haiku actually handles the tasks they route to it. Haiku hallucinates on ambiguous inputs and underperforms on context-sensitive tasks. Evaluate.
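Routing on input characteristics rather than feature category might look like the sketch below. The heuristics here (input length and hedge-word density) are purely illustrative assumptions; real signals should come from your own evaluation data:

```typescript
type Complexity = "simple" | "standard" | "complex";

// Illustrative heuristic: long or heavily hedged inputs suggest ambiguity
// and push the request toward a more capable model.
function classifyInput(text: string): Complexity {
  const words = text.trim().split(/\s+/).length;
  const hedges = (text.match(/\b(maybe|unclear|depends|possibly)\b/gi) ?? []).length;
  if (words > 2000 || hedges >= 3) return "complex";
  if (words > 200 || hedges >= 1) return "standard";
  return "simple";
}
```

The same strategy-analysis feature would then send a two-line status update to Haiku and a sprawling, hedge-ridden memo to Opus.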

What to do tomorrow

Model routing is the highest-ROI action in most AI systems in production. Three steps to get started:

Week 1: List all AI calls in the product. Categorise them as simple, standard, or complex against the criteria above. Identify calls currently using Opus that could plausibly be handled by Sonnet or Haiku.

Week 2: Implement the routeModel() function. Run 50-100 cases from Opus calls through Sonnet and measure output quality. Set the threshold based on the measurement.

Week 3: Evaluate Sonnet calls that might be routable to Haiku. Measure. Decide.
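The week-2 and week-3 measurements need only a simple metric to start with: how often does the cheaper model's output match the reference? A sketch using exact match (an assumption; for prose outputs you would want a softer comparison or a rubric):

```typescript
type EvalCase = { input: string; reference: string };

// Share of cases where the candidate model's output matches the reference.
function matchRate(cases: EvalCase[], outputs: string[]): number {
  if (cases.length !== outputs.length) throw new Error("length mismatch");
  const hits = cases.filter((c, i) => c.reference.trim() === outputs[i].trim()).length;
  return hits / cases.length;
}
```

Run the 50-100 cases through the cheaper model, compute the rate against the outputs you already trust, and the downgrade decision becomes a number.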

Start with routing. Optimise prompts afterwards.


References

[1] Anthropic, "Claude Model Overview and Pricing", available at docs.anthropic.com/en/docs/models-overview (accessed 2026-04-23).

