API Advanced medium · 6 min

Haiku for classification and routing tasks

What you will learn

Use Claude Haiku as a lightweight classifier and request router to reduce latency and cost on high-volume categorization tasks.

Why this matters

Production systems often need to route requests to different downstream handlers or classify user input before expensive operations. Haiku processes these decisions 3-5x faster and costs 90% less than Opus, making it ideal for the classification layer in multi-model pipelines.

Skip if: Don't use Haiku for classification if your task requires nuanced reasoning, multi-step logic, or reasoning across 10K+ token contexts. Switch to Sonnet or Opus when classification accuracy on edge cases matters more than speed. Don't use Haiku for generation tasks requiring factual consistency or long-form synthesis.

Explanation

What Haiku does: Claude Haiku is Anthropic's fastest model tier, suited to short, bounded tasks like categorization, routing, and structured extraction. It exposes the same API surface as Sonnet and Opus, answers short prompts quickly (treat ~100ms as a best case, not a guarantee), and costs $0.80 per million input tokens (vs $3 for Sonnet).

How it works: Haiku is the smallest model in the Claude family. Anthropic does not publish its architecture, so treat it as a black box with a different speed/accuracy trade-off. When you specify model="claude-3-5-haiku-20241022", the request goes through the same Messages API as any other model; only the model ID changes. The model trades some reasoning depth for speed, which suits classification, routing, and structured output generation where the decision space is bounded.

When to use: Deploy Haiku in request-filtering layers (spam detection, content moderation), multi-model routers ("route this to search vs summarization"), or classification pipelines processing 1000+ requests per minute. Pair Haiku with Sonnet/Opus using a two-stage pattern: Haiku classifies or routes, then the full model handles complex cases.

Request code

python

import anthropic
import json

client = anthropic.Anthropic()

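# Categories the classifier may return; anything else is treated as unknown.
CATEGORIES = ("billing", "technical", "product_feedback", "account")

# Spell out the JSON keys so the parsing code below knows what to expect.
SYSTEM_PROMPT = (
    "You are a support ticket classifier. Respond with only valid JSON (no markdown) "
    'in the form {"category": "<category>", "confidence": <number from 0.0 to 1.0>}. '
    "Categories: billing, technical, product_feedback, account."
)

def classify_support_ticket(ticket_text: str) -> dict:
    """Route support tickets to category using Haiku."""
    message = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=100,
        system=SYSTEM_PROMPT,
        messages=[
            {
                "role": "user",
                "content": f"Classify this ticket:\n\n{ticket_text}"
            }
        ]
    )

    # The first content block holds the model's text reply.
    response_text = message.content[0].text
    try:
        classification = json.loads(response_text)
    except json.JSONDecodeError:
        classification = None
    # Fall back to "unknown" when the reply is not JSON or names an unlisted category.
    if not isinstance(classification, dict) or classification.get("category") not in CATEGORIES:
        classification = {"category": "unknown", "confidence": 0.0}

    return {
        "category": classification["category"],
        # A missing confidence is reported as 0.0 so callers treat it as uncertain.
        "confidence": classification.get("confidence", 0.0),
        "input_tokens": message.usage.input_tokens,
        "output_tokens": message.usage.output_tokens
    }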

ticket = "I was charged twice this month. My invoice shows two $99 charges on the same day."
result = classify_support_ticket(ticket)
print(json.dumps(result, indent=2))

Authentication

Set your Anthropic API key as an environment variable before running any code: export ANTHROPIC_API_KEY="sk-ant-...". The Anthropic SDK reads this automatically when instantiating the client. No manual header construction needed.

Response shape

Field	Description
`category`	string: one of billing, technical, product_feedback, account (or unknown when the reply cannot be parsed)
`confidence`	float: the model's self-reported confidence in the classification (0.0-1.0); not a calibrated probability
`input_tokens`	integer: tokens consumed by the prompt
`output_tokens`	integer: tokens in the model's response

Field guide

content[0].text

The raw text response from Haiku: parse this to extract structured classification

usage.input_tokens

Track this carefully: input cost scales linearly with prompt length, so a 5K-token prompt costs five times as much per request as a 1K-token one and erodes the savings. Use sparse prompts or batch similar classifications together

usage.output_tokens

Constrain max_tokens to 100-200 for classification: the cap bounds worst-case cost and latency if the model adds explanation you did not ask for

Setup trap

Many developers test Haiku with verbose prompts (1-2K tokens of examples) and conclude it's slow. Haiku's speed advantage shows most clearly with lean prompts (<500 tokens). The API adds no system prompt of its own, so every token of overhead is one you wrote: keep instructions minimal, for example "Classify as X, Y, or Z. Respond with JSON." Benchmark with these short prompts to see Haiku's best-case latency (the 100-150ms range cited here), and measure it in your own environment.

Cost

At scale, Haiku saves dramatically. Assume each input averages 1,000 tokens: 1M classification requests is 1B input tokens, which costs $800 on Haiku at $0.80 per 1M input tokens and $3,000 on Sonnet at $3 per 1M. That $2,200 per million tickets compounds fast: a system routing 100K tickets/day (about 3M per month) saves roughly $6,600 per month on input tokens by using Haiku.
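The cost arithmetic can be checked with a short sketch. It counts input tokens only and assumes the per-million prices quoted in this section and 1,000 tokens per request; output tokens would add to both totals. The function and constant names are illustrative.

```python
# Prices per million input tokens, as quoted in this section (assumed current).
HAIKU_INPUT_PER_MTOK = 0.80
SONNET_INPUT_PER_MTOK = 3.00


def input_cost(requests: int, tokens_per_request: int, price_per_mtok: float) -> float:
    """Input-token cost in dollars for a given number of requests."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1_000_000 * price_per_mtok


# One million tickets at 1,000 input tokens each.
haiku = input_cost(1_000_000, 1_000, HAIKU_INPUT_PER_MTOK)
sonnet = input_cost(1_000_000, 1_000, SONNET_INPUT_PER_MTOK)

# 100K tickets/day over a 30-day month.
monthly_requests = 100_000 * 30
monthly_saving = (
    input_cost(monthly_requests, 1_000, SONNET_INPUT_PER_MTOK)
    - input_cost(monthly_requests, 1_000, HAIKU_INPUT_PER_MTOK)
)

print(f"Haiku: ${haiku:,.0f}  Sonnet: ${sonnet:,.0f}  monthly saving: ${monthly_saving:,.0f}")
```

Changing tokens_per_request shows how quickly prompt length moves the totals.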

Rate limits

Haiku's rate limits depend on your plan and usage tier; check your account's current requests-per-minute and tokens-per-minute limits in the Anthropic Console. Classification requests are short, so high-volume workloads tend to hit the request-count limit before the token limit. If you do, batch 20-100 tickets into a single API call (prompt batching) and raise max_tokens to fit the larger reply.
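One way to batch several tickets into a single call is sketched below. The prompt wording, the "id" key, and both helper names are assumptions made for illustration, not part of the Anthropic API. The helpers only build the prompt and parse the reply, so they can be tested without a network call.

```python
import json


def build_batch_prompt(tickets: list[str]) -> str:
    """Number each ticket so results can be matched back by id."""
    lines = [
        "Classify each ticket. Respond with only a JSON array of objects",
        'like {"id": 1, "category": "billing"}, one per ticket.',
        "",
    ]
    for i, ticket in enumerate(tickets, start=1):
        lines.append(f"[{i}] {ticket}")
    return "\n".join(lines)


def parse_batch_response(response_text: str, n_tickets: int) -> list[str]:
    """Return one category per ticket; 'unknown' where the reply has no valid entry."""
    categories = ["unknown"] * n_tickets
    try:
        items = json.loads(response_text)
    except json.JSONDecodeError:
        return categories
    if not isinstance(items, list):
        return categories
    for item in items:
        if not isinstance(item, dict):
            continue
        idx = item.get("id")
        # Ignore ids that are missing or out of range.
        if isinstance(idx, int) and 1 <= idx <= n_tickets:
            categories[idx - 1] = str(item.get("category", "unknown"))
    return categories


prompt = build_batch_prompt(["I was charged twice.", "The app crashes on login."])
# Hand-written stand-in for a model reply, used only to exercise the parser.
sample_reply = '[{"id": 1, "category": "billing"}, {"id": 2, "category": "technical"}]'
print(parse_batch_response(sample_reply, 2))
```

The prompt string would be sent as the user message. A batch needs a larger max_tokens than a single ticket, and one malformed reply now affects every ticket in the batch, which is why missing entries fall back to "unknown".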

Common gotcha

Haiku sometimes outputs filler text before JSON ("Here's the classification: {json}") when you ask for structured output. The code above catches the resulting JSONDecodeError, but the production fix is a stricter system prompt: "Respond with only a valid JSON object, no other text." Keep max_tokens low (around 150) as well, so a rambling reply is cut short.

Error recovery

json.JSONDecodeError

Haiku returned text that isn't valid JSON, usually extra narrative before the JSON object. Fix: update the system prompt to enforce JSON-only output, or extract the JSON substring with a regex before calling json.loads(). Don't blindly retry the same request; the cause is the prompt, not a transient failure.
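The regex fallback could look like the sketch below. The helper name extract_json_object is hypothetical. The greedy pattern assumes the reply contains a single JSON object; if it contains several, the match spans all of them and the function returns None.

```python
import json
import re
from typing import Optional


def extract_json_object(text: str) -> Optional[dict]:
    """Return the JSON object embedded in text, or None if none can be parsed."""
    # Fast path: the whole reply is already valid JSON.
    try:
        parsed = json.loads(text)
        return parsed if isinstance(parsed, dict) else None
    except json.JSONDecodeError:
        pass
    # Greedy match from the first "{" to the last "}" (DOTALL spans newlines).
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


reply = 'Here\'s the classification: {"category": "billing", "confidence": 0.9}'
print(extract_json_object(reply))
```

Treat a None result the same way as an "unknown" classification rather than raising.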

RateLimitError (429)

You've exceeded your requests-per-minute limit. Implement exponential backoff: wait 2s, then 4s, then 8s before retrying. For production, queue tickets in a local buffer and batch them.
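A minimal backoff sketch following the 2s, 4s, 8s schedule is shown below. with_backoff is a hypothetical helper, not an SDK function. With the Anthropic SDK you would pass retry_on=(anthropic.RateLimitError,); the demo uses a stand-in error and records the delays instead of sleeping, so it runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retry_on: tuple = (Exception,),
    delays: tuple = (2.0, 4.0, 8.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); after each retryable error wait 2s, 4s, then 8s before retrying."""
    for delay in delays:
        try:
            return call()
        except retry_on:
            sleep(delay)
    # Final attempt: any error now propagates to the caller.
    return call()


# Demo with a stand-in that fails twice before succeeding.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated 429")
    return "ok"


waited: list = []
result = with_backoff(flaky, retry_on=(RuntimeError,), sleep=waited.append)
print(result, waited)
```

The SDK client also has its own max_retries setting, so check how the two layers combine before stacking them.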

AuthenticationError (401)

ANTHROPIC_API_KEY is missing or invalid. Verify: `echo $ANTHROPIC_API_KEY` shows a key starting with 'sk-ant-'. If empty, the SDK won't fail at instantiation: it fails at the first API call.
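Because a missing key only surfaces on the first call, a start-up check can fail faster. The sketch below is a hypothetical helper that inspects the environment only. It checks the 'sk-ant-' prefix mentioned above and cannot tell whether the key is actually valid.

```python
import os
from typing import Mapping


def api_key_problem(env: Mapping[str, str] = os.environ) -> str:
    """Describe what is wrong with the configured key, or return "" if it looks fine."""
    key = env.get("ANTHROPIC_API_KEY", "")
    if not key:
        return "ANTHROPIC_API_KEY is not set"
    if not key.startswith("sk-ant-"):
        return "ANTHROPIC_API_KEY does not start with 'sk-ant-'"
    return ""


# Example with explicit mappings instead of the real environment.
print(api_key_problem({}))
print(api_key_problem({"ANTHROPIC_API_KEY": "sk-ant-example"}) == "")
```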

Experienced dev note

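The real win isn't Haiku's speed in isolation: it's the two-tier architecture. Send every request to Haiku first, accept its answer for the ~95% of easy cases, and escalate the ~5% it flags as uncertain to Sonnet. For a batch of 10K requests at 1,000 input tokens each, that is about $8.00 on Haiku (10M tokens at $0.80/1M) plus $1.50 on Sonnet (0.5M tokens at $3/1M), or $9.50 in total vs $30 if you sent everything to Sonnet. Measure uncertainty via the response format: if Haiku returns 'confidence: 0.65', route to Sonnet. The confidence field costs only a few output tokens, but it is self-reported rather than calibrated, so validate the threshold against labeled tickets.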
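The escalation rule can be sketched as a small router. The 0.8 threshold, the route function, and the two stand-in classifiers are all assumptions for illustration. In practice the two callables would wrap a Haiku call and a Sonnet call that return the same dict shape as classify_support_ticket.

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff; tune it on labeled tickets


def route(
    ticket: str,
    fast_classify: Callable[[str], dict],
    strong_classify: Callable[[str], dict],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> dict:
    """Classify with the fast model; escalate when confidence is low or missing."""
    first = fast_classify(ticket)
    confidence = first.get("confidence")
    confident = isinstance(confidence, (int, float)) and confidence >= threshold
    if confident and first.get("category") != "unknown":
        return {**first, "model": "fast"}
    # Low confidence, missing confidence, or unknown category: use the stronger model.
    return {**strong_classify(ticket), "model": "strong"}


# Stand-ins for the two models so the sketch runs without API calls.
def fake_haiku(ticket: str) -> dict:
    conf = 0.95 if "charged" in ticket else 0.65
    return {"category": "billing", "confidence": conf}


def fake_sonnet(ticket: str) -> dict:
    return {"category": "account", "confidence": 0.9}


print(route("I was charged twice", fake_haiku, fake_sonnet)["model"])
print(route("Something odd happened", fake_haiku, fake_sonnet)["model"])
```

Logging which tier handled each ticket makes it easy to track the real escalation rate against the 5% assumed above.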

Check your understanding

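Why set max_tokens=100 for Haiku classification? What changes if you set max_tokens=2000 and ask for JSON output?

Answer hint

max_tokens is an upper bound, not a target: the model stops when it finishes its answer. A low cap bounds worst-case cost and latency if the model adds explanation around the JSON, while a 2,000-token cap lets an unwanted preamble or commentary run long and shrinks Haiku's cost advantage. If the cap is hit, the response is truncated (stop_reason is "max_tokens") and the JSON may be incomplete, so size the cap to fit the expected output.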

VERSION Use model ID 'claude-3-5-haiku-20241022' (the snapshot dated 2024-10-22, listed here as stable as of April 2026). Avoid 'claude-3-haiku-20240307': it has lower accuracy. The 3.5 variant reportedly improved classification F1 score by 12 points across intent detection benchmarks.
