Code Advanced medium · 6 min

Anthropic Claude integration

What you will learn

Replace OpenAI with Anthropic Claude as your LLM backbone in LlamaIndex using the modern Settings API.

Why this matters

Claude offers different reasoning strengths, cost profiles, and token limits than GPT-4, so you need to be able to swap LLMs without rewriting your entire indexing pipeline. Understanding how to configure LLMs at the Settings layer prevents vendor lock-in and lets you A/B test models in production.

Skip if: your application requires real-time streaming with sub-100ms latency, or you need function calling that's tightly coupled to OpenAI's JSON schema format. Also skip Claude if your queries are under 500 tokens and GPT-4-turbo's cost per token is already lower.

Explanation

Anthropic Claude in LlamaIndex is a drop-in LLM replacement that plugs into the Settings configuration object. Instead of hardcoding OpenAI, you instantiate Anthropic(model='claude-3-5-sonnet-20241022') and assign it globally: every index, retriever, and query engine inherits that choice without modification.

Mechanically, LlamaIndex's Settings object maintains a singleton LLM reference. When you call index.as_query_engine() without an explicit llm argument, it reads Settings.llm at that point and uses whatever model you configured. The Anthropic wrapper implements LlamaIndex's LLM interface, so all downstream logic (response synthesis, node ranking, structured output) runs through the same code paths as with OpenAI.

Use Claude when you need extended context windows (200K tokens), strong instruction-following for complex prompts, or when Anthropic's pricing undercuts GPT-4 for your token volume. Anthropic's Python SDK, which the LlamaIndex wrapper calls, retries transient failures such as rate-limit and server errors a limited number of times with backoff. Sustained rate limiting still surfaces as an error that your application must handle.

Analogy

Switching LLMs via Settings is like swapping a database backend in an ORM: you change one config line and the entire application uses the new engine without touching query logic.

Code

python

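import os

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    VectorStoreIndex,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.anthropic import Anthropic as AnthropicLLM

# Both clients read their API keys from the environment. Prefer exporting real
# keys in your shell; these placeholders only apply when nothing is set.
os.environ.setdefault("ANTHROPIC_API_KEY", "your-anthropic-api-key")
os.environ.setdefault("OPENAI_API_KEY", "your-openai-api-key")

# Global default LLM: query engines created after this line use Claude.
Settings.llm = AnthropicLLM(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    temperature=0.7,
)

# Embeddings are configured separately and stay on OpenAI.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Load every file in ./sample_docs and embed it into an in-memory vector index.
documents = SimpleDirectoryReader("./sample_docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# No llm argument is passed, so the engine picks up Settings.llm (Claude).
query_engine = index.as_query_engine()
response = query_engine.query("What are the key findings in these documents?")

print(f"LLM Model: {Settings.llm.model}")
print(f"Response: {response}")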

Output

LLM Model: claude-3-5-sonnet-20241022
Response: <response text synthesized by Claude>

What just happened?

The code imported the Anthropic class from the llama_index.llms.anthropic module (aliased as AnthropicLLM), instantiated it with the Claude model ID and configuration parameters, and assigned it to the global Settings singleton. When SimpleDirectoryReader loaded documents and VectorStoreIndex built the index, the embeddings came from OpenAI (text-embedding-3-small) because Settings.embed_model is configured separately from Settings.llm. When as_query_engine() created the query engine, it used Claude (from Settings.llm) for response synthesis instead of OpenAI. The query executed against Claude's API via the Anthropic SDK, which LlamaIndex wraps.

Common gotcha

Forgetting that Settings.llm is a global default: every index and query engine you create in the same session resolves its LLM from it unless you pass one explicitly. If you reassign Settings.llm = AnthropicLLM(...) partway through, every query engine built afterwards uses Claude, even when it is built from an index created before the swap. A query engine that already exists keeps the LLM it resolved when as_query_engine() was called. This isn't a bug, but it's a footgun in notebooks or long-running services where you test multiple models sequentially and accidentally use the wrong one. To pin a model, pass it explicitly: index.as_query_engine(llm=...).
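
The lookup behavior can be sketched with a toy model. This is a simplified illustration, not LlamaIndex's implementation: ToySettings, ToyIndex, and ToyQueryEngine are hypothetical stand-ins, and the sketch assumes (as current llama-index-core releases appear to do) that the LLM is resolved when the query engine is built.

```python
from typing import Optional


class ToySettings:
    """Hypothetical module-level default, playing the role of Settings.llm."""
    llm: str = "model-a"


class ToyQueryEngine:
    """Hypothetical engine that keeps the LLM it was constructed with."""

    def __init__(self, llm: str) -> None:
        self.llm = llm

    def query(self, text: str) -> str:
        return f"[{self.llm}] {text}"


class ToyIndex:
    """Hypothetical index: it stores no LLM reference of its own."""

    def as_query_engine(self, llm: Optional[str] = None) -> ToyQueryEngine:
        # An explicit argument wins; otherwise use the global default
        # as it is at the moment the engine is built.
        return ToyQueryEngine(llm if llm is not None else ToySettings.llm)


index = ToyIndex()                                     # built while the default is model-a
early_engine = index.as_query_engine()                 # captures model-a
ToySettings.llm = "model-b"                            # swap the global default
late_engine = index.as_query_engine()                  # same index, now captures model-b
pinned_engine = index.as_query_engine(llm="model-a")   # explicit pin ignores the global

print(early_engine.query("q"))
print(late_engine.query("q"))
print(pinned_engine.query("q"))
```

In this model the index never decides which LLM is used; only the moment of engine creation and any explicit llm argument do.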

Error recovery

AuthenticationError

Your ANTHROPIC_API_KEY environment variable is missing or invalid. Set it with os.environ['ANTHROPIC_API_KEY'] = 'sk-ant-...' or export ANTHROPIC_API_KEY=your_key before instantiating AnthropicLLM.
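
A missing key is easier to diagnose at startup than on the first query. The helper below is a minimal sketch of that check; require_env is a hypothetical name, not part of LlamaIndex or the Anthropic SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it in your shell before starting the app."
        )
    return value
```

Calling require_env("ANTHROPIC_API_KEY") before instantiating AnthropicLLM turns a late AuthenticationError into an immediate, readable failure. It only checks that the variable is present, not that the key is valid.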

ImportError: cannot import name 'Anthropic' from 'llama_index.llms.anthropic'

The Anthropic integration ships as its own package, and it is either missing or outdated in your environment. Install or upgrade it with pip install --upgrade llama-index-llms-anthropic.

RateLimitError

Anthropic rate limits were exceeded. The Anthropic SDK retries a rate-limited request only a small number of times before raising, so sustained limits still reach your code. Implement exponential backoff in your query loop or check your Anthropic account quotas.
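
Application-level backoff might look like the sketch below. call_with_backoff is a hypothetical helper, not a LlamaIndex or Anthropic API; it is generic over the exception types you pass in, and the delay values are illustrative assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Delays grow as base_delay * 1, 2, 4, ... capped at max_delay,
            # plus up to 10% random jitter so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

With the lesson's code, a call could be wrapped as call_with_backoff(lambda: query_engine.query("..."), retry_on=(anthropic.RateLimitError,)), assuming the SDK's exception propagates unchanged through the query engine. The sleep parameter exists so the delays can be stubbed out in tests.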

ValidationError: model 'invalid-model-name'

The model string you passed to AnthropicLLM() is not a valid Claude model ID. Use 'claude-3-5-sonnet-20241022', 'claude-3-opus-20240229', or check Anthropic's API documentation for current model names.

Experienced dev note

Claude's context window is 200K tokens, but that doesn't mean your retriever automatically gets 200K: LlamaIndex still obeys your chunk size and similarity_top_k settings. A senior developer would size chunks and top_k based on their actual query complexity, not Claude's raw capacity. Also, Claude's vision capabilities (multi-modal) are not exposed through LlamaIndex's standard LLM interface as of 0.12.x: if you need Claude's image understanding, you'll write custom code or wait for a vision-enabled wrapper.
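
A rough budget calculation makes the point concrete. This is back-of-the-envelope arithmetic, not a LlamaIndex API: retrieved_context_tokens is a hypothetical helper, the (chunk_size, top_k) pairs are illustrative, and the estimate ignores the prompt template, node metadata, and chunks shorter than chunk_size.

```python
CONTEXT_WINDOW = 200_000  # Claude's advertised window, in tokens


def retrieved_context_tokens(chunk_size: int, similarity_top_k: int) -> int:
    """Upper bound on retrieved-text tokens sent to the LLM for one query."""
    return chunk_size * similarity_top_k


# Illustrative settings, from small to aggressive.
for chunk_size, top_k in [(1024, 2), (1024, 10), (2048, 20)]:
    used = retrieved_context_tokens(chunk_size, top_k)
    share = used / CONTEXT_WINDOW
    print(f"chunk_size={chunk_size} top_k={top_k}: <= {used} tokens ({share:.1%} of the window)")
```

Even the most aggressive row here uses only about a fifth of the window, so retrieval settings, not the model's capacity, decide how much text Claude actually sees.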

Check your understanding

If you instantiate AnthropicLLM with model='claude-3-5-sonnet-20241022' and assign it to Settings.llm, then create two VectorStoreIndex instances from different document sets, and then swap Settings.llm to a different Claude model before querying both indexes: which Claude model will each index use, and why?

Answer hint

A correct answer recognizes that Settings.llm is a global default and that an index does not store an LLM reference. The LLM is resolved when as_query_engine() is called, so if both query engines are created after the swap, both indexes use the second model, regardless of which model was assigned when each index was built. A query engine created before the swap would keep the first model.

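VERSION The Settings object and the llama_index.core and llama_index.llms.anthropic import paths arrived with the 0.10.0 package split; earlier releases used a single llama_index package configured through ServiceContext. If you're on an older release, this code will fail at import: upgrade with pip install --upgrade llama-index-core llama-index-llms-anthropic llama-index-embeddings-openai.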

Learn how to compare response quality and latency across Claude and GPT-4 using LlamaIndex's evaluation framework to pick the right model for your use case.
