How-to · Intermediate · 3 min read

How to use prompt caching with Claude API

Quick answer
The Anthropic API supports prompt caching natively: mark a long, stable prompt prefix (such as a lengthy system prompt) with a `cache_control` block, and the API reuses the cached prefix on subsequent calls, reducing cost and latency. You can complement this with application-level response caching: store previous prompt-response pairs locally or in a cache layer and reuse them for identical prompts, skipping the API call entirely. This guide demonstrates the application-level approach.
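The API's native prompt caching works by attaching `cache_control` to a long, stable prefix such as the system prompt. A minimal sketch, assuming a prefix long enough to be cacheable (roughly 1024+ tokens for this model); the helper name and padded sample prompt are illustrative, not part of the SDK:

```python
import os


# Illustrative long system prompt; the API only caches prefixes above a
# minimum size (about 1024 tokens for Claude 3.5 Sonnet).
LONG_SYSTEM_PROMPT = "You are a helpful assistant. " + "Detailed style guide... " * 200


def build_cached_request(user_prompt: str) -> dict:
    """Build messages.create() kwargs with the system prompt marked cacheable."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 500,
        # cache_control marks this block as a cacheable prefix; later calls
        # that reuse the same prefix read it from the server-side cache.
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_prompt}],
    }


if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    response = client.messages.create(**build_cached_request("Hello"))
    print(response.content[0].text)
```

Cache hits and misses are reported back in the response's usage fields, so you can confirm the prefix is actually being reused.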

PREREQUISITES

  • Python 3.8+
  • Anthropic API key
  • pip install "anthropic>=0.20"

Setup

Install the anthropic Python SDK and set your API key as an environment variable.

bash
pip install "anthropic>=0.20"

Step by step

This example demonstrates a simple in-memory prompt cache to avoid repeated calls for the same prompt when using the Anthropic SDK.

python
import os
import anthropic

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Simple in-memory cache dictionary
prompt_cache = {}

def get_completion_with_cache(prompt: str) -> str:
    """Return Claude's reply for `prompt`, reusing a cached reply when available."""
    # Note: the key here is the user prompt alone; in production, include the
    # model and system prompt in the key (see Troubleshooting).
    if prompt in prompt_cache:
        print("Using cached response")
        return prompt_cache[prompt]

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        system="You are a helpful assistant.",
        messages=[{"role": "user", "content": prompt}]
    )
    completion = response.content[0].text
    prompt_cache[prompt] = completion
    return completion

# Example usage
prompt_text = "Explain the benefits of prompt caching."
result = get_completion_with_cache(prompt_text)
print(result)

# Calling again to demonstrate caching
result_cached = get_completion_with_cache(prompt_text)
print(result_cached)
output
[Claude's explanation of the benefits of prompt caching]

Using cached response
[The same explanation, returned from the cache without a second API call]

Common variations

  • Use persistent caching with Redis or a database for multi-instance apps.
  • Implement async caching with asyncio if using an async client.
  • Cache partial prompt completions or embeddings depending on use case.
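For the persistent-cache variation, a single-host version can be sketched with the standard library's sqlite3 (a Redis-backed cache would follow the same get/set shape); the table, file, and function names here are illustrative:

```python
import sqlite3


def open_cache(path: str = "prompt_cache.db") -> sqlite3.Connection:
    """Open (or create) a SQLite-backed prompt cache that survives restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, response TEXT)"
    )
    return conn


def cache_get(conn: sqlite3.Connection, key: str):
    """Return the cached response for `key`, or None on a miss."""
    row = conn.execute("SELECT response FROM cache WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def cache_set(conn: sqlite3.Connection, key: str, response: str) -> None:
    """Store (or overwrite) the response for `key`."""
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, response) VALUES (?, ?)", (key, response)
    )
    conn.commit()
```

Dropping these into the earlier example means replacing the `prompt_cache` dictionary lookups with `cache_get`/`cache_set` calls.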

Troubleshooting

  • If cached responses seem outdated, implement cache invalidation strategies based on time or version.
  • Ensure your cache keys uniquely identify prompts including system instructions to avoid incorrect reuse.
  • Watch for memory bloat in in-memory caches; use size limits or eviction policies.
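The three points above can be combined in one small sketch: a composite hashed key covering model, system prompt, and user prompt, plus a time-to-live and a size cap with least-recently-used eviction. The class and parameter names are illustrative, not part of the Anthropic SDK:

```python
import hashlib
import time
from collections import OrderedDict


def cache_key(model: str, system: str, prompt: str) -> str:
    """Hash model + system + user prompt so different contexts never collide."""
    raw = "\x1f".join([model, system, prompt])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class BoundedTTLCache:
    """In-memory cache with a max size (LRU eviction) and per-entry TTL."""

    def __init__(self, max_entries: int = 1000, ttl_seconds: float = 3600.0):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._data = OrderedDict()  # key -> (stored_at, response)

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._data[key]          # expired: treat as a miss
            return None
        self._data.move_to_end(key)      # mark as recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (time.monotonic(), value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used
```

In the earlier example, `prompt_cache` would become a `BoundedTTLCache` keyed by `cache_key(model, system, prompt)` instead of the raw prompt string.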

Key Takeaways

  • Implement prompt caching in your application to reduce redundant calls to the Claude API and improve latency.
  • Use unique keys combining system and user prompts to avoid incorrect cache hits.
  • Consider persistent caches like Redis for scalable multi-instance deployments.
Verified 2026-04 · claude-3-5-sonnet-20241022