How-to · Beginner · 3 min read

How to cache AI API responses

Quick answer
To cache AI API responses, store each input prompt and its corresponding output in a local or distributed cache such as Redis or a file-based store. On subsequent requests, check the cache first and return the saved response, avoiding a repeated API call and reducing both latency and cost.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0
  • pip install redis (optional for Redis caching)

Setup

Install the required Python packages and set your OpenAI API key as an environment variable.

bash
pip install openai redis

Step by step

This example caches AI API responses in a simple in-memory Python dictionary. For production, use Redis or another persistent store.

python
import os
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Simple in-memory cache dictionary
cache = {}

def get_cached_response(prompt: str) -> str:
    if prompt in cache:
        print("Cache hit")
        return cache[prompt]
    print("Cache miss, calling API")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content
    cache[prompt] = text
    return text

# Example usage
if __name__ == "__main__":
    prompt = "Explain caching AI API responses in simple terms."
    print(get_cached_response(prompt))
    # Second call returns cached response
    print(get_cached_response(prompt))
output
Cache miss, calling API
Explain caching AI API responses in simple terms...
Cache hit
Explain caching AI API responses in simple terms...
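Note that the example above keys the cache on the raw prompt alone. Because the same prompt sent to a different model (or with different parameters) produces a different response, a safer key combines prompt and request parameters. A minimal sketch, where `make_cache_key` and its parameters are illustrative names, not part of the OpenAI API:

```python
import hashlib
import json

def make_cache_key(prompt: str, model: str, temperature: float = 0.0) -> str:
    """Build a deterministic cache key from the prompt and request parameters."""
    payload = json.dumps(
        {"prompt": prompt.strip(), "model": model, "temperature": temperature},
        sort_keys=True,  # stable ordering so identical requests hash identically
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

key = make_cache_key("Explain caching.", "gpt-4o")
print(key[:12])  # short, fixed-length key safe for any cache backend
```

Hashing also keeps keys a fixed length, which matters when prompts are long and the backend (like Redis) stores keys verbatim.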

Common variations

Use Redis for distributed caching to share the cache across multiple app instances. Use async calls for better throughput. Change models by updating the model parameter, and include the model name in the cache key so responses from different models don't collide.

python
import os
import redis
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Connect to Redis (make sure Redis server is running locally or remotely)
r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)


def get_cached_response_redis(prompt: str) -> str:
    cached = r.get(prompt)
    if cached:
        print("Redis cache hit")
        return cached
    print("Redis cache miss, calling API")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content
    r.set(prompt, text, ex=3600)  # TTL of 1 hour keeps cached responses from going stale
    return text

# Async example requires async client and event loop (not shown here)

if __name__ == "__main__":
    prompt = "What is AI caching?"
    print(get_cached_response_redis(prompt))
output
Redis cache miss, calling API
AI caching is the process of storing AI API responses...
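The async variant mentioned in the code comment above can be sketched with asyncio. To keep the sketch self-contained, the API call is replaced by a stub `fetch_completion`; with the real library you would await `AsyncOpenAI().chat.completions.create(...)` there instead. A per-prompt lock prevents several concurrent requests for the same prompt from each hitting the API:

```python
import asyncio

cache = {}
locks = {}

async def fetch_completion(prompt: str) -> str:
    # Stand-in for an awaited AsyncOpenAI call (replace with the real client)
    await asyncio.sleep(0.1)
    return f"response to: {prompt}"

async def get_cached_async(prompt: str) -> str:
    lock = locks.setdefault(prompt, asyncio.Lock())
    async with lock:  # concurrent identical prompts wait instead of re-calling the API
        if prompt in cache:
            return cache[prompt]
        text = await fetch_completion(prompt)
        cache[prompt] = text
        return text

async def main():
    # Three concurrent requests for the same prompt trigger only one fetch
    results = await asyncio.gather(
        *[get_cached_async("What is AI caching?") for _ in range(3)]
    )
    print(len(set(results)))  # 1

asyncio.run(main())
```

Without the lock, all three requests would miss the cache simultaneously and each pay for its own API call.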

Troubleshooting

  • If you see stale or outdated responses, implement cache expiration (TTL) in Redis or your cache store.
  • If cache grows too large, use eviction policies or limit cache size.
  • For inconsistent cache hits, verify keys are normalized (e.g., strip whitespace, consistent casing).
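The normalization mentioned in the last bullet can be as simple as collapsing whitespace and casing so equivalent prompts share one cache entry (the function name here is illustrative):

```python
def normalize_prompt(prompt: str) -> str:
    """Collapse whitespace and lowercase so equivalent prompts map to one key."""
    return " ".join(prompt.split()).lower()

print(normalize_prompt("  What is   AI caching? "))  # what is ai caching?
print(normalize_prompt("what is ai caching?"))       # what is ai caching?
```

Apply the same normalization on both write and read, or the two sides will disagree and every lookup will miss.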

Key Takeaways

  • Cache AI API responses by storing prompt-response pairs to reduce latency and cost.
  • Use Redis or similar distributed caches for scalable, persistent caching.
  • Normalize prompts before caching to avoid duplicate entries.
  • Implement cache expiration to keep responses fresh and manage storage.
  • Async and streaming calls require adapted caching logic but follow the same principles.
Verified 2026-04 · gpt-4o