How-to · Beginner · 3 min read

How to cache AI API responses

Quick answer
To cache AI API responses, store each input prompt and its corresponding output in a local or distributed cache such as Redis or a file-based store. On subsequent requests, check the cache first and return the saved response, avoiding a repeated API call and reducing both latency and cost.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0
  • pip install redis (optional for Redis caching)

Setup

Install the required Python packages and set your OpenAI API key as an environment variable.

bash
pip install openai redis

Step by step

This example caches AI API responses in a simple in-memory Python dictionary. For production, use Redis or another persistent store.

python
import os
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Simple in-memory cache dictionary
cache = {}

def get_cached_response(prompt: str) -> str:
    if prompt in cache:
        print("Cache hit")
        return cache[prompt]
    print("Cache miss, calling API")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content
    cache[prompt] = text
    return text

# Example usage
if __name__ == "__main__":
    prompt = "Explain caching AI API responses in simple terms."
    print(get_cached_response(prompt))
    # Second call returns cached response
    print(get_cached_response(prompt))
output
Cache miss, calling API
Explain caching AI API responses in simple terms...
Cache hit
Explain caching AI API responses in simple terms...
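Note that the example above keys the cache on the raw prompt alone. Because the same prompt sent to a different model (or with different parameters) produces a different response, a safer key combines prompt and request parameters. A minimal sketch, where `make_cache_key` and its parameters are illustrative names, not part of the OpenAI API:

```python
import hashlib
import json

def make_cache_key(prompt: str, model: str, temperature: float = 0.0) -> str:
    """Build a deterministic cache key from the prompt and request parameters."""
    payload = json.dumps(
        {"prompt": prompt.strip(), "model": model, "temperature": temperature},
        sort_keys=True,  # stable ordering so identical requests hash identically
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

key = make_cache_key("Explain caching.", "gpt-4o")
print(key[:12])  # short, fixed-length key safe for any cache backend
```

Hashing also keeps keys a fixed length, which matters when prompts are long and the backend (like Redis) stores keys verbatim.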

Common variations

Use Redis for distributed caching to share the cache across multiple app instances. Use async calls for better throughput. Change models by updating the model parameter, and include the model name in the cache key so responses from different models don't collide.

python
import os
import redis
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Connect to Redis (make sure Redis server is running locally or remotely)
r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)


def get_cached_response_redis(prompt: str) -> str:
    cached = r.get(prompt)
    if cached:
        print("Redis cache hit")
        return cached
    print("Redis cache miss, calling API")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    text = response.choices[0].message.content
    r.set(prompt, text, ex=3600)  # TTL of 1 hour keeps cached responses from going stale
    return text

# Async example requires async client and event loop (not shown here)

if __name__ == "__main__":
    prompt = "What is AI caching?"
    print(get_cached_response_redis(prompt))
output
Redis cache miss, calling API
AI caching is the process of storing AI API responses...
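The async variant mentioned in the code comment above can be sketched with asyncio. To keep the sketch self-contained, the API call is replaced by a stub `fetch_completion`; with the real library you would await `AsyncOpenAI().chat.completions.create(...)` there instead. A per-prompt lock prevents several concurrent requests for the same prompt from each hitting the API:

```python
import asyncio

cache = {}
locks = {}

async def fetch_completion(prompt: str) -> str:
    # Stand-in for an awaited AsyncOpenAI call (replace with the real client)
    await asyncio.sleep(0.1)
    return f"response to: {prompt}"

async def get_cached_async(prompt: str) -> str:
    lock = locks.setdefault(prompt, asyncio.Lock())
    async with lock:  # concurrent identical prompts wait instead of re-calling the API
        if prompt in cache:
            return cache[prompt]
        text = await fetch_completion(prompt)
        cache[prompt] = text
        return text

async def main():
    # Three concurrent requests for the same prompt trigger only one fetch
    results = await asyncio.gather(
        *[get_cached_async("What is AI caching?") for _ in range(3)]
    )
    print(len(set(results)))  # 1

asyncio.run(main())
```

Without the lock, all three requests would miss the cache simultaneously and each pay for its own API call.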

Troubleshooting

  • If you see stale or outdated responses, implement cache expiration (TTL) in Redis or your cache store.
  • If cache grows too large, use eviction policies or limit cache size.
  • For inconsistent cache hits, verify keys are normalized (e.g., strip whitespace, consistent casing).
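The normalization mentioned in the last bullet can be as simple as collapsing whitespace and casing so equivalent prompts share one cache entry (the function name here is illustrative):

```python
def normalize_prompt(prompt: str) -> str:
    """Collapse whitespace and lowercase so equivalent prompts map to one key."""
    return " ".join(prompt.split()).lower()

print(normalize_prompt("  What is   AI caching? "))  # what is ai caching?
print(normalize_prompt("what is ai caching?"))       # what is ai caching?
```

Apply the same normalization on both write and read, or the two sides will disagree and every lookup will miss.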

Key Takeaways

  • Cache AI API responses by storing prompt-response pairs to reduce latency and cost.
  • Use Redis or similar distributed caches for scalable, persistent caching.
  • Normalize prompts before caching to avoid duplicate entries.
  • Implement cache expiration to keep responses fresh and manage storage.
  • Async and streaming calls require adapted caching logic but follow the same principles.
Verified 2026-04 · gpt-4o