How-to · Beginner · 3 min read

How to cache extraction results

Quick answer
To cache extraction results, store the model's output locally or in a fast-access store, keyed by the input query or document ID. Use Python caching tools like functools.lru_cache for in-memory caching, or an external store like Redis for persistent caching, to avoid redundant API calls.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the specifier so the shell does not interpret >)
  • pip install redis (optional for Redis caching)

Setup

Install the required packages and set your environment variable for the OpenAI API key. Optionally, install Redis for persistent caching.

  • Install OpenAI SDK: pip install openai
  • Install Redis client (optional): pip install redis
  • Set environment variable: export OPENAI_API_KEY='your_api_key'
bash
pip install openai redis
output
Requirement already satisfied: openai in ...
Requirement already satisfied: redis in ...

Step by step

This example demonstrates caching extraction results in-memory using functools.lru_cache and calling the OpenAI gpt-4o-mini model for text extraction. The cache key is the input text. If the input is cached, the cached result is returned instead of calling the API again.

python
import os
from openai import OpenAI
from functools import lru_cache

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@lru_cache(maxsize=128)
def extract_text_cached(input_text: str) -> str:
    messages = [{"role": "user", "content": f"Extract key information from this text:\n{input_text}"}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

if __name__ == "__main__":
    sample_text = "John is 30 years old and lives in New York."
    result1 = extract_text_cached(sample_text)
    print("First call result:", result1)
    # Second call uses cache
    result2 = extract_text_cached(sample_text)
    print("Second call (cached) result:", result2)
output
First call result: Name: John; Age: 30; Location: New York
Second call (cached) result: Name: John; Age: 30; Location: New York

The exact wording varies between model runs; the point is that the second call returns the identical cached string without making another API request.
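You can confirm the second call was served from the cache using lru_cache's built-in statistics via cache_info(). The sketch below substitutes a local stand-in function for the API call so the hit counter is easy to observe:

```python
from functools import lru_cache

call_count = 0  # counts how often the underlying function actually runs

@lru_cache(maxsize=128)
def fake_extract(text: str) -> str:
    # Stand-in for the API call; only executes on a cache miss.
    global call_count
    call_count += 1
    return text.upper()

fake_extract("hello")
fake_extract("hello")  # served from the cache; call_count stays at 1
info = fake_extract.cache_info()
print(info.hits, info.misses, call_count)  # 1 1 1
```

The same check works on `extract_text_cached` above: after two identical calls, `extract_text_cached.cache_info()` should report one hit and one miss.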

Common variations

For persistent caching across sessions, use Redis. Here is a simple example storing extraction results keyed by input text hash. This avoids repeated API calls even after program restarts.

python
import os
import hashlib
import json
from openai import OpenAI
import redis

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
redis_client = redis.Redis(host='localhost', port=6379, db=0)


def get_cache_key(text: str) -> str:
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def extract_text_redis(input_text: str) -> str:
    key = get_cache_key(input_text)
    cached = redis_client.get(key)
    if cached:
        return cached.decode('utf-8')

    messages = [{"role": "user", "content": f"Extract key information from this text:\n{input_text}"}]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    result = response.choices[0].message.content
    redis_client.set(key, result)
    return result


if __name__ == "__main__":
    sample_text = "Alice bought 3 books and paid $45."
    print("Extraction result:", extract_text_redis(sample_text))
output
Extraction result: Buyer: Alice; Items: 3 books; Total: $45

As before, the exact wording varies between runs; once stored, the same string is returned from Redis on every subsequent call, even after a restart.

Troubleshooting

  • If caching does not seem to work, ensure the cache key uniquely represents the input text and that the cache store (memory or Redis) is accessible.
  • For Redis, verify the Redis server is running locally or update connection parameters accordingly.
  • Be mindful of cache size limits; lru_cache has a max size, and Redis may need eviction policies.
  • If extraction results change over time, consider cache expiration strategies.
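One simple expiration strategy is a time-to-live (TTL): each entry records when it becomes stale and is treated as a miss afterwards. The sketch below shows the idea in pure Python with a deliberately short TTL; with Redis you get the same effect natively by passing an expiry to set, e.g. redis_client.set(key, result, ex=3600).

```python
import time

_cache = {}      # maps key -> (expiry_timestamp, value)
TTL_SECONDS = 0.1  # deliberately short so the expiry is visible here

def set_with_ttl(key, value):
    _cache[key] = (time.monotonic() + TTL_SECONDS, value)

def get_with_ttl(key):
    entry = _cache.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]  # still fresh
    return None          # missing or expired -> treat as a cache miss

set_with_ttl("doc1", "result")
print(get_with_ttl("doc1"))  # result
time.sleep(0.2)
print(get_with_ttl("doc1"))  # None (expired)
```

In production you would pick a TTL matching how quickly your source documents or prompts change, rather than a fixed 0.1 seconds.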

Key Takeaways

  • Use functools.lru_cache for simple in-memory caching of extraction results keyed by input.
  • For persistent caching, use Redis with hashed keys to store and retrieve extraction outputs.
  • Always generate a consistent cache key from the input to avoid cache misses.
  • Implement cache expiration or invalidation if extraction results may change over time.
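One way to keep keys consistent, sketched below, is to normalize whitespace and case before hashing, so trivially different inputs map to the same entry. Only do this when such differences genuinely don't affect the extraction result.

```python
import hashlib

def normalized_cache_key(text: str) -> str:
    # Collapse runs of whitespace and lowercase, so inputs like
    # "Hello  World" and "hello world" share one cache entry.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

print(normalized_cache_key("Hello  World") == normalized_cache_key("hello world"))  # True
```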
Verified 2026-04 · gpt-4o-mini