How to implement episodic memory for AI agents
Quick answer
Implement episodic memory for AI agents by storing past interactions as embeddings in a vector database and retrieving the most relevant memories to augment prompts for context-aware responses, a pattern known as retrieval-augmented generation (RAG). Use an embedding model to convert text into vectors and similarity search to recall relevant episodes dynamically.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- `pip install "openai>=1.0" faiss-cpu numpy`
Setup environment
Install necessary Python packages and set your OpenAI API key as an environment variable.
```bash
pip install openai faiss-cpu numpy
```

Step-by-step implementation
This example demonstrates how to implement episodic memory by embedding user-agent interactions, storing them in a FAISS vector store, and retrieving relevant memories to augment the prompt for an AI agent using gpt-4o-mini.
```python
import os
import numpy as np
import faiss
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Get an embedding from OpenAI's text-embedding-3-large model
def get_embedding(text):
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=text
    )
    return np.array(response.data[0].embedding, dtype=np.float32)

# Initialize FAISS index; text-embedding-3-large returns 3072-dim vectors
embedding_dim = 3072
index = faiss.IndexFlatL2(embedding_dim)

# Episode texts, kept in insertion order so FAISS row i maps to memory_texts[i]
memory_texts = []

# Embed an episode and add it to the index
def add_episode(text):
    embedding = get_embedding(text)
    index.add(np.array([embedding]))
    memory_texts.append(text)

# Retrieve the top-k most relevant episodes
def retrieve_episodes(query, k=3):
    query_vec = get_embedding(query)
    distances, indices = index.search(np.array([query_vec]), k)
    results = []
    for idx in indices[0]:
        # FAISS pads results with -1 when k exceeds the index size
        if 0 <= idx < len(memory_texts):
            results.append(memory_texts[idx])
    return results

# Example usage: add some episodes
add_episode("User asked about AI agent design.")
add_episode("Agent explained reinforcement learning basics.")
add_episode("User requested code for episodic memory.")

# User query
query = "How do I implement memory in AI agents?"

# Retrieve relevant memories
relevant_memories = retrieve_episodes(query)

# Construct prompt with retrieved memories
prompt = (
    "You are an AI assistant. Use the following past interactions "
    "to answer the question.\n"
    + "\n".join(relevant_memories)
    + f"\nQuestion: {query}\nAnswer:"
)

# Generate response with gpt-4o-mini
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)
```

Output
A context-aware explanation of implementing episodic memory that references the retrieved past interactions.
Common variations
- Use `async` calls for the embedding and completion APIs to improve throughput.
- Swap FAISS for another vector store such as Chroma or Weaviate for scalable, persistent memory.
- Use a different embedding model, such as `text-embedding-3-small` for lower cost, or your provider's equivalent.
- Experiment with the retrieval size `k` to balance context length and relevance.
Troubleshooting tips
- If retrieval returns irrelevant memories, use a stronger embedding model or filter results by a similarity threshold.
- If FAISS raises a dimension error, verify that the embedding vector size matches the index dimension (text-embedding-3-large returns 3072-dim vectors by default).
- For slow responses, batch embedding requests or cache embeddings locally.
- Ensure the `OPENAI_API_KEY` environment variable is set correctly to avoid authentication errors.
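The similarity-threshold tip above can be sketched as a post-filter on the distances returned by search. This minimal numpy-only version computes the same squared L2 distances that `IndexFlatL2.search` reports; the toy 4-dim vectors and the `max_sq_distance` value are illustrative assumptions you would tune on real embeddings:

```python
import numpy as np

memory_texts = ["episode A", "episode B", "episode C"]
# Toy 4-dim vectors standing in for real embeddings
memory_vecs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
], dtype=np.float32)

def retrieve_with_threshold(query_vec, k=3, max_sq_distance=0.5):
    # Squared L2 distances, matching what FAISS IndexFlatL2 returns
    dists = np.sum((memory_vecs - query_vec) ** 2, axis=1)
    order = np.argsort(dists)[:k]
    # Keep only memories that are close enough to the query
    return [memory_texts[i] for i in order if dists[i] <= max_sq_distance]

query = np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32)
print(retrieve_with_threshold(query))  # → ['episode A', 'episode C']
```

With a FAISS index, the same filter applies directly to the `distances` array returned by `index.search`, dropping hits whose distance exceeds the threshold.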
Key takeaways
- Use vector embeddings and similarity search to implement episodic memory for AI agents.
- Augment LLM prompts with retrieved past interactions for context-aware responses.
- Choose embedding models and vector stores based on your scale and latency needs.