How-to · Intermediate · 4 min read

How to implement episodic memory for AI agents

Quick answer
Store past interactions as embeddings in a vector database, then retrieve the most relevant memories at query time and use them to augment the agent's prompt (retrieval-augmented generation). An embedding model converts text into vectors; similarity search recalls relevant episodes dynamically.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (API usage is billed per token)
  • pip install openai>=1.0 faiss-cpu numpy

Setup environment

Install necessary Python packages and set your OpenAI API key as an environment variable.

bash
pip install openai faiss-cpu numpy

Step by step implementation

This example demonstrates how to implement episodic memory by embedding user-agent interactions, storing them in a FAISS vector store, and retrieving relevant memories to augment the prompt for an AI agent using gpt-4o.

python
import os
import numpy as np
import faiss
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Function to get embeddings from OpenAI
# Using text-embedding-3-large for embeddings

def get_embedding(text):
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=text
    )
    return np.array(response.data[0].embedding, dtype=np.float32)

# Initialize FAISS index for 3072-dim embeddings
# (text-embedding-3-large returns 3072-dimensional vectors)
embedding_dim = 3072
index = faiss.IndexFlatL2(embedding_dim)

# Store episode texts in insertion order; the vectors live in the FAISS index
memory_texts = []

# Add episode to memory

def add_episode(text):
    embedding = get_embedding(text)
    index.add(np.array([embedding]))
    memory_texts.append(text)

# Retrieve top k relevant episodes

def retrieve_episodes(query, k=3):
    query_vec = get_embedding(query)
    distances, indices = index.search(np.array([query_vec]), k)
    results = []
    for idx in indices[0]:
        # FAISS pads results with -1 when the index holds fewer than k vectors
        if 0 <= idx < len(memory_texts):
            results.append(memory_texts[idx])
    return results

# Example usage

# Add some episodes
add_episode("User asked about AI agent design.")
add_episode("Agent explained reinforcement learning basics.")
add_episode("User requested code for episodic memory.")

# User query
query = "How do I implement memory in AI agents?"

# Retrieve relevant memories
relevant_memories = retrieve_episodes(query)

# Construct prompt with retrieved memories
prompt = (
    "You are an AI assistant. Use the following past interactions to answer the question.\n"
    + "\n".join(relevant_memories)
    + f"\nQuestion: {query}\nAnswer:"
)

# Generate response with gpt-4o-mini
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

print(response.choices[0].message.content)
output
A context-aware explanation of implementing episodic memory that references the stored past interactions.
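IndexFlatL2 returns raw L2 distances, so irrelevant memories can still land in the top k. One option, mentioned in the troubleshooting tips below, is a distance cutoff on the search results — a minimal sketch (the max_distance value is an assumption you would tune empirically):

```python
import numpy as np


def filter_by_distance(distances, indices, texts, max_distance=1.0):
    """Keep only search hits whose L2 distance is below max_distance.

    distances, indices: the (1, k) arrays returned by index.search()
    texts: the parallel list of stored episode texts
    """
    results = []
    for dist, idx in zip(distances[0], indices[0]):
        # FAISS pads with -1 when the index holds fewer than k vectors
        if 0 <= idx < len(texts) and dist <= max_distance:
            results.append(texts[idx])
    return results
```

With the example above, you would call it as filter_by_distance(*index.search(np.array([query_vec]), k), memory_texts).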

Common variations

  • Use async calls for embedding and completion APIs to improve throughput.
  • Swap faiss with other vector stores like Chroma or Weaviate for scalable memory.
  • Use different embedding models, such as text-embedding-3-small for lower cost, or a provider-specific model like Gemini's text-embedding-004.
  • Experiment with retrieval size k to balance context length and relevance.

Troubleshooting tips

  • If retrieval returns irrelevant memories, try a stronger embedding model or filter results with a distance threshold.
  • If FAISS index throws dimension errors, verify embedding vector size matches index dimension.
  • For slow responses, batch embedding requests or cache embeddings locally.
  • Ensure environment variable OPENAI_API_KEY is set correctly to avoid authentication errors.
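The local-caching tip can be implemented with an in-memory dict keyed by the episode text, so identical texts never hit the embeddings API twice — a sketch, where embed_fn stands in for get_embedding from the example above:

```python
# Cache of text -> embedding vector; lives for the process lifetime
_embedding_cache = {}


def cached_embedding(text, embed_fn):
    """Return a cached vector if this exact text was embedded before,
    otherwise compute it with embed_fn and remember the result."""
    if text not in _embedding_cache:
        _embedding_cache[text] = embed_fn(text)
    return _embedding_cache[text]
```

add_episode and retrieve_episodes could then call cached_embedding(text, get_embedding) instead of get_embedding(text); for persistence across runs, pickle the dict to disk.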

Key Takeaways

  • Use vector embeddings and similarity search to implement episodic memory for AI agents.
  • Augment LLM prompts with retrieved past interactions for context-aware responses.
  • Choose embedding models and vector stores based on your scale and latency needs.
Verified 2026-04 · gpt-4o-mini, text-embedding-3-large