How to test memory recall in agents
Quick answer
To test memory recall in agents, save the conversation history in a persistent memory store (such as a Python list or a vector database) and pass it as context in messages on each chat.completions.create call. Verify recall by querying the agent with prompts that reference prior interactions and checking that it responds consistently.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your API key as an environment variable.
- Install the SDK: pip install openai
- Set the environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
Output of pip install openai:

```
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
```
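Before making any API calls, you can sanity-check the setup locally. This sketch only inspects the local environment; it does not contact the API.

```python
import importlib.util
import os

# Verify the SDK is importable and the API key environment variable is set
sdk_installed = importlib.util.find_spec("openai") is not None
key_configured = bool(os.environ.get("OPENAI_API_KEY"))
print("openai installed:", sdk_installed)
print("OPENAI_API_KEY set:", key_configured)
```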
Step by step
This example demonstrates a simple memory recall test by storing conversation history in a list and passing it to the chat.completions.create method. The agent is prompted twice: first to introduce itself, then to recall its introduction.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Initialize conversation memory
memory = [
    {"role": "system", "content": "You are a helpful assistant."}
]

# First user message
memory.append({"role": "user", "content": "Introduce yourself."})
response1 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=memory
)
print("Agent reply 1:", response1.choices[0].message.content)

# Append assistant reply to memory
memory.append({"role": "assistant", "content": response1.choices[0].message.content})

# Second user message asking the agent to recall its previous introduction
memory.append({"role": "user", "content": "What did you say you are?"})
response2 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=memory
)
print("Agent reply 2:", response2.choices[0].message.content)
```

Output:

```
Agent reply 1: I am a helpful AI assistant here to assist you with your questions.
Agent reply 2: I said I am a helpful AI assistant here to assist you with your questions.
```
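Eyeballing the two replies works, but the check can be automated with a simple keyword assertion. This is a sketch: recall_mentions is a hypothetical helper, and real replies vary in wording, so pick keywords that any correct recall should contain.

```python
def recall_mentions(reply, keywords):
    # Return True if the reply contains any expected keyword (case-insensitive)
    text = reply.lower()
    return any(keyword.lower() in text for keyword in keywords)

# Using the sample reply shown above
reply2 = "I said I am a helpful AI assistant here to assist you with your questions."
assert recall_mentions(reply2, ["helpful", "assistant"])
print("recall check passed")
```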
Common variations
You can test memory recall with different approaches:
- Use vector databases (e.g., Pinecone, FAISS) to store and retrieve relevant past messages.
- Test with streaming responses by setting stream=True in chat.completions.create.
- Try different models, such as gpt-4o-mini or claude-3-5-sonnet-20241022, to compare recall quality.
- Implement async calls using asyncio and the OpenAI async client.
For example, the async variation looks like this (in openai>=1.0, the async client is AsyncOpenAI, and its chat.completions.create method is awaited directly):

```python
import asyncio
import os
from openai import AsyncOpenAI

async def test_memory_async():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    memory = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."}
    ]
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=memory
    )
    print("Async agent reply:", response.choices[0].message.content)

asyncio.run(test_memory_async())
```

Output:

```
Async agent reply: Hello! How can I assist you today?
```
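The vector-database variation can be sketched without any external service. The toy store below ranks saved messages by cosine similarity over bag-of-words vectors; everything here is illustrative, and a real deployment would replace embed with an embedding model and the in-memory list with a store like FAISS or Pinecone.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts (a real system would call an embedding model)
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToyVectorMemory:
    """Minimal stand-in for a vector store such as FAISS or Pinecone."""

    def __init__(self):
        self._items = []  # list of (vector, original message) pairs

    def add(self, message):
        self._items.append((embed(message), message))

    def search(self, query, k=2):
        # Rank stored messages by similarity to the query and return the top k
        query_vec = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(query_vec, item[0]), reverse=True)
        return [message for _, message in ranked[:k]]

store = ToyVectorMemory()
store.add("The user's favorite color is blue.")
store.add("The user lives in Berlin.")
store.add("The weather today is sunny.")
print(store.search("What color does the user like?", k=1))
```

To test recall at scale, retrieve the top-k matches for each incoming prompt and splice them into messages before calling the model, then assert that the reply reflects the retrieved facts.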
Troubleshooting
If the agent does not recall previous messages correctly:
- Ensure the full conversation history is passed in the messages parameter on every call.
- Check token limits; truncate or summarize older messages if they exceed the model's context size.
- Verify the model's context window is large enough for your history (e.g., gpt-4o-mini).
- Confirm your API key is valid and the environment variable is set.
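The truncation advice above can be sketched with a helper that keeps the system prompt plus only the most recent turns. Counting messages is a simplification of counting tokens (which a real implementation would do, e.g. with tiktoken); trim_memory is a hypothetical name.

```python
def trim_memory(memory, max_messages=10):
    # Keep system prompts, then only the most recent non-system messages
    system = [m for m in memory if m["role"] == "system"]
    rest = [m for m in memory if m["role"] != "system"]
    return system + rest[-max_messages:]

memory = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(20):
    memory.append({"role": "user", "content": f"message {i}"})

trimmed = trim_memory(memory, max_messages=5)
print(len(trimmed))  # 6 messages: the system prompt plus the last 5
```

Note that aggressive trimming can itself cause recall failures, so a memory test should distinguish "the model forgot" from "the fact was trimmed out of context."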
Key Takeaways
- Pass the persistent conversation history in messages to test agent memory recall.
- Appending both user and assistant messages ensures context continuity for accurate recall.
- Vector databases enable scalable memory recall beyond token limits.
- Test recall with different models and async or streaming calls for robustness.
- Monitor token usage to avoid context truncation affecting memory tests.