How to test memory recall in agents
Quick answer
To test memory recall in agents, save the conversation history in a persistent memory store (such as a Python list or a vector database) and pass it as context in messages on each chat.completions.create call. Verify recall by querying the agent with prompts that reference prior interactions and checking that it responds consistently.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your API key as an environment variable.
- Install the SDK: pip install openai
- Set the environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
Output of pip install openai:

```
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
```
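Before making any API calls, you can sanity-check the setup locally. This sketch only inspects the local environment; it does not contact the API.

```python
import importlib.util
import os

# Verify the SDK is importable and the API key environment variable is set
sdk_installed = importlib.util.find_spec("openai") is not None
key_configured = bool(os.environ.get("OPENAI_API_KEY"))
print("openai installed:", sdk_installed)
print("OPENAI_API_KEY set:", key_configured)
```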
Step by step
This example demonstrates a simple memory recall test by storing conversation history in a list and passing it to the chat.completions.create method. The agent is prompted twice: first to introduce itself, then to recall its introduction.
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Initialize conversation memory
memory = [
    {"role": "system", "content": "You are a helpful assistant."}
]

# First user message
memory.append({"role": "user", "content": "Introduce yourself."})
response1 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=memory
)
print("Agent reply 1:", response1.choices[0].message.content)

# Append assistant reply to memory
memory.append({"role": "assistant", "content": response1.choices[0].message.content})

# Second user message asking the agent to recall its previous introduction
memory.append({"role": "user", "content": "What did you say you are?"})
response2 = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=memory
)
print("Agent reply 2:", response2.choices[0].message.content)
```

Output:

```
Agent reply 1: I am a helpful AI assistant here to assist you with your questions.
Agent reply 2: I said I am a helpful AI assistant here to assist you with your questions.
```
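Eyeballing the two replies works, but the check can be automated with a simple keyword assertion. This is a sketch: recall_mentions is a hypothetical helper, and real replies vary in wording, so pick keywords that any correct recall should contain.

```python
def recall_mentions(reply, keywords):
    # Return True if the reply contains any expected keyword (case-insensitive)
    text = reply.lower()
    return any(keyword.lower() in text for keyword in keywords)

# Using the sample reply shown above
reply2 = "I said I am a helpful AI assistant here to assist you with your questions."
assert recall_mentions(reply2, ["helpful", "assistant"])
print("recall check passed")
```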
Common variations
You can test memory recall with different approaches:
- Use vector databases (e.g., Pinecone, FAISS) to store and retrieve relevant past messages.
- Test with streaming responses by setting stream=True in chat.completions.create.
- Try different models, such as gpt-4o-mini or claude-3-5-sonnet-20241022, to compare recall quality.
- Implement async calls using asyncio and the OpenAI async client.
For example, the async variation looks like this (in openai>=1.0, the async client is AsyncOpenAI, and its chat.completions.create method is awaited directly):

```python
import asyncio
import os
from openai import AsyncOpenAI

async def test_memory_async():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    memory = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."}
    ]
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=memory
    )
    print("Async agent reply:", response.choices[0].message.content)

asyncio.run(test_memory_async())
```

Output:

```
Async agent reply: Hello! How can I assist you today?
```
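The vector-database variation can be sketched without any external service. The toy store below ranks saved messages by cosine similarity over bag-of-words vectors; everything here is illustrative, and a real deployment would replace embed with an embedding model and the in-memory list with a store like FAISS or Pinecone.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": bag-of-words counts (a real system would call an embedding model)
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToyVectorMemory:
    """Minimal stand-in for a vector store such as FAISS or Pinecone."""

    def __init__(self):
        self._items = []  # list of (vector, original message) pairs

    def add(self, message):
        self._items.append((embed(message), message))

    def search(self, query, k=2):
        # Rank stored messages by similarity to the query and return the top k
        query_vec = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(query_vec, item[0]), reverse=True)
        return [message for _, message in ranked[:k]]

store = ToyVectorMemory()
store.add("The user's favorite color is blue.")
store.add("The user lives in Berlin.")
store.add("The weather today is sunny.")
print(store.search("What color does the user like?", k=1))
```

To test recall at scale, retrieve the top-k matches for each incoming prompt and splice them into messages before calling the model, then assert that the reply reflects the retrieved facts.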
Troubleshooting
If the agent does not recall previous messages correctly:
- Ensure the full conversation history is passed in the messages parameter on every call.
- Check token limits; truncate or summarize older messages if they exceed the model's context size.
- Verify the model's context window is large enough for your history (e.g., gpt-4o-mini).
- Confirm your API key is valid and the environment variable is set.
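The truncation advice above can be sketched with a helper that keeps the system prompt plus only the most recent turns. Counting messages is a simplification of counting tokens (which a real implementation would do, e.g. with tiktoken); trim_memory is a hypothetical name.

```python
def trim_memory(memory, max_messages=10):
    # Keep system prompts, then only the most recent non-system messages
    system = [m for m in memory if m["role"] == "system"]
    rest = [m for m in memory if m["role"] != "system"]
    return system + rest[-max_messages:]

memory = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(20):
    memory.append({"role": "user", "content": f"message {i}"})

trimmed = trim_memory(memory, max_messages=5)
print(len(trimmed))  # 6 messages: the system prompt plus the last 5
```

Note that aggressive trimming can itself cause recall failures, so a memory test should distinguish "the model forgot" from "the fact was trimmed out of context."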
Key Takeaways
- Pass the persistent conversation history in messages to test agent memory recall.
- Appending both user and assistant messages ensures context continuity for accurate recall.
- Vector databases enable scalable memory recall beyond token limits.
- Test recall with different models and async or streaming calls for robustness.
- Monitor token usage to avoid context truncation affecting memory tests.