How to summarize chat history for memory
Quick answer
Use a language model like gpt-4o to generate a concise summary of the chat history by sending the conversation messages as input and prompting the model to summarize. This summary can then be stored as memory to provide context in future interactions.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Setup
Install the openai Python package and set your API key as an environment variable.
- Install the package: pip install openai
- Set the environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)

pip install openai output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
Use the OpenAI SDK to send the chat history as messages and request a summary from the gpt-4o model. Store the returned summary as memory for later use.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Example chat history
chat_history = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm good, thanks! How can I help you today?"},
    {"role": "user", "content": "Can you summarize our conversation so far?"}
]

# Prepare prompt to summarize chat history
summary_prompt = [
    {"role": "system", "content": "You are a helpful assistant that summarizes chat history."}
] + chat_history + [
    {"role": "user", "content": "Please provide a concise summary of the above conversation."}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=summary_prompt,
    max_tokens=150
)
summary = response.choices[0].message.content
print("Summary:", summary)

output
Summary: The user greeted the assistant and asked how it was doing. The assistant responded positively and offered help. The user then requested a concise summary of the conversation so far.
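Once you have the summary, storing it and prepending it to the next conversation is what turns it into memory. Below is a minimal sketch of that step; the build_messages helper name is illustrative, not part of the OpenAI SDK.

```python
# A minimal sketch of reusing a stored summary as memory: prepend it as a
# system message when starting the next conversation. build_messages is an
# illustrative helper, not an SDK function.
def build_messages(memory_summary, new_user_message):
    messages = []
    if memory_summary:
        # Inject the stored summary so the model has prior context
        messages.append({
            "role": "system",
            "content": f"Summary of earlier conversation: {memory_summary}"
        })
    messages.append({"role": "user", "content": new_user_message})
    return messages

stored_summary = "The user greeted the assistant and asked for a summary."
messages = build_messages(stored_summary, "What did we talk about earlier?")
print(messages[0]["content"])
```

The resulting messages list can be passed directly to client.chat.completions.create in the next turn, giving the model context without resending the full history.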
Common variations
You can use asynchronous calls with async and await for better concurrency in web apps. Cheaper, faster models like gpt-4o-mini work well for summaries. Streaming is less common for summaries but possible.
import os
import asyncio
from openai import AsyncOpenAI  # the async client is required to await API calls

async def summarize_chat():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    chat_history = [
        {"role": "user", "content": "Hello, how are you?"},
        {"role": "assistant", "content": "I'm good, thanks! How can I help you today?"},
        {"role": "user", "content": "Can you summarize our conversation so far?"}
    ]
    summary_prompt = [
        {"role": "system", "content": "You are a helpful assistant that summarizes chat history."}
    ] + chat_history + [
        {"role": "user", "content": "Please provide a concise summary of the above conversation."}
    ]
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=summary_prompt,
        max_tokens=100
    )
    print("Summary:", response.choices[0].message.content)

asyncio.run(summarize_chat())

output
Summary: The user greeted the assistant and asked how it was doing. The assistant responded positively and offered help. The user requested a summary of the conversation.
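For streaming, you pass stream=True to client.chat.completions.create and iterate the returned stream, reading each chunk's choices[0].delta.content. The sketch below shows how the fragments could be consumed; collect_stream is an illustrative helper (not part of the SDK), and the fragments are simulated so the logic is shown without a live API call.

```python
# A sketch of consuming a streamed summary, assuming stream=True and that
# each chunk exposes choices[0].delta.content. collect_stream is an
# illustrative helper, not an SDK function.
def collect_stream(deltas):
    """Join text fragments while printing them incrementally."""
    parts = []
    for delta in deltas:
        if delta:  # streamed chunks can carry empty or None content
            parts.append(delta)
            print(delta, end="", flush=True)
    print()
    return "".join(parts)

# With the real SDK you would write something like:
#   stream = client.chat.completions.create(..., stream=True)
#   summary = collect_stream(chunk.choices[0].delta.content for chunk in stream)
# Simulated fragments for illustration:
summary = collect_stream(["The user ", "greeted the ", "assistant."])
```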
Troubleshooting
- If the summary is too long or incomplete, reduce max_tokens or add instructions to keep it concise.
- If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
- For rate limits, implement retries with exponential backoff.
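A minimal sketch of exponential backoff for rate limits: with_backoff is an illustrative helper, and the retry counts and delays are assumptions you should tune. In real code you would catch only transient errors such as openai.RateLimitError rather than a bare Exception.

```python
import random
import time

# Illustrative retry helper: call is any zero-argument function that may
# raise a transient error (e.g. a rate-limited API request). Delays double
# each attempt (1s, 2s, 4s, ...) up to max_delay, with a little jitter.
def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0):
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, 0.1))
```

You could then wrap the summary request, e.g. with_backoff(lambda: client.chat.completions.create(model="gpt-4o", messages=summary_prompt, max_tokens=150)).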
Key Takeaways
- Use gpt-4o or gpt-4o-mini to generate concise chat summaries for memory.
- Send the full chat history as messages with a system prompt instructing summarization.
- Store the summary as a memory snippet to provide context in future conversations.