How-to · beginner · 3 min read

How to summarize chat history for memory

Quick answer
Use a language model like gpt-4o to generate a concise summary of the chat history by sending the conversation messages as input and prompting the model to summarize. This summary can then be stored as memory for context in future interactions.

Prerequisites

  • Python 3.8+
  • OpenAI API key
  • pip install "openai>=1.0" (quote the requirement so the shell doesn't treat >= as a redirect)

Setup

Install the openai Python package and set your API key as an environment variable.

  • Install package: pip install openai
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

Use the OpenAI SDK to send the chat history as messages and request a summary from the gpt-4o model. Store the returned summary as memory for later use.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example chat history
chat_history = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm good, thanks! How can I help you today?"},
    {"role": "user", "content": "Can you summarize our conversation so far?"}
]

# Prepare prompt to summarize chat history
summary_prompt = [
    {"role": "system", "content": "You are a helpful assistant that summarizes chat history."}
] + chat_history + [
    {"role": "user", "content": "Please provide a concise summary of the above conversation."}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=summary_prompt,
    max_tokens=150
)

summary = response.choices[0].message.content
print("Summary:", summary)
output
Summary: The user greeted the assistant and asked how it was doing. The assistant responded positively and offered help. The user then requested a concise summary of the conversation so far.
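The step above prints the summary but doesn't show how it is reused later. A minimal sketch of carrying the stored summary into a fresh conversation; the helper name build_memory_messages is illustrative, not part of the OpenAI SDK:

```python
def build_memory_messages(summary, new_user_message):
    """Prepend a stored summary as system context for a new conversation."""
    return [
        {"role": "system", "content": "You are a helpful assistant. "
                                      "Summary of the previous conversation: " + summary},
        {"role": "user", "content": new_user_message},
    ]

# Later, seed a new conversation with the stored memory:
# response = client.chat.completions.create(
#     model="gpt-4o",
#     messages=build_memory_messages(summary, "What did we discuss earlier?"),
# )
```

Because the summary rides along as a system message, the model sees the prior context without you resending the full message history.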

Common variations

In async applications such as web servers, use the AsyncOpenAI client with async and await so the summarization call doesn't block the event loop. Smaller models like gpt-4o-mini give faster, cheaper summaries. Streaming is less common for summaries but possible.

python
import os
import asyncio
from openai import AsyncOpenAI

async def summarize_chat():
    # Use the async client (AsyncOpenAI, not OpenAI) so the request can be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

    chat_history = [
        {"role": "user", "content": "Hello, how are you?"},
        {"role": "assistant", "content": "I'm good, thanks! How can I help you today?"},
        {"role": "user", "content": "Can you summarize our conversation so far?"}
    ]

    summary_prompt = [
        {"role": "system", "content": "You are a helpful assistant that summarizes chat history."}
    ] + chat_history + [
        {"role": "user", "content": "Please provide a concise summary of the above conversation."}
    ]

    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=summary_prompt,
        max_tokens=100
    )

    print("Summary:", response.choices[0].message.content)

asyncio.run(summarize_chat())
output
Summary: The user greeted the assistant and asked how it was doing. The assistant responded positively and offered help. The user requested a summary of the conversation.

Troubleshooting

  • If the summary is too long or incomplete, reduce max_tokens or add instructions to keep it concise.
  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • For rate limits, implement retries with exponential backoff.
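The backoff suggestion can be sketched as a small generic wrapper; retry_with_backoff is a hypothetical helper, not an SDK feature, and in real code you would catch openai.RateLimitError specifically rather than a bare Exception:

```python
import random
import time

def retry_with_backoff(func, max_attempts=5, base_delay=1.0):
    """Call func(), retrying on exceptions with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delays grow as base_delay * 1, 2, 4, ...; the random factor
            # spreads retries out so many clients don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Usage with the client from the steps above:
# summary = retry_with_backoff(
#     lambda: client.chat.completions.create(model="gpt-4o", messages=summary_prompt)
# )
```

Note that the OpenAI Python client also accepts a max_retries setting at construction time, which handles rate-limit retries for you.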

Key Takeaways

  • Use gpt-4o or gpt-4o-mini to generate concise chat summaries for memory.
  • Send the full chat history as messages with a system prompt instructing summarization.
  • Store the summary as a memory snippet to provide context in future conversations.
Verified 2026-04 · gpt-4o, gpt-4o-mini