
How to summarize conversation history

Quick answer

To summarize conversation history, pass the full message list to client.chat.completions.create() with a chat model such as gpt-4o, and append a final user message that asks for a concise summary.

PREREQUISITES

Python 3.8+
OpenAI API key (API usage is billed per token; check your account's current credit or plan)
pip install "openai>=1.0"

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

bash

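pip install "openai>=1.0"   # quotes stop the shell from treating > as a redirect
export OPENAI_API_KEY="your-api-key"   # placeholder; substitute your own key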
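
Before running the examples, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of the OpenAI SDK: the helper name `get_api_key` is hypothetical, and it only checks that the variable is present and non-empty, not that the key is valid.

```python
import os


def get_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples.")
    return key


if __name__ == "__main__":
    try:
        get_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

Passing the environment as an argument keeps the check easy to test with a plain dictionary.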

Step by step

This example summarizes a conversation by sending the full message history to gpt-4o, followed by a user message that asks for a summary.

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

conversation_history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, can you help me with my project?"},
    {"role": "assistant", "content": "Sure! What do you need help with?"},
    {"role": "user", "content": "I want to summarize our chat so far."}
]

# Add a user message prompting for a summary
messages = conversation_history + [
    {"role": "user", "content": "Please provide a concise summary of our conversation so far."}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

summary = response.choices[0].message.content
print("Summary:\n", summary)

output

Summary:
 You asked for help with your project and want to summarize the chat so far. I am here to assist you.
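
One possible variation, sketched under assumptions: rather than appending the summary request to the live conversation, render the history as a plain-text transcript and send it in a single user message, so the model treats the conversation as material to summarize rather than a turn to answer. The helper names `render_transcript` and `build_summary_messages`, the system prompt wording, and the `max_sentences` parameter are illustrative and not part of the OpenAI SDK.

```python
def render_transcript(history):
    """Turn chat messages into 'role: content' lines, skipping system prompts."""
    lines = []
    for msg in history:
        if msg["role"] == "system":
            continue
        lines.append(f'{msg["role"]}: {msg["content"]}')
    return "\n".join(lines)


def build_summary_messages(history, max_sentences=2):
    """Build a messages list that asks for a summary of the given history."""
    instruction = (
        f"Summarize the following conversation in at most {max_sentences} sentences."
    )
    return [
        {"role": "system", "content": "You summarize conversations accurately and briefly."},
        {"role": "user", "content": f"{instruction}\n\n{render_transcript(history)}"},
    ]


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, can you help me with my project?"},
    {"role": "assistant", "content": "Sure! What do you need help with?"},
]
messages = build_summary_messages(history)
print(messages[1]["content"])
```

The resulting list can be passed as `messages=` to `client.chat.completions.create()` in place of the list built in the example above.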

Common variations

Use gpt-4o-mini for faster, cheaper summaries with slightly less detail.
Use async calls with asyncio, the AsyncOpenAI client, and await client.chat.completions.create(...) for non-blocking applications.
Stream the summary tokens by setting stream=True to display partial results as they arrive.

python

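import asyncio
import os

from openai import AsyncOpenAI

async def async_summarize():
    # AsyncOpenAI is the awaitable client; the sync OpenAI client cannot be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])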
    conversation_history = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me about AI."}
    ]
    messages = conversation_history + [
        {"role": "user", "content": "Summarize this conversation."}
    ]
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True
    )
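    async for chunk in response:
        # Some chunks may have no choices, and delta.content can be None
        if chunk.choices:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()  # finish the streamed line with a newline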

asyncio.run(async_summarize())

output

You asked about AI and requested a summary of the conversation.
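
The streaming example prints tokens as they arrive but does not keep the finished summary. If the text needs to be stored, the deltas can be collected into one string. This is a minimal sketch under assumptions: `collect_stream` and `fake_chunk` are hypothetical helpers, and `fake_chunk` only imitates the `choices[0].delta.content` shape used above so the logic can run offline. The helper takes a regular iterable, as returned by the synchronous client with stream=True; an async stream would need the same loop written with `async for`.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas from an iterable of streamed chunks into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # a chunk may arrive with no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None, e.g. on the final chunk
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in shaped like a streamed chunk, for offline testing."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


stream = [
    fake_chunk("You asked "),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk("about AI."),
]
summary = collect_stream(stream)
print(summary)
```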

Troubleshooting

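If the summary is too long, state a length limit in the prompt (for example, "in two sentences") or cap the output with max_tokens, which may cut the text off mid-sentence.
If the request exceeds the model's context window, reduce the conversation history or summarize it in parts.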
If you get an authentication error, verify your OPENAI_API_KEY environment variable is set correctly.
If the model returns irrelevant summaries, clarify the prompt to explicitly ask for a concise summary.
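
Reducing the conversation history can be done by keeping the most recent turns and folding the older ones into a single summary message. The sketch below shows one way this might look; `compact_history` and `stub_summarize` are hypothetical names, the `keep_last` default is arbitrary, and the stub stands in for a real call to the chat completions API so the example runs without network access.

```python
def compact_history(history, summarize, keep_last=4):
    """Replace all but the last `keep_last` non-system messages with one summary."""
    system = [m for m in history if m["role"] == "system"]
    turns = [m for m in history if m["role"] != "system"]
    if len(turns) <= keep_last:
        return list(history)  # nothing to compact
    cut = len(turns) - keep_last
    older, recent = turns[:cut], turns[cut:]
    note = {
        "role": "system",
        "content": f"Summary of earlier conversation: {summarize(older)}",
    }
    # System prompts go first, then the summary note, then the recent turns
    return system + [note] + recent


def stub_summarize(messages):
    """Placeholder: a real version would send `messages` to the model."""
    return f"{len(messages)} earlier messages omitted"


history = [{"role": "system", "content": "You are a helpful assistant."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(6)
]
compacted = compact_history(history, stub_summarize, keep_last=2)
for m in compacted:
    print(m["role"], "|", m["content"])
```

Injecting the summarizer as a function keeps the trimming logic separate from the API call, so either part can be changed or tested on its own.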

Key Takeaways

Use the full conversation messages as input to the chat completion API to summarize history.
Add a user prompt explicitly requesting a concise summary for best results.
Streaming and async calls improve responsiveness in real-time applications.

Verified 2026-04 · gpt-4o, gpt-4o-mini