How-to · Beginner · 3 min read

How to maintain chat history with OpenAI

Quick answer
To maintain chat history with OpenAI, keep track of the conversation messages in a list and include the entire message history in each chat.completions.create call. This preserves context and enables the model to generate coherent responses based on prior exchanges.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the official openai Python package and set your API key as an environment variable.

bash
pip install "openai>=1.0"

Step by step

Maintain chat history by storing all messages in a list and passing them with each API call. This example shows a simple chat loop preserving conversation context.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

while True:
    user_input = input("User: ")
    if user_input.lower() in ["exit", "quit"]:
        break
    messages.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages
    )

    assistant_message = response.choices[0].message.content
    print(f"Assistant: {assistant_message}")

    # Save the reply so the next request includes the full exchange
    messages.append({"role": "assistant", "content": assistant_message})
output
User: Hello
Assistant: Hello! How can I assist you today?
User: What's the weather like?
Assistant: I don't have real-time weather data, but I can help you find a forecast online.

Common variations

  • Use different models like gpt-4o-mini for faster, cheaper responses.
  • Implement asynchronous calls with asyncio and await for concurrency.
  • Stream responses by setting stream=True in chat.completions.create to receive tokens incrementally.
The example below combines the async and streaming variations. Note that async calls use the AsyncOpenAI client, and streaming is requested with stream=True on the same chat.completions.create method (there is no separate acreate in openai>=1.0).

python
import asyncio
import os

from openai import AsyncOpenAI

async def async_chat():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages.append({"role": "user", "content": "Tell me a joke."})

    # Await the call itself, then iterate the chunks with `async for`
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=True
    )

    print("Assistant: ", end="", flush=True)
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
    print()

asyncio.run(async_chat())
output
Assistant: Why did the scarecrow win an award? Because he was outstanding in his field!
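
Whichever variation you use, append the completed assistant reply back onto `messages` so the next turn keeps full context. A minimal sketch of collecting a streamed reply into one string (the `stream_reply` and `join_deltas` helpers are illustrative, not part of the SDK):

```python
import os

def join_deltas(deltas):
    """Concatenate streamed content deltas, skipping empty/None chunks."""
    return "".join(d for d in deltas if d)

def stream_reply(messages, model="gpt-4o-mini"):
    """Stream a completion, print it as it arrives, and return the full text."""
    from openai import OpenAI  # imported here so join_deltas stays usable without the SDK

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        parts.append(delta)
    print()
    return join_deltas(parts)
```

The caller would then run `reply = stream_reply(messages)` followed by `messages.append({"role": "assistant", "content": reply})`, mirroring the synchronous loop above.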

Troubleshooting

  • If the model forgets context, ensure you include the full message history in every request.
  • Trim or summarize long histories to stay within token limits.
  • Check that the OPENAI_API_KEY environment variable is set to avoid authentication errors.
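
One simple way to trim is to keep the system message plus the most recent turns. A sketch, assuming a single leading system message and that max_messages exceeds the system-message count (the `trim_history` helper is illustrative, not part of the SDK):

```python
def trim_history(messages, max_messages=20):
    """Return a shortened history: system message(s) first, then the newest turns.

    This is a crude cutoff by message count; a production version would
    count tokens instead so the request stays within the model's limit.
    """
    if len(messages) <= max_messages:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"]
    return system + recent[-(max_messages - len(system)):]
```

Call it just before each request, e.g. `client.chat.completions.create(model="gpt-4o", messages=trim_history(messages))`, so the stored list can keep growing while each request stays bounded.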

Key Takeaways

  • Always pass the full conversation message list to chat.completions.create to maintain context.
  • Use the system role message to set assistant behavior at the start of the conversation.
  • For long chats, manage token limits by trimming or summarizing history before sending.
  • Async and streaming calls improve responsiveness and scalability in chat applications.
Verified 2026-04 · gpt-4o, gpt-4o-mini