API Beginner easy · 5 min

Multi-turn conversation: building message history

What you will learn

Maintain conversation context by accumulating user and assistant messages in a list before each API call.

Why this matters

A single-turn request carries no earlier context. A real chatbot has to send the entire conversation history with every request so the model can refer to what was said before and give coherent replies.

Skip if: you only need one-off questions where context doesn't matter, such as asking for a recipe or a quick definition. Single-turn calls with no history are fine there. For anything that requires reasoning across multiple exchanges, build history.

Explanation

The OpenAI Chat API is stateless: it has no memory between calls. Each call to client.chat.completions.create() only sees the messages you send in that exact request. To simulate a conversation, you must manually accumulate all previous messages (user and assistant) and append each new user message before sending.

Under the hood, the API processes your entire message list as context, then generates each next token conditioned on everything that came before. The model doesn't store conversation state on the server; you're responsible for keeping the list up to date on your side. Each message is a dict with a role and content (the text). The roles used in this pattern are 'system', 'user', and 'assistant'.

Use this pattern whenever you need multi-turn dialogue: chatbots, debugging assistants, interview simulators, or any interaction spanning more than one exchange. Start with a system message (optional but recommended), then append user/assistant pairs as the conversation flows.

Request code

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=messages,
    temperature=0.7
)

assistant_reply = response.choices[0].message.content
print(f"Assistant: {assistant_reply}")

messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "What is its population?"})

response2 = client.chat.completions.create(
    model="gpt-4.1",
    messages=messages,
    temperature=0.7
)

assistant_reply2 = response2.choices[0].message.content
print(f"Assistant: {assistant_reply2}")

messages.append({"role": "assistant", "content": assistant_reply2})
messages.append({"role": "user", "content": "And which river runs through it?"})

response3 = client.chat.completions.create(
    model="gpt-4.1",
    messages=messages,
    temperature=0.7
)

print(f"Assistant: {response3.choices[0].message.content}")
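
The three near-identical calls above can be folded into one helper. The following is a minimal sketch, not part of the original example: chat_turn and fake_complete are hypothetical names, and the completion function is injected so the sketch runs without a network call or API key.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    messages: List[Message],
    user_text: str,
    complete: Callable[[List[Message]], str],
) -> str:
    """Append the user's message, get a reply, and record it in the history."""
    messages.append({"role": "user", "content": user_text})
    reply = complete(messages)
    # Recording the reply is the step that keeps later turns coherent.
    messages.append({"role": "assistant", "content": reply})
    return reply


def fake_complete(messages: List[Message]) -> str:
    """Stand-in for a real API call (assumption: used only for this demo)."""
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = [
    {"role": "system", "content": "You are a helpful assistant."}
]
for question in ["What is the capital of France?", "What is its population?"]:
    print("Assistant:", chat_turn(history, question, fake_complete))
```

To use it against the real API, pass a function that calls client.chat.completions.create(model=..., messages=messages) and returns response.choices[0].message.content in place of fake_complete.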

Authentication

Set your OpenAI API key as an environment variable before running: export OPENAI_API_KEY='sk-...'. The OpenAI() constructor reads this variable automatically, so the api_key=os.environ.get('OPENAI_API_KEY') argument in the code above is optional. Alternatively, pass the key explicitly: OpenAI(api_key='sk-...').

Response shape

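Field	Example value or contents
`id`	chatcmpl-8zA...
`object`	chat.completion
`created`	1699564200
`model`	gpt-4.1
`choices`	list of choices, each with `index`, `message` (`role`, `content`), and `finish_reason`
`usage`	object with `prompt_tokens`, `completion_tokens`, and `total_tokens`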

Field guide

choices[0].message.content

The text of the assistant's reply. This is what you show to the user and append to your message history.

usage.prompt_tokens

Tokens in your input (all messages sent). Each message costs tokens, so longer histories cost more.

usage.completion_tokens

Tokens the model generated. Completion tokens often cost more per token than prompt tokens.

choices[0].finish_reason

Why the model stopped. 'stop' means natural completion; 'length' means it hit the max_tokens limit and the reply was cut off mid-thought.
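
The fields in this guide can be read off a response in a few lines. This is a minimal sketch: inspect_response is a hypothetical helper, and mock_response is a hand-built stand-in that mimics only the attributes described above, so the code runs without calling the API.

```python
from types import SimpleNamespace


def inspect_response(response) -> dict:
    """Collect the fields from the guide above into a plain dict."""
    choice = response.choices[0]
    return {
        "text": choice.message.content,
        # 'length' means the reply was cut off by the token limit.
        "truncated": choice.finish_reason == "length",
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
    }


# Hand-built stand-in with made-up values (assumption: mirrors the
# attribute layout of a chat completion response).
mock_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="Paris."),
            finish_reason="stop",
        )
    ],
    usage=SimpleNamespace(prompt_tokens=24, completion_tokens=2),
)

info = inspect_response(mock_response)
print(info)
```

Checking the truncated flag before appending a reply to the history is one way to avoid storing half-finished answers.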

Setup trap

The environment variable must be set *before* you instantiate OpenAI(). The SDK reads the key at init time, not at request time, so if the variable is unset at that point the constructor raises an error, and setting os.environ['OPENAI_API_KEY'] afterwards does not update a client that already exists. Always export or set the env var first, or pass the key directly to the constructor.

Cost

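Every API call sends the entire message history, so the prompt grows with each turn: the 10th call carries all 9 earlier exchanges plus the new question. A 20-message history costs roughly 20× as much in input tokens as a single message of similar length, whatever the per-token price. Summed over a whole conversation, prompt tokens grow roughly quadratically with the number of turns, not exponentially, but the bill still climbs quickly. Per-token prices differ by model and change over time, so check the current pricing page. For long conversations (100+ messages), consider truncating or summarizing old messages to keep the prompt bounded. Moving history to a persistent storage backend helps with durability but does not reduce what you send per call.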
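
The growth is easy to check with arithmetic. The sketch below assumes, purely for illustration, that every user and assistant message is the same length; the function names and the 50-token figure are made up for the example.

```python
def prompt_tokens_on_turn(
    turn: int, tokens_per_message: int, system_tokens: int = 0
) -> int:
    """Prompt size for the given 1-indexed turn.

    The prompt holds the system message, (turn - 1) earlier
    user/assistant pairs, and the new user message.
    """
    return system_tokens + (2 * turn - 1) * tokens_per_message


def cumulative_prompt_tokens(
    turns: int, tokens_per_message: int, system_tokens: int = 0
) -> int:
    """Total prompt tokens billed across all turns of one conversation."""
    return sum(
        prompt_tokens_on_turn(k, tokens_per_message, system_tokens)
        for k in range(1, turns + 1)
    )


# Illustrative numbers only: 50 tokens per message, no system message.
for n in (1, 10, 20):
    print(n, "turns:", cumulative_prompt_tokens(n, 50), "prompt tokens in total")
```

With equal-length messages the total works out to turns² × tokens_per_message (plus the system message once per call), which is where the quadratic growth comes from.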

Rate limits

If you're calling the API rapidly in a loop (e.g., simulating a conversation programmatically), you may hit rate limits (429 errors). Add retry logic with exponential backoff. Limits depend on your account tier and model: the free tier is strict (on the order of ~3 req/min), and paid tiers are much higher.
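
A minimal sketch of exponential backoff follows. call_with_backoff is a hypothetical helper, not an SDK function; the exception types to retry on and the sleep function are passed in, so the demo uses TimeoutError as a stand-in and does not actually wait.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays of 1s, 2s, 4s, ... plus a little jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise AssertionError("unreachable")


attempts = {"count": 0}


def flaky_call() -> str:
    """Fails twice, then succeeds (simulated rate limiting)."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated rate limit")
    return "ok"


result = call_with_backoff(flaky_call, retry_on=(TimeoutError,), sleep=lambda s: None)
print(result, "after", attempts["count"], "attempts")
```

With the real client you would pass retry_on=(openai.RateLimitError,) and wrap the client.chat.completions.create(...) call in a lambda.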

Common gotcha

Forgetting to append the assistant's response to messages before the next call. You must add {"role": "assistant", "content": response.choices[0].message.content} to the list, or the model will lose the context of its own previous answer and repeat itself or contradict earlier statements.

Error recovery

AuthenticationError

Your API key is missing, expired, or invalid. Check echo $OPENAI_API_KEY and verify it starts with 'sk-'. Regenerate it at platform.openai.com if expired.

RateLimitError

You've exceeded your quota or made requests too fast. Wait a few seconds and retry. Use exponential backoff: time.sleep(2 ** attempt).

BadRequestError (message role unknown)

You used a role the API doesn't accept. For the pattern shown here, role must be exactly 'user', 'assistant', or 'system'. Check the role values in your message dicts for typos. In openai 0.x this error was named InvalidRequestError.

APIConnectionError

Network issue or OpenAI service down. Verify internet connectivity and retry. Check status.openai.com.
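
The unknown-role error above can be caught before a request is sent. This is a sketch under assumptions: validate_messages is a hypothetical helper that only covers the text-only pattern in this tutorial (three roles, string content), not everything the API may accept.

```python
from typing import Dict, List

# Assumption: only the three roles used in this tutorial are allowed.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages: List[Dict[str, object]]) -> List[str]:
    """Return a list of problems found; an empty list means the history looks fine."""
    problems = []
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {index}: unknown role {role!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {index}: content must be a string")
    return problems


sample = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "bot", "content": "Paris."},  # typo: should be 'assistant'
]
for problem in validate_messages(sample):
    print(problem)
```

Running a check like this before each call turns a remote 400 response into a local message that names the offending entry.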

Experienced dev note

Message history is your responsibility. In production, store conversations in a database or cache (Redis) keyed by session ID or user ID, not only in process memory. Memory leaks happen when you keep growing a single messages list without cleanup. The order of messages also matters for the model's reasoning, so always preserve chronological order and keep the system message at the start. One subtle win: once a conversation passes a token budget you choose (for example, 2,000 tokens), summarize the oldest exchanges and replace them with a single summary message placed before the recent exchanges. This saves tokens and keeps reasoning sharp.
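
The summarization idea can be sketched as follows. compact_history is a hypothetical helper, and the summarizer is injected as a function; in practice it might be another model call, which is an assumption and not something this tutorial specifies. Storing the summary as a system message is likewise one possible choice.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def compact_history(
    messages: List[Message],
    keep_last: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Replace all but the last keep_last dialogue messages with one summary."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    if len(dialogue) <= keep_last:
        return list(messages)  # nothing old enough to summarize
    split = len(dialogue) - keep_last
    old, recent = dialogue[:split], dialogue[split:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(old),
    }
    # System messages first, then the summary, then recent turns in order.
    return system + [summary] + recent


def toy_summarize(old: List[Message]) -> str:
    """Placeholder summarizer for the demo; it only counts messages."""
    return f"{len(old)} earlier messages omitted"


long_history: List[Message] = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(3):
    long_history.append({"role": "user", "content": f"question {i}"})
    long_history.append({"role": "assistant", "content": f"answer {i}"})

for message in compact_history(long_history, keep_last=2, summarize=toy_summarize):
    print(message["role"], "|", message["content"])
```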

Check your understanding

Your multi-turn chatbot is working well, but after 50 exchanges responses get slower and users report that each message costs more than it should. What's happening, and how would you fix it without rebuilding the backend?

Answer hint

Each API call is sending the entire 50-exchange history as input, so the prompt grows with every exchange. You need to either keep a sliding window (only the last N messages, plus the system message) or summarize old messages into a single 'summary' message.
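
A sliding window over the history might look like the sketch below. trim_history is a hypothetical helper; keeping every system message and dropping a leading assistant reply are design choices made for this example.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_messages: int) -> List[Message]:
    """Keep system messages plus at most max_messages recent dialogue messages."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    recent = dialogue[-max_messages:] if max_messages > 0 else []
    # Avoid starting the window with an assistant reply whose question was cut.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent


chat: List[Message] = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(5):
    chat.append({"role": "user", "content": f"question {i}"})
    chat.append({"role": "assistant", "content": f"answer {i}"})

window = trim_history(chat, max_messages=4)
print(len(chat), "messages trimmed to", len(window))
```

Call it on the history just before each request so the prompt size stays bounded no matter how long the conversation runs.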

VERSION This pattern targets the client-based SDK introduced in openai 1.x. In older versions (0.x), the call was openai.ChatCompletion.create() without a client object. If you see that in old code, update it to the client-based pattern shown here.
