ConversationHistoryTooLongError
openai.error.ConversationHistoryTooLongError
Stack trace
openai.error.ConversationHistoryTooLongError: The total tokens in the conversation history exceed the model's maximum context window size and cannot be processed.
Why it happens
LLMs have a fixed maximum context window, measured in tokens. When the accumulated conversation history plus the new prompt, together with the tokens reserved for the reply, exceeds this limit, the API rejects the request instead of processing it. Working around the limit by dropping messages loses earlier context, so the trimming strategy matters.
Detection
Monitor token usage of conversation history before sending requests; log token counts and catch ConversationHistoryTooLongError exceptions to detect when context exceeds limits.
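The check described above can be sketched as follows. This is a minimal illustration, not an official API: the names `estimate_tokens`, `fits_context`, `CONTEXT_LIMIT`, and `RESERVED_FOR_REPLY` are hypothetical, and the 4-characters-per-token rule is only a rough heuristic. A real tokenizer (for example the tiktoken package) would give more accurate counts.

```python
import logging
import math

logger = logging.getLogger("context_budget")

CONTEXT_LIMIT = 128_000     # hypothetical value; look up the real limit for your model
RESERVED_FOR_REPLY = 4_000  # hypothetical headroom left for the model's answer


def estimate_tokens(messages):
    """Rough estimate: about 4 characters per token plus 4 tokens of per-message overhead."""
    total = 0
    for message in messages:
        content = message.get("content") or ""
        total += math.ceil(len(content) / 4) + 4
    return total


def fits_context(messages, limit=CONTEXT_LIMIT, reserved=RESERVED_FOR_REPLY):
    """Log the estimated usage and report whether the request should fit."""
    used = estimate_tokens(messages)
    budget = limit - reserved
    logger.info("estimated prompt tokens: %d of %d", used, budget)
    if used > budget:
        logger.warning("history exceeds budget by %d tokens", used - budget)
        return False
    return True
```

Calling `fits_context(messages)` before each request gives an early warning in the logs, so the history can be trimmed before the API returns an error.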
Causes & fixes
Cause: Accumulated conversation history tokens exceed the model's maximum context window size.
Fix: Implement a sliding window or summary strategy to truncate or compress older messages before sending the request.
Cause: Including large system or user messages repeatedly without pruning.
Fix: Remove or shorten redundant or less relevant messages from the conversation history before each API call.
Cause: Using a model with a smaller context window than required for the conversation length.
Fix: Switch to a model with a larger context window, such as gpt-4o or gemini-2.5-pro, to accommodate longer histories.
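The summary strategy mentioned above could look roughly like this. It is a sketch under stated assumptions: `compress_history` and `naive_summary` are hypothetical helpers, the placeholder summarizer only clips each older message, and storing the summary as a second system message is one design choice among several. In practice the `summarize` callable would more likely ask a model to write the summary.

```python
def naive_summary(older, max_chars=80):
    """Placeholder summarizer: one clipped line per older message."""
    lines = []
    for message in older:
        text = (message.get("content") or "").strip().replace("\n", " ")
        lines.append(f"{message.get('role', 'unknown')}: {text[:max_chars]}")
    return " | ".join(lines)


def compress_history(messages, keep_recent=6, summarize=None):
    """Replace older turns with a single summary message.

    Keeps the leading system message (if any) and the last
    `keep_recent` messages verbatim.
    """
    if summarize is None:
        summarize = naive_summary
    has_system = bool(messages) and messages[0].get("role") == "system"
    head = messages[:1] if has_system else []
    body = messages[1:] if has_system else list(messages)
    if len(body) <= keep_recent:
        return list(messages)  # nothing old enough to compress
    split = len(body) - keep_recent
    older, recent = body[:split], body[split:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return head + [summary] + recent
```

Compared with simply dropping old messages, this keeps a trace of earlier context at the cost of an extra summarization step.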
Code: broken vs fixed
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # ... very long conversation history ...
]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)  # This line triggers ConversationHistoryTooLongError
print(response.choices[0].message.content)

from openai import OpenAI
import os

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # ... very long conversation history ...
]
# Drop the oldest messages until the history fits the token budget
MAX_TOKENS = 2048  # illustrative budget, not the model's actual context limit

def count_tokens(messages):
    # Rough approximation (about 1.5 tokens per word); use a real tokenizer for accuracy
    return sum(len(m['content'].split()) for m in messages) * 1.5

# Stop once only the system prompt remains, so pop(1) never raises IndexError
while len(messages) > 1 and count_tokens(messages) > MAX_TOKENS:
    messages.pop(1)  # remove oldest user/assistant message, keep system prompt
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)  # Fixed: truncated history to fit context window
print(response.choices[0].message.content)
Workaround
Catch ConversationHistoryTooLongError, then remove the oldest messages or summarize them before retrying the request.
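A catch-and-retry loop along these lines might be written as below. This is a sketch, not the SDK's own mechanism: `send_with_trimming`, `send`, and `is_context_error` are hypothetical names, and the exception check is passed in as a callable because the exact exception class and error code depend on the SDK version (recent versions of the openai package typically report this condition through `openai.BadRequestError`, which is worth verifying against the installed version).

```python
def send_with_trimming(send, messages, is_context_error,
                       max_retries=5, drop_per_retry=2):
    """Call send(messages); on a context-length error, drop the oldest turns and retry.

    `send` is any callable that performs the API request.
    `is_context_error` decides whether an exception means "history too long".
    """
    current = list(messages)  # work on a copy so the caller's list is untouched
    for attempt in range(max_retries + 1):
        try:
            return send(current)
        except Exception as exc:
            if not is_context_error(exc) or attempt == max_retries:
                raise
            # Keep a leading system prompt and always keep the newest message.
            start = 1 if current and current[0].get("role") == "system" else 0
            if len(current) - start <= 1:
                raise  # nothing left that can safely be dropped
            drop = min(drop_per_retry, len(current) - start - 1)
            del current[start:start + drop]
```

Here `send` could be a small wrapper around `client.chat.completions.create`, and the dropped messages could be summarized instead of discarded if earlier context matters.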
Prevention
Design conversation management to track token usage and proactively truncate or summarize history to stay within model context limits, or use models with larger context windows.
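One way to track usage proactively is a small buffer that counts tokens as messages are added and evicts the oldest turns once a budget is exceeded. The `ConversationBuffer` class below is a hypothetical sketch: the default counter is a crude characters-divided-by-4 estimate, and `max_tokens` is a budget chosen by the caller, not a value read from the model.

```python
from collections import deque


class ConversationBuffer:
    """Keeps a system prompt plus as many recent turns as fit a token budget."""

    def __init__(self, system_prompt, max_tokens, count=None):
        self.system = {"role": "system", "content": system_prompt}
        self.max_tokens = max_tokens
        # Default counter is a rough estimate; pass a real tokenizer for accuracy.
        self.count = count or (lambda text: max(1, len(text) // 4))
        self.turns = deque()  # (message, token_cost) pairs, oldest first
        self.used = self.count(system_prompt)

    def add(self, role, content):
        cost = self.count(content)
        self.turns.append(({"role": role, "content": content}, cost))
        self.used += cost
        # Evict oldest turns until within budget, always keeping the newest one.
        while self.used > self.max_tokens and len(self.turns) > 1:
            _, old_cost = self.turns.popleft()
            self.used -= old_cost

    def messages(self):
        """Return the message list to send with the next request."""
        return [self.system] + [message for message, _ in self.turns]
```

Each request would then send `buffer.messages()`, so the history never grows past the chosen budget except when a single new message is itself larger than the budget.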