ContextWindowExceededError
langchain_core.exceptions.ContextWindowExceededError
Stack trace
langchain_core.exceptions.ContextWindowExceededError: Input tokens exceed the model's maximum context window size of 8192 tokens.
Why it happens
This error occurs because the combined tokens of the prompt, context, and conversation history exceed the maximum token limit supported by the LLM model. LangChain enforces this limit to prevent API errors and ensure model compatibility.
Detection
Monitor token counts before sending requests by using LangChain's token counting utilities or assert input length against the model's max tokens to catch this error early.
Causes & fixes
Prompt or context text is too large and exceeds the model's token limit.
Reduce the input size by truncating or summarizing context, or split input into smaller chunks before passing to the model.
Conversation history accumulates too many tokens over multiple turns.
Implement a sliding window or token budget strategy to keep conversation history within the token limit.
Using a model with a smaller context window than expected.
Switch to a model with a larger context window, such as gpt-4o or llama-3.3-70b, that supports more tokens.
Code: broken vs fixed
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name='gpt-4o-mini')
# This input is too large and triggers ContextWindowExceededError
response = llm.call("""Very long input text exceeding token limit...""") # Error here import os
from langchain_openai import ChatOpenAI
os.environ['OPENAI_API_KEY'] = os.environ.get('OPENAI_API_KEY', '') # Use environment variable for API key
llm = ChatOpenAI(model_name='gpt-4o-mini')
# Truncate or chunk input to fit token limit
input_text = "Very long input text exceeding token limit..."
max_tokens = 8000 # slightly less than model max to allow response tokens
if len(input_text) > max_tokens:
input_text = input_text[:max_tokens] # simple truncation
response = llm.call(input_text) # Fixed: input fits token limit
print(response) Workaround
Catch ContextWindowExceededError and programmatically truncate or split the input text into smaller chunks, then call the model multiple times and aggregate results.
Prevention
Use LangChain's token counting utilities to monitor input size dynamically and implement chunking or summarization pipelines to keep inputs within the model's context window before sending requests.