High severity · Intermediate · Fix: 5-15 min

TokenLimitExceededError

langchain_core.exceptions.TokenLimitExceededError

What this error means

LangChain raises TokenLimitExceededError when the prompt tokens plus the requested completion tokens (max_tokens) exceed the model's maximum context window.

Stack trace

traceback

langchain_core.exceptions.TokenLimitExceededError: The total tokens in the prompt and expected completion exceed the model's maximum context window size of 8192 tokens.

QUICK FIX

Add token counting and input truncation logic before chain execution to ensure total tokens stay below the model's max context window.

Why it happens

This error occurs when the prompt plus the requested completion length (max_tokens) is larger than the model's context window. Chains often concatenate several inputs, documents, or memory state into a single prompt, so the total can grow past the limit even when each piece looks small on its own.

Detection

Monitor token usage by calculating prompt and expected output tokens before sending requests; log token counts and catch TokenLimitExceededError to identify overflows early.
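A minimal sketch of such a pre-flight check, assuming a rough estimate of four characters per token (a heuristic for English text, not an exact tokenizer count) and the 8192-token window from the stack trace above. The names estimate_tokens and check_budget are hypothetical helpers, not LangChain APIs; swap in the model's real tokenizer for accurate counts.

```python
import logging
import math

logger = logging.getLogger("token_budget")

CONTEXT_WINDOW = 8192  # assumed from the error message above


def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token (heuristic, not exact)."""
    return math.ceil(len(text) / 4)


def check_budget(prompt: str, max_output_tokens: int,
                 context_window: int = CONTEXT_WINDOW) -> bool:
    """Log the token budget and return True if the request should fit."""
    prompt_tokens = estimate_tokens(prompt)
    total = prompt_tokens + max_output_tokens
    fits = total <= context_window
    # Log at WARNING when the request is over budget so overflows stand out.
    logger.log(
        logging.INFO if fits else logging.WARNING,
        "prompt=%d output=%d total=%d limit=%d",
        prompt_tokens, max_output_tokens, total, context_window,
    )
    return fits
```

Calling check_budget(prompt_text, 4000) before each chain call gives a log line per request and a boolean that can gate truncation.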

Causes & fixes

Input documents or conversation history exceed the model's maximum token limit when combined.

✓ Fix

Implement token length checks and truncate or summarize inputs to fit within the model's context window before passing to the chain.

Chain memory accumulates too much context over multiple interactions, causing token overflow.

✓ Fix

Use memory management strategies like windowed memory or summarization to limit stored context size.
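The windowing idea can be sketched without any library: keep only the newest messages that fit a token budget. This is an illustrative sketch, not LangChain's memory implementation; the Message type, window_history, and the caller-supplied count_tokens function are all hypothetical names.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content); hypothetical minimal message type


def window_history(messages: List[Message], max_tokens: int,
                   count_tokens: Callable[[str], int]) -> List[Message]:
    """Keep the most recent messages whose combined token count fits max_tokens."""
    kept: List[Message] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for role, content in reversed(messages):
        cost = count_tokens(content)
        if used + cost > max_tokens:
            break
        kept.append((role, content))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Usage with a whitespace word count standing in for a real tokenizer.
history = [("user", "one two three"), ("ai", "four five"), ("user", "six")]
recent = window_history(history, 3, lambda s: len(s.split()))
```

Older messages that fall outside the window could be summarized into one short message instead of being dropped, at the cost of an extra model call.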

The max_tokens setting reserves too many tokens for the completion, leaving too little of the context window for the prompt.

✓ Fix

Reduce the max_tokens parameter in the completion call and shorten the prompt so that both fit within the context window.
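The arithmetic behind that choice is simple: the completion can use at most the context window minus the prompt tokens. A small sketch, assuming the 8192-token window from the stack trace and a hypothetical safety margin of 64 tokens for message formatting overhead; completion_budget is an illustrative helper, not a LangChain function.

```python
def completion_budget(prompt_tokens: int, desired_output_tokens: int,
                      context_window: int = 8192, safety_margin: int = 64) -> int:
    """Return the largest max_tokens value that still fits the context window."""
    available = context_window - prompt_tokens - safety_margin
    if available <= 0:
        raise ValueError("Prompt alone exceeds the context window; shorten it first.")
    return min(desired_output_tokens, available)
```

For a 5000-token prompt and a desired 4000-token completion, the helper returns 8192 - 5000 - 64 = 3128, which is the value to pass as max_tokens.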

Code: broken vs fixed

Broken - triggers the error

python

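from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Placeholder prompt and input; in practice these are far longer
prompt = PromptTemplate.from_template("Summarize the following text:\n\n{text}")
long_input_text = "lorem ipsum " * 10_000  # well over 8192 tokens

# gpt-4 has the 8192-token context window shown in the stack trace
llm = ChatOpenAI(model="gpt-4", max_tokens=4000)
chain = LLMChain(llm=llm, prompt=prompt)

# Prompt tokens + 4000 requested completion tokens exceed the context window
result = chain.run(long_input_text)  # TokenLimitExceededError here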

Fixed - works correctly

python

import tiktoken
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

MODEL = "gpt-4"            # 8192-token context window, as in the stack trace
MAX_CONTEXT_TOKENS = 8192
MAX_OUTPUT_TOKENS = 4000
OVERHEAD_TOKENS = 64       # rough allowance for chat message formatting (assumption)

prompt = PromptTemplate.from_template("Summarize the following text:\n\n{text}")
long_input_text = "lorem ipsum " * 10_000  # placeholder for an oversized input

# Count real tokens with the model's tokenizer instead of splitting on whitespace
encoding = tiktoken.encoding_for_model(MODEL)
template_tokens = len(encoding.encode(prompt.format(text="")))
allowed_input_tokens = (
    MAX_CONTEXT_TOKENS - MAX_OUTPUT_TOKENS - template_tokens - OVERHEAD_TOKENS
)

# Truncate the input at a token boundary if it is over budget
input_tokens = encoding.encode(long_input_text)
if len(input_tokens) > allowed_input_tokens:
    truncated_input = encoding.decode(input_tokens[:allowed_input_tokens])
else:
    truncated_input = long_input_text

llm = ChatOpenAI(model=MODEL, max_tokens=MAX_OUTPUT_TOKENS)
chain = LLMChain(llm=llm, prompt=prompt)
result = chain.run(truncated_input)  # Fixed: prompt + completion fit the context window
print(result)

The fix counts tokens and truncates the input so that the prompt plus the requested completion tokens stay within the model's context window, which prevents TokenLimitExceededError.

⚠

Workaround

Catch TokenLimitExceededError and, when it is raised, truncate or summarize the input text before retrying the chain call.
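A sketch of that retry loop follows. To stay self-contained it defines a local stand-in exception with the same name; in real code, catch whichever exception your installed LangChain or provider version actually raises. The helper run_with_truncation and its shrink and max_attempts parameters are hypothetical, and character-based truncation is used here only for simplicity.

```python
from typing import Callable


class TokenLimitExceededError(Exception):
    """Local stand-in for the exception named on this page (assumption)."""


def run_with_truncation(call: Callable[[str], str], text: str,
                        shrink: float = 0.75, max_attempts: int = 4) -> str:
    """Call `call(text)`; on a token-limit error, shorten the text and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call(text)
        except TokenLimitExceededError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Keep the leading portion of the text; a summarizer could be used instead.
            text = text[: int(len(text) * shrink)]
    raise AssertionError("loop always returns or raises")
```

With a chain, the call argument might be something like lambda t: chain.run(t). Truncating from the front keeps the start of the document; keeping the end or summarizing may suit conversational inputs better.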

✓

Prevention

Design chains with token budget awareness by monitoring token counts continuously and using summarization or windowed memory to keep context within model limits.

Python 3.9+ · langchain-core >=0.1.0 · tested on 0.2.x

Verified 2026-04
