ContextWindowExceededError
llama_index.errors.ContextWindowExceededError
Stack trace
llama_index.errors.ContextWindowExceededError: Input text chunk size exceeds the model's maximum context window size of 4096 tokens.
  File "main.py", line 42, in build_index
    index = GPTVectorStoreIndex.from_documents(documents)
  File "llama_index/indices/vector_store.py", line 123, in from_documents
    raise ContextWindowExceededError("Input chunk too large for model context window.")
Why it happens
LlamaIndex splits input documents into chunks to fit within the model's context window. If a chunk is larger than the model's maximum token limit, this error is raised. This usually happens when chunk size parameters are set too high or documents contain very large unchunked sections.
Detection
Monitor chunk sizes before indexing by logging token counts per chunk. Catch ContextWindowExceededError exceptions during index creation or query time to identify oversized chunks.
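One way to log token counts per chunk before indexing is sketched below. The helper names (`count_tokens`, `report_chunk_sizes`) are hypothetical, and the whitespace-based count is only an approximation of a real tokenizer; pass in the tokenizer that matches your model for accurate numbers.

```python
import logging
from typing import Callable, List

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("chunk_audit")


def count_tokens(text: str) -> int:
    """Rough token count: whitespace-separated words (an approximation)."""
    return len(text.split())


def report_chunk_sizes(
    chunks: List[str],
    max_tokens: int,
    tokenizer: Callable[[str], int] = count_tokens,
) -> List[int]:
    """Log each chunk's token count and return the indices of oversized chunks."""
    oversized = []
    for i, chunk in enumerate(chunks):
        n = tokenizer(chunk)
        logger.info("chunk %d: %d tokens", i, n)
        if n > max_tokens:
            logger.warning("chunk %d exceeds the limit of %d tokens", i, max_tokens)
            oversized.append(i)
    return oversized
```

Running `report_chunk_sizes(chunks, max_tokens=...)` on the splitter output before building the index shows which chunks would need further splitting.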
Causes & fixes
Cause: The chunk size parameter is set larger than the model's maximum context window.
Fix: Reduce the chunk size in the LlamaIndex text splitter configuration so that it is smaller than the model's maximum context window, leaving room for the prompt and the response (e.g., 2048 tokens for the 4096-token window shown in the stack trace).
Cause: Input documents contain very large unchunked sections, or no effective chunking is applied.
Fix: Use a text splitter (e.g., TokenTextSplitter) to break documents into smaller chunks before indexing.
Cause: The model in use has a smaller context window than expected, and chunking was not adjusted accordingly.
Fix: Verify the model's maximum context window size and configure the chunking parameters to fit within that limit.
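The arithmetic behind choosing a chunk size can be sketched as follows. The function name `max_chunk_tokens` and the overhead figures (500 tokens of prompt, 256 tokens of output) are illustrative assumptions, not values taken from LlamaIndex; measure your own prompt template and expected response length.

```python
def max_chunk_tokens(context_window: int, prompt_overhead: int, output_reserve: int) -> int:
    """Tokens left for a single chunk after reserving room for prompt and output."""
    budget = context_window - prompt_overhead - output_reserve
    if budget <= 0:
        raise ValueError("prompt and output reserves leave no room for a chunk")
    return budget


# Example: the 4096-token window from the stack trace, with assumed overheads
chunk_budget = max_chunk_tokens(context_window=4096, prompt_overhead=500, output_reserve=256)
print(chunk_budget)  # 3340
```

Any chunk size at or below the returned budget fits within the window under these assumptions; a smaller value such as 2048 leaves extra headroom.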
Code: broken vs fixed
Broken
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# No chunking specified; the default chunk size is too large for the model
documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)  # Raises ContextWindowExceededError here

Fixed
import os
from llama_index import Document, GPTVectorStoreIndex, SimpleDirectoryReader, TokenTextSplitter

# The API key is read from the OPENAI_API_KEY environment variable; fail early if it is missing
if not os.environ.get('OPENAI_API_KEY'):
    raise RuntimeError("Set the OPENAI_API_KEY environment variable before running")

# Use TokenTextSplitter with a chunk_size smaller than the model context window
text_splitter = TokenTextSplitter(chunk_size=2048, chunk_overlap=100)
documents = SimpleDirectoryReader('data').load_data()

# Split each document's text and wrap every chunk in a Document,
# because from_documents expects Document objects rather than raw strings
split_docs = []
for doc in documents:
    for chunk in text_splitter.split_text(doc.text):
        split_docs.append(Document(text=chunk))

index = GPTVectorStoreIndex.from_documents(split_docs)  # Fixed: chunk size fits the context window
print("Index built successfully with chunking")
Workaround
Catch ContextWindowExceededError and manually split the offending document text into smaller chunks using a simple substring or regex splitter before retrying indexing.
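A minimal sketch of such a fallback splitter is shown below. The function name `split_oversized` and the character-based limit are assumptions for illustration; characters are only a proxy for tokens, so choose `max_chars` conservatively.

```python
import re
from typing import List


def split_oversized(text: str, max_chars: int) -> List[str]:
    """Split text on blank lines, then hard-slice any piece that is still too long."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    pieces: List[str] = []
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        # Substring fallback for paragraphs longer than the limit
        for start in range(0, len(para), max_chars):
            pieces.append(para[start:start + max_chars])
    return pieces
```

In the `except` branch that handles the error, the offending text can be passed through this function and the index rebuilt from the smaller pieces.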
Prevention
Always configure text chunking parameters based on the model's documented max context window size and validate chunk token counts before indexing or querying.