MemoryContextWindowExceededError
ai_memory.errors.MemoryContextWindowExceededError
Stack trace
ai_memory.errors.MemoryContextWindowExceededError: Context window size exceeded when loading conversation history. Max tokens allowed: 4096, tokens requested: 5120
  File "/app/ai_memory/session.py", line 78, in load_history
    raise MemoryContextWindowExceededError("Context window size exceeded when loading conversation history.")
Why it happens
Every model has a fixed maximum context window that limits how many tokens it can process in a single request. When the stored conversation history plus the new input exceeds this limit, the memory system cannot load the full history and raises this error. In the trace above, 5120 tokens were requested against a limit of 4096, an overflow of 1024 tokens. This usually happens when history accumulates without pruning or summarization.
Detection
Count the tokens in the conversation history before sending it to the model. Log or assert when the combined length of history and new input exceeds the model's maximum context window, so the overflow is caught before it crashes the app.
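A pre-flight check along these lines can be sketched as follows. The names `count_tokens` and `fits_context` are hypothetical and not part of `ai_memory`, and the whitespace-based counter is only a rough stand-in for the model's real tokenizer.

```python
import logging

logger = logging.getLogger("context_check")


def count_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fits_context(history: list[str], new_input: str, max_tokens: int) -> bool:
    """Return True if history plus new input fits within max_tokens."""
    total = sum(count_tokens(message) for message in history) + count_tokens(new_input)
    if total > max_tokens:
        # Log the overflow instead of letting the load fail later.
        logger.warning(
            "Context overflow: %d tokens requested, %d allowed", total, max_tokens
        )
        return False
    return True
```

Calling `fits_context(history, prompt, 4096)` before loading gives the application a chance to prune first. For production use, swap `count_tokens` for the tokenizer that matches the model.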
Causes & fixes
Cause: Conversation history grows without pruning or summarization until it exceeds the model's maximum token limit.
Fix: Prune or summarize history so the token count stays within the model's context window.
Cause: The model's context window is smaller than the stored history requires.
Fix: Switch to a model with a larger context window.
Cause: Tokens are not counted accurately before history is sent to the model, so requests are oversized.
Fix: Count tokens with a reliable tokenizer, then truncate or summarize history before sending.
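As an illustration of the pruning fix, the sketch below keeps the most recent messages that fit a token budget and drops the rest. `prune_oldest` is a hypothetical helper, not the library's `prune_history`, and the default word-count tokenizer is an assumption used only to keep the example self-contained.

```python
from typing import Callable


def prune_oldest(
    history: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> list[str]:
    """Keep the newest messages whose combined token count fits max_tokens."""
    kept: list[str] = []
    total = 0
    # Walk from newest to oldest so recent context is preserved first.
    for message in reversed(history):
        cost = count_tokens(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping whole messages from the oldest end is the simplest policy. Summarizing the dropped messages into one short entry is an alternative when older context still matters.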
Code: broken vs fixed
from ai_memory import MemorySession
session = MemorySession(model_name="gpt-4o-mini")
session.load_history()  # Raises MemoryContextWindowExceededError here

import os
from ai_memory import MemorySession
# Use the key already in the environment if set; otherwise fall back to a placeholder
os.environ.setdefault("AI_MEMORY_API_KEY", "your_api_key_here")
session = MemorySession(model_name="gpt-4o-mini")
session.prune_history(max_tokens=3500) # Prune history to fit context window
session.load_history() # Now works without error
print("History loaded successfully")
Workaround
Catch MemoryContextWindowExceededError and manually truncate or summarize the oldest parts of the history before retrying the load.
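The catch-and-retry pattern could look roughly like this. To stay runnable without the library, the sketch defines a local stand-in exception and a fake loader; `load_with_truncation` and `fake_load` are hypothetical names, and in real code the exception would be imported from `ai_memory.errors`.

```python
class MemoryContextWindowExceededError(Exception):
    """Local stand-in for ai_memory.errors.MemoryContextWindowExceededError."""


def fake_load(messages, max_tokens=4):
    """Hypothetical loader: fails when the word count exceeds max_tokens."""
    total = sum(len(message.split()) for message in messages)
    if total > max_tokens:
        raise MemoryContextWindowExceededError(
            f"Max tokens allowed: {max_tokens}, tokens requested: {total}"
        )
    return list(messages)


def load_with_truncation(history, load):
    """Retry load(), dropping the oldest message after each overflow."""
    remaining = list(history)
    while True:
        try:
            return load(remaining)
        except MemoryContextWindowExceededError:
            if not remaining:
                raise  # nothing left to drop, so surface the error
            remaining = remaining[1:]  # discard the oldest message and retry


loaded = load_with_truncation(["a b c", "d e", "f g"], fake_load)
```

Each retry removes one message, so the loop always terminates: it either succeeds or re-raises once the history is empty.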
Prevention
Design memory management to track token usage and automatically prune or summarize history to stay within the model's context window limits, or use models with larger context windows.
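One way to build this in from the start is a history container that tracks its own token total and evicts old entries on every append. The `BoundedHistory` class below is a hypothetical sketch, not part of `ai_memory`, and it again assumes a simple word-count tokenizer unless another counter is passed in.

```python
from collections import deque


class BoundedHistory:
    """Conversation history that prunes itself to stay within a token budget."""

    def __init__(self, max_tokens, count_tokens=None):
        self.max_tokens = max_tokens
        # Assumed default tokenizer: whitespace-separated word count.
        self._count = count_tokens or (lambda text: len(text.split()))
        self._messages = deque()  # entries are (text, token_cost) pairs
        self.total_tokens = 0

    def append(self, text):
        cost = self._count(text)
        self._messages.append((text, cost))
        self.total_tokens += cost
        # Evict the oldest messages until the budget is met,
        # but always keep the message that was just added.
        while self.total_tokens > self.max_tokens and len(self._messages) > 1:
            _, old_cost = self._messages.popleft()
            self.total_tokens -= old_cost

    def messages(self):
        """Return the retained messages in chronological order."""
        return [text for text, _ in self._messages]
```

Because the running total is updated on each append and eviction, the container never has to recount the whole history. A single message larger than the budget is still kept, so callers should check `total_tokens` against `max_tokens` before sending.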