MemoryError
builtins.MemoryError
Stack trace
Traceback (most recent call last):
  File "pipeline.py", line 142, in run_pipeline
    result = llm_chain.invoke(inputs)
  File "/usr/local/lib/python3.10/site-packages/langchain/chains/base.py", line 123, in invoke
    output = self._call(inputs)
  File "/usr/local/lib/python3.10/site-packages/langchain/chains/llm.py", line 78, in _call
    response = self.llm.generate(prompts)
  File "/usr/local/lib/python3.10/site-packages/langchain/llms/openai.py", line 45, in generate
    completions = self.client.chat.completions.create(
  File "/usr/local/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 35, in create
    raise MemoryError("Memory exhausted during LLM call")
MemoryError: Memory exhausted during LLM call
Why it happens
Long-running AI pipelines often keep references to large objects such as LLM responses, embeddings, or cached data long after they are needed. Python cannot free an object while anything still references it, so memory usage grows steadily until an allocation fails and Python raises MemoryError. Inefficient data structures and reference cycles that the cyclic garbage collector has not yet reclaimed make the problem worse.
Detection
Monitor your pipeline's memory usage over time using tools like psutil or memory_profiler. Set alerts for abnormal memory growth and log memory stats before and after major pipeline steps to catch leaks early.
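A minimal sketch of logging memory before and after a pipeline step. To stay dependency-free it swaps in the standard library's tracemalloc for psutil or memory_profiler. tracemalloc only counts memory allocated through Python's allocator, so memory held by native extensions will not show up; psutil.Process().memory_info().rss reports whole-process resident memory instead. The log_memory helper, the 50 MB alert threshold, and the step name are illustrative assumptions.

```python
import logging
import tracemalloc
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline.memory")

# Hypothetical threshold: warn when one step grows traced memory by more than 50 MB.
GROWTH_ALERT_BYTES = 50 * 1024 * 1024


@contextmanager
def log_memory(step_name, alert_bytes=GROWTH_ALERT_BYTES):
    """Log traced Python-heap usage before and after a pipeline step.

    Yields a dict that is filled in with the measurements when the step ends.
    """
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    stats = {}
    before, _ = tracemalloc.get_traced_memory()
    try:
        yield stats
    finally:
        after, _ = tracemalloc.get_traced_memory()
        growth = after - before
        stats.update(before=before, after=after, growth=growth, alert=growth > alert_bytes)
        log.info("%s: before=%.1f MB after=%.1f MB", step_name, before / 1e6, after / 1e6)
        if stats["alert"]:
            log.warning("%s grew traced memory by %.1f MB", step_name, growth / 1e6)


# Usage: wrap each major pipeline step (the list below stands in for real work).
with log_memory("embed documents") as step:
    embeddings = [[0.0] * 256 for _ in range(1000)]
```

tracemalloc adds overhead to every allocation, so you may prefer to enable it only while investigating a suspected leak and rely on process-level RSS monitoring the rest of the time.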
Causes & fixes
Retaining references to all LLM responses and intermediate data in memory without cleanup
Explicitly delete or clear large data structures after use and avoid storing unnecessary intermediate results in long-lived variables.
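A sketch of keeping only what you need from each response. LLMResponse is a hypothetical stand-in for a bulky response object and extract_texts is an illustrative helper, not part of any library. It also shows why an explicit del can matter inside a loop.

```python
class LLMResponse:
    """Hypothetical stand-in for a large response object (text plus bulky metadata)."""

    def __init__(self, text):
        self.text = text
        self.raw = bytearray(5 * 1024 * 1024)  # ~5 MB of payload we do not need to keep


def extract_texts(make_response, prompts):
    """Call `make_response` for each prompt, keeping only the small `text` field."""
    texts = []
    for prompt in prompts:
        response = make_response(prompt)
        texts.append(response.text)  # Keep the field we need, not the whole object
        # Without this `del`, the previous response stays alive until `response` is
        # rebound, i.e. during the next call, so two large objects would coexist.
        del response
    return texts


texts = extract_texts(LLMResponse, ["What is RAG?", "Summarize this report."])
```

Because the list holds only strings, each 5 MB response object is freed by reference counting as soon as `del` runs.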
Using global or static caches that grow unbounded during pipeline execution
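Cap the cache size (for example with functools.lru_cache(maxsize=...)) or hold cached items through weak references (weakref.WeakValueDictionary) so that entries nothing else uses can be garbage-collected.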
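A sketch of both standard-library options: functools.lru_cache with a maxsize for a bounded cache, and weakref.WeakValueDictionary for a cache whose entries vanish once nothing else uses them. The embed function and CachedDocument class are hypothetical placeholders. Plain list and dict instances cannot be weakly referenced, so cached values are wrapped in a class here.

```python
import weakref
from functools import lru_cache


@lru_cache(maxsize=1024)  # Bounded: least recently used entries are evicted beyond 1024
def embed(text):
    # Hypothetical embedding function; a real pipeline would call a model here.
    return tuple(float(ord(c)) for c in text[:8])


class CachedDocument:
    """Hypothetical cached object; instances of plain classes support weak references."""

    def __init__(self, doc_id, body):
        self.doc_id = doc_id
        self.body = body


# Entries disappear automatically once nothing else references the document.
doc_cache = weakref.WeakValueDictionary()


def get_document(doc_id):
    doc = doc_cache.get(doc_id)
    if doc is None:
        doc = CachedDocument(doc_id, body=f"contents of {doc_id}")  # Stand-in for a disk/DB load
        doc_cache[doc_id] = doc
    return doc
```

embed.cache_info() reports hits, misses, and current size, which is a cheap way to confirm the cache stays within its limit.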
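Reference cycles among temporary objects created in long-running loops, which reference counting alone cannot free
Call gc.collect() periodically in the loop. It reclaims only objects trapped in reference cycles (everything else is freed as soon as its last reference disappears), so use it when profiling shows cyclic garbage accumulating rather than as a general fix.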
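A sketch showing what gc.collect() actually reclaims, assuming a hypothetical Node class whose instances refer to themselves. Objects without such cycles are freed by reference counting immediately, so only the cyclic ones wait for the collector.

```python
import gc


class Node:
    """Hypothetical temporary object that refers to itself, forming a reference cycle."""

    def __init__(self):
        self.payload = bytearray(64 * 1024)  # 64 KB that stays allocated until the cycle is collected
        self.me = self  # Self-reference: the refcount never drops to zero on its own


def run_loop(n_steps, collect_every=100):
    """Create cyclic garbage each step and collect it explicitly every `collect_every` steps."""
    found = 0
    for step in range(1, n_steps + 1):
        Node()  # Unreachable immediately, but kept alive by its own cycle
        if step % collect_every == 0:
            found += gc.collect()  # Number of unreachable objects found in this pass
    return found
```

The automatic collector would eventually find these cycles too; an explicit call mainly bounds how much cyclic garbage can pile up between its runs.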
Loading large models or embeddings repeatedly without reusing or unloading them properly
Load models and embeddings once and reuse them, or unload them explicitly if no longer needed.
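A sketch of loading a model once and reusing it, assuming a hypothetical EmbeddingModel class in place of a real model. functools.lru_cache(maxsize=1) on a zero-argument function acts as a lazily created singleton, and cache_clear() drops the cached instance when you want to unload it.

```python
from functools import lru_cache


class EmbeddingModel:
    """Hypothetical stand-in for a large model that is expensive to load."""

    load_count = 0  # Counts how many times a model was actually constructed

    def __init__(self, name):
        EmbeddingModel.load_count += 1
        self.name = name
        self.weights = bytearray(10 * 1024 * 1024)  # ~10 MB placeholder for real weights

    def embed(self, text):
        return [float(len(text))]  # Dummy embedding


@lru_cache(maxsize=1)
def get_model():
    """Load the model on first use and return the same instance afterwards."""
    return EmbeddingModel("hypothetical-embedder")


def unload_model():
    """Drop the cached instance so it can be freed once no caller still holds it."""
    get_model.cache_clear()
```

Clearing the cache releases only the cache's reference; the model is freed once every other variable pointing at it is gone as well.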
Code: broken vs fixed
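Broken
```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# `prompts` is assumed to be defined elsewhere as an iterable of prompt strings.
responses = []
for prompt in prompts:
    response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    responses.append(response)  # Every full response object is kept alive: memory grows unbounded
print("All responses received")
```
Fixed
```python
import gc
import os
from collections import deque
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# `prompts` is assumed to be defined elsewhere as an iterable of prompt strings.
recent = deque(maxlen=10)  # Holds at most the last 10 answers; older ones are dropped automatically
for i, prompt in enumerate(prompts, start=1):
    response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    text = response.choices[0].message.content  # Keep only the text, not the full response object
    del response  # Release the response object before the next request
    recent.append(text)  # Persist or process `text` here if you need every answer
    if i % 10 == 0:
        gc.collect()  # Reclaims reference cycles only; everything else is freed by refcounting
print("All responses processed with memory kept bounded")
```
Workaround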
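Wrap the pipeline loop in try/except MemoryError. In the handler, save intermediate state (for example, the index of the next item to process) to disk, then restart the process so it starts with a fresh heap. This only helps when Python actually raises MemoryError: on Linux, the kernel's OOM killer may terminate the process without any exception, so save state periodically as well.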
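A minimal sketch of the checkpoint-and-restart pattern, assuming the checkpoint only needs the index of the next item and that process() persists its own results. The file name, helper names, and JSON layout are hypothetical. Writing the checkpoint inside the except block can itself fail if memory is truly exhausted; it has a better chance when the failed request was one large allocation.

```python
import json
import os
import sys

CHECKPOINT = "pipeline_checkpoint.json"  # Hypothetical checkpoint path


def load_checkpoint(path=CHECKPOINT):
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {"next_index": 0}


def save_checkpoint(state, path=CHECKPOINT):
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # Atomic rename: a crash never leaves a half-written checkpoint


def run_with_checkpoint(items, process, path=CHECKPOINT):
    """Resume from the saved index. Return True when finished, False after a MemoryError."""
    i = load_checkpoint(path)["next_index"]
    try:
        while i < len(items):
            process(items[i])
            i += 1
    except MemoryError:
        save_checkpoint({"next_index": i}, path)  # Item i failed, so resume there
        return False
    save_checkpoint({"next_index": i}, path)
    return True


def restart_process():
    # Replace the current process with a fresh interpreter running the same script.
    os.execv(sys.executable, [sys.executable] + sys.argv)


# Usage (hypothetical `prompts` and `handle`):
# if not run_with_checkpoint(prompts, handle):
#     restart_process()
```

Delete the checkpoint file before starting a brand-new run; otherwise the pipeline resumes from the old position.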
Prevention
Design pipelines to process data in streaming or batch chunks with explicit resource cleanup, avoid global caches without limits, and monitor memory usage continuously in production.
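A sketch of batch processing with a generic chunking helper. chunked and process_in_batches are illustrative names, not library functions; on Python 3.12+ itertools.batched(iterable, n) does the same chunking but yields tuples. The memory benefit depends on the source being lazy (a generator or file iterator) rather than a fully loaded list.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_in_batches(source, size, handle_batch):
    """Feed `source` to `handle_batch` one batch at a time; return the item count."""
    total = 0
    for batch in chunked(source, size):
        handle_batch(batch)
        total += len(batch)
        # `batch` is rebound on the next iteration, so only one batch is held at a time
    return total
```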