Critical severity · Intermediate · Fix: 15-30 min

MemoryError

builtins.MemoryError

What this error means

The AI pipeline consumes increasing memory over time during long-running execution, eventually causing a MemoryError and crashing the process.

Stack trace

traceback

Traceback (most recent call last):
  File "pipeline.py", line 142, in run_pipeline
    result = llm_chain.invoke(inputs)
  File "/usr/local/lib/python3.10/site-packages/langchain/chains/base.py", line 123, in invoke
    output = self._call(inputs)
  File "/usr/local/lib/python3.10/site-packages/langchain/chains/llm.py", line 78, in _call
    response = self.llm.generate(prompts)
  File "/usr/local/lib/python3.10/site-packages/langchain/llms/openai.py", line 45, in generate
    completions = self.client.chat.completions.create(
  File "/usr/local/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 35, in create
    raise MemoryError("Memory exhausted during LLM call")
MemoryError: Memory exhausted during LLM call

QUICK FIX

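Drop references to large objects as soon as you are done with them (delete them or let them go out of scope) so memory cannot build up across iterations. If your objects form reference cycles, calling gc.collect() periodically in the pipeline loop reclaims them sooner.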

Why it happens

Long-running AI pipelines often accumulate references to large objects such as LLM responses, embeddings, or cached data without proper cleanup. As long as those references exist, Python cannot free the objects, so memory usage grows until an allocation fails and Python raises a MemoryError. Unbounded data structures make this worse, as do reference cycles, which are only reclaimed when the cyclic garbage collector runs.

Detection

Monitor your pipeline's memory usage over time using tools like psutil or memory_profiler. Set alerts for abnormal memory growth and log memory stats before and after major pipeline steps to catch leaks early.
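Logging memory before and after a step could be sketched as below. psutil would report the whole process's resident memory; to stay within the standard library, this sketch swaps in tracemalloc, which only tracks allocations made through Python's allocators. The `log_memory` helper and its `log` parameter are hypothetical names introduced for illustration.

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def log_memory(step_name, log=print):
    """Log the traced Python heap size before and after a pipeline step."""
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    try:
        yield
    finally:
        after, peak = tracemalloc.get_traced_memory()
        log(
            f"{step_name}: before={before} B, after={after} B, "
            f"delta={after - before} B, peak={peak} B"
        )


# Usage: wrap each major pipeline step
retained = []
with log_memory("build_buffer"):
    retained.append(bytearray(1_000_000))  # Stand-in for a large response
```

A delta that stays positive step after step is a hint that something is holding on to the objects the step created.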

Causes & fixes

Retaining references to all LLM responses and intermediate data in memory without cleanup

✓ Fix

Explicitly delete or clear large data structures after use and avoid storing unnecessary intermediate results in long-lived variables.
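One way to avoid long-lived intermediate results is to write each result to disk as soon as it is produced and keep only the fields you need. The sketch below assumes a hypothetical `fake_llm_call` standing in for the real LLM call and a JSON Lines output file; neither comes from the original pipeline.

```python
import json
import tempfile
from pathlib import Path


def fake_llm_call(prompt):
    """Hypothetical stand-in for an LLM call; returns a large response dict."""
    return {"text": prompt.upper(), "raw_payload": "x" * 100_000}


def run_and_persist(prompts, out_path, call=fake_llm_call):
    """Write each result to disk immediately instead of accumulating it."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for prompt in prompts:
            response = call(prompt)
            # Keep only the field we need; the large payload is not retained
            record = {"prompt": prompt, "text": response["text"]}
            out.write(json.dumps(record) + "\n")
            del response  # Drop the reference before the next iteration
            count += 1
    return count


out_file = Path(tempfile.gettempdir()) / "pipeline_results.jsonl"
processed = run_and_persist(["hello", "world"], out_file)
```

Because nothing outside the loop body refers to a response, each one becomes collectable as soon as its iteration ends.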

Using global or static caches that grow unbounded during pipeline execution

✓ Fix

Implement cache size limits or use weak references to allow garbage collection of unused cached items.
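A cache size limit could look like the following minimal sketch of a least-recently-used cache. `BoundedCache` and `max_items` are hypothetical names, and the eviction policy is an assumption; pick whatever policy suits your access pattern.

```python
from collections import OrderedDict


class BoundedCache:
    """A small LRU cache that evicts the least recently used entry when full."""

    def __init__(self, max_items=128):
        self.max_items = max_items
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)  # Mark as most recently used
            return self._data[key]
        return default

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)  # Evict the oldest entry

    def __len__(self):
        return len(self._data)


cache = BoundedCache(max_items=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)  # "a" is evicted here because it is the oldest entry
```

For the weak-reference route, weakref.WeakValueDictionary drops an entry once nothing else refers to its value. Note that many built-in types such as str, list, and dict cannot be weakly referenced directly, so this suits caches of custom objects.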

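Reference cycles among temporary objects piling up between automatic garbage collection runs in long-running loops

✓ Fix

Manually call gc.collect() periodically in the pipeline to free unreachable reference cycles. This does not free objects that are still referenced, so it complements removing references rather than replacing it.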
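To see what gc.collect() actually reclaims, the sketch below creates objects that reference each other and then become unreachable. The `Node` class and `make_cycle` function are hypothetical, and automatic collection is paused only to make the effect visible.

```python
import gc


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a and b now reference each other


gc.disable()  # Pause automatic collection so the effect is visible
gc.collect()  # Start from a clean slate
for _ in range(1000):
    make_cycle()  # Each call leaves behind an unreachable two-object cycle
reclaimed = gc.collect()  # Returns the number of unreachable objects found
gc.enable()
print(f"gc.collect() reclaimed {reclaimed} objects")
```

Objects that are not part of a cycle are freed by reference counting as soon as their last reference goes away, with no collector call needed.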

Loading large models or embeddings repeatedly without reusing or unloading them properly

✓ Fix

Load models and embeddings once and reuse them, or unload them explicitly if no longer needed.

Code: broken vs fixed

Broken - triggers the error

python

import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# `prompts` is assumed to be a large list of prompt strings defined elsewhere
responses = []
for prompt in prompts:
    response = client.chat.completions.create(model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    responses.append(response)  # Memory grows unbounded here

print("All responses received")

Fixed - works correctly

python

import gc
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# `prompts` is assumed to be a large list of prompt strings defined elsewhere
recent = []
for i, prompt in enumerate(prompts, start=1):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # Store only the text you need, not the full response object
    recent.append(response.choices[0].message.content)
    if i % 10 == 0:  # Periodically clean up
        del recent[:-10]  # Keep only the last 10 results
        gc.collect()  # Optional: reclaims reference cycles only

print("All prompts processed with memory managed")

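Periodically deleting old entries keeps the list's size bounded during long loops. The gc.collect() call is optional and only reclaims objects caught in reference cycles. Persist any results you still need before they are dropped.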

Workaround

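Wrap the pipeline loop in try/except MemoryError and, on exception, save intermediate state to disk and restart the process to free memory. Treat this as best effort: once memory is exhausted the save itself can fail, so writing checkpoints periodically during normal operation is safer.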
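A minimal sketch of this workaround follows, assuming progress can be captured as a single index in a JSON file. `run_with_checkpoint`, `flaky_process`, and the checkpoint layout are hypothetical; the demo simulates the failure and the restart inside one process, whereas a real restart would be a fresh process started by a supervisor.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoint(prompts, process, checkpoint_path):
    """Process prompts, resuming from and saving to a JSON checkpoint."""
    path = Path(checkpoint_path)
    state = json.loads(path.read_text()) if path.exists() else {"next_index": 0}
    try:
        for i in range(state["next_index"], len(prompts)):
            process(prompts[i])
            state["next_index"] = i + 1
    except MemoryError:
        # Best effort: saving may itself fail if memory is fully exhausted
        path.write_text(json.dumps(state))
        return False  # Caller should exit so a supervisor can restart the job
    path.unlink(missing_ok=True)  # Finished: the checkpoint is no longer needed
    return True


# Demo with a hypothetical processor that fails once on the third prompt
seen = []
failed_once = []


def flaky_process(prompt):
    if prompt == "c" and not failed_once:
        failed_once.append(True)
        raise MemoryError("simulated")
    seen.append(prompt)


ckpt = Path(tempfile.mkdtemp()) / "checkpoint.json"
first = run_with_checkpoint(["a", "b", "c", "d"], flaky_process, ckpt)
second = run_with_checkpoint(["a", "b", "c", "d"], flaky_process, ckpt)
```

The second call resumes at the prompt that failed, so completed work is not repeated.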

Prevention

Design pipelines to process data in streaming or batch chunks with explicit resource cleanup, avoid global caches without limits, and monitor memory usage continuously in production.
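Batch processing could be sketched as below, assuming the prompts arrive as any iterable (a list, a generator, or lines of a file). `batched`, `process_in_batches`, and the default batch size are hypothetical choices for illustration.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items without materializing the whole input."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_in_batches(prompts, handle_batch, size=50):
    """Hand each batch to `handle_batch`; only one batch is held at a time."""
    total = 0
    for batch in batched(prompts, size):
        handle_batch(batch)
        total += len(batch)
    return total


batch_sizes = []
total = process_in_batches(
    (f"prompt {i}" for i in range(7)),
    lambda batch: batch_sizes.append(len(batch)),
    size=3,
)
```

Peak memory then scales with the batch size rather than with the total number of prompts, provided `handle_batch` does not keep references to earlier batches.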

Python 3.9+ · openai >=1.0.0 · tested on 1.5.x

Verified 2026-04
