High severity intermediate · Fix: 5-15 min

RuntimeError

torch.utils.data.dataloader.RuntimeError

What this error means
A PyTorch DataLoader worker process exited unexpectedly, causing the main training loop to fail.

Stack trace

traceback
RuntimeError: DataLoader worker (pid 12345) exited unexpectedly
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 123, in _worker_loop
    data = fetcher.fetch()
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 50, in fetch
    raise RuntimeError('DataLoader worker exited unexpectedly')
QUICK FIX
Set num_workers=0 in the DataLoader so that loading runs in the main process. A Python-level failure in the dataset code then surfaces with its own traceback instead of as a worker exit.

Why it happens

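This error is raised when a DataLoader worker process dies without reporting back, for example because the operating system killed it for using too much memory, it crashed in native code while reading corrupted data, it ran out of shared memory, or it hit multiprocessing issues such as deadlocks or resource limits. The main process detects the worker exit and raises this RuntimeError. An ordinary Python exception raised inside dataset __getitem__ is normally forwarded to the main process and re-raised there with its original traceback, so the "exited unexpectedly" message usually points to an abrupt termination of the worker.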

Detection

Monitor DataLoader worker logs or wrap dataset __getitem__ calls with try/except to catch exceptions early. Use logging inside workers to detect failures before the main process crashes.
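The try/except idea above can be sketched as a thin wrapper around an existing map-style dataset. The class name LoggingDataset and the logger name are hypothetical, and the sketch assumes the wrapped object only needs __len__ and __getitem__. It logs the failing index and the process ID, then re-raises so the failure is not hidden.

```python
import logging
import os

logger = logging.getLogger("dataloader.debug")  # hypothetical logger name


class LoggingDataset:
    """Wrap a map-style dataset and log any exception with its index and PID."""

    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        try:
            return self.base[index]
        except Exception:
            # logger.exception records the traceback; the PID identifies the worker.
            logger.exception("sample %r failed in process %d", index, os.getpid())
            raise  # re-raise so the failure is not silently swallowed
```

Passing LoggingDataset(dataset) to the DataLoader in place of dataset should leave the batches unchanged. Note that this only catches Python exceptions; a worker killed by a signal never reaches the except branch.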

Causes & fixes

1

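Failure inside dataset __getitem__ method (e.g., corrupted data that crashes a native decoding library; a plain Python error such as an out-of-range index is usually re-raised in the main process with its own traceback)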

✓ Fix

Add error handling and validation inside __getitem__ to ensure valid indices and data integrity.
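One way to validate a dataset before training is to scan it once in the main process and collect the indices that fail to load. The helper below is a minimal sketch; find_bad_indices and its limit parameter are hypothetical names, and it assumes a map-style dataset with __len__ and __getitem__.

```python
def find_bad_indices(dataset, limit=None):
    """Return (index, exception) pairs for samples that fail to load."""
    n = len(dataset) if limit is None else min(limit, len(dataset))
    failures = []
    for index in range(n):
        try:
            dataset[index]  # load the sample and discard it
        except Exception as exc:
            failures.append((index, exc))
    return failures
```

Because the scan runs without worker processes, each failure keeps its original exception. A crash in native code would still abort the scan itself.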

2

Incompatible multiprocessing start method, or a shared memory area (/dev/shm on Linux) too small for the batches that workers hand back to the main process

✓ Fix

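Set the multiprocessing start method to 'spawn' via torch.multiprocessing.set_start_method('spawn'), called once under an if __name__ == "__main__": guard before any DataLoader is created, or pass multiprocessing_context='spawn' to the DataLoader. Increase shared memory limits if needed (in Docker, for example, with the --shm-size option).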
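To check whether shared memory is the limiting factor, the free space on the shared-memory mount can be read with the standard library. This is a small sketch that assumes the Linux convention of /dev/shm; the function name shm_free_bytes is hypothetical.

```python
import os
import shutil


def shm_free_bytes(path="/dev/shm"):
    """Return free bytes on the shared-memory mount, or None if it is absent."""
    if not os.path.isdir(path):
        return None  # e.g. platforms without a /dev/shm mount
    return shutil.disk_usage(path).free


free = shm_free_bytes()
if free is not None:
    print(f"/dev/shm free: {free / 2**20:.0f} MiB")
```

A value far below the size of a few batches suggests raising the shared memory limit or lowering num_workers or batch_size.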

3

Dataset or transform code is not picklable (required when workers are started with 'spawn') or relies on state that is not safe to share across processes

✓ Fix

Ensure dataset and transform objects are picklable (avoid lambdas, open file handles, and locks as attributes) and avoid global state or code that is unsafe to use across worker processes.
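Picklability can be checked up front with the standard pickle module, which is what the 'spawn' start method uses to send objects to workers. The helper name pickling_error is hypothetical; it is a sketch for diagnosis, not part of PyTorch.

```python
import pickle


def pickling_error(obj):
    """Return None if obj pickles cleanly, else the exception that was raised."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return exc
    return None
```

Calling pickling_error(dataset) and pickling_error(transform) before building the DataLoader reports the first attribute that cannot be pickled, such as a lambda or a generator.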

4

Insufficient system resources (memory, file descriptors) causing worker process to be killed

✓ Fix

Reduce num_workers, increase system resource limits, or optimize dataset loading to reduce memory usage.
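For the file descriptor part of this fix, the current limits can be inspected and the soft limit raised from inside the process. This sketch uses the Unix-only resource module; the function names are hypothetical, and some platforms may reject very large soft limits.

```python
import resource  # Unix-only standard-library module


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit toward target, capped at the hard limit."""
    soft, hard = open_file_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Only ever raise the limit; leave an unlimited soft limit untouched.
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return open_file_limits()
```

Calling raise_soft_limit(4096) before creating the DataLoader is one option when workers fail with "Too many open files"; reducing num_workers remains the simpler fix.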

Code: broken vs fixed

Broken - triggers the error
python
from torch.utils.data import DataLoader

dataloader = DataLoader(dataset, batch_size=32, num_workers=4)
for batch in dataloader:
    # RuntimeError: DataLoader worker exited unexpectedly
    process(batch)
Fixed - works correctly
python
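import torch
from torch.utils.data import DataLoader, TensorDataset


def process(batch):
    pass  # placeholder for the real training step


if __name__ == "__main__":  # required with 'spawn': workers re-import this module
    torch.multiprocessing.set_start_method('spawn', force=True)  # Changed start method to spawn

    dataset = TensorDataset(torch.arange(128.0).unsqueeze(1))  # stand-in dataset for illustration
    dataloader = DataLoader(dataset, batch_size=32, num_workers=4)
    for batch in dataloader:
        process(batch)  # Workers no longer inherit forked state from the parent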
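Changed multiprocessing start method to 'spawn' to avoid worker crashes caused by 'fork' incompatibilities, such as CUDA or background threads being initialized before the fork. With 'spawn', the script entry point must be guarded by if __name__ == "__main__":. This does not help if the worker is being killed for lack of memory or shared memory.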

Workaround

Temporarily set num_workers=0 to disable multiprocessing in DataLoader, allowing debugging of dataset code without worker crashes.
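When the loader is configured through a dictionary of keyword arguments, switching to single-process mode also means dropping the options that DataLoader accepts only with num_workers > 0. The sketch below assumes such a dictionary exists; single_process_kwargs and WORKER_ONLY_OPTIONS are hypothetical names.

```python
# Options that DataLoader rejects when num_workers is 0.
WORKER_ONLY_OPTIONS = ("prefetch_factor", "persistent_workers")


def single_process_kwargs(loader_kwargs):
    """Return a copy of loader_kwargs adjusted for num_workers=0."""
    debug_kwargs = {
        key: value
        for key, value in loader_kwargs.items()
        if key not in WORKER_ONLY_OPTIONS
    }
    debug_kwargs["num_workers"] = 0  # load data in the main process
    return debug_kwargs
```

The result can then be passed as DataLoader(dataset, **single_process_kwargs(loader_kwargs)), leaving the original configuration untouched for normal runs.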

Prevention

Use 'spawn' start method for multiprocessing, validate dataset __getitem__ thoroughly, and monitor system resources to prevent worker crashes.

Python 3.7+ · torch >=1.0.0 · tested on 2.0.x
Verified 2026-04