RuntimeError
Built-in RuntimeError, raised by torch.utils.data.DataLoader
Stack trace
RuntimeError: DataLoader worker (pid 12345) exited unexpectedly
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 123, in _worker_loop
    data = fetcher.fetch()
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 50, in fetch
    raise RuntimeError('DataLoader worker exited unexpectedly')
Why it happens
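This error occurs when a DataLoader worker process dies abruptly instead of returning a batch. For example, the operating system kills it for using too much memory, it runs out of shared memory (/dev/shm) while passing a batch back, or it crashes in native code, such as an image decoder reading corrupted data. Fork-unsafe state inherited from the parent process and other resource limits are common triggers. The main process detects that the worker has exited and raises this RuntimeError. An ordinary Python exception in __getitem__ (such as an index out of range) is usually reported differently: the DataLoader re-raises it in the main process as "Caught <ExceptionType> in DataLoader worker process <N>." followed by the worker's original traceback.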
Detection
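Read the output printed just before this error. PyTorch often reports that a worker was killed by a signal, or that a bus error may be caused by insufficient shared memory (shm), and the kernel log (dmesg) shows whether the out-of-memory killer ended the process. To catch data problems early, wrap dataset __getitem__ calls with try/except and log inside workers, so the failing index is recorded before the main process raises.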
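A minimal sketch of both detection ideas, assuming a map-style dataset. worker_init_fn turns on Python's faulthandler in each worker so that a segmentation fault or abort prints a Python traceback, and LoggingDataset (a hypothetical wrapper, as is the "data" logger name) logs the failing index and worker id before re-raising. Note that a SIGKILL from the out-of-memory killer cannot be intercepted, so that case still only shows up in dmesg.

```python
import faulthandler
import logging
import sys

from torch.utils.data import Dataset, get_worker_info

log = logging.getLogger("data")  # hypothetical logger name


def worker_init_fn(worker_id):
    # Print a Python traceback to the original stderr if this worker segfaults or aborts
    # (SIGKILL from the OOM killer cannot be caught this way)
    faulthandler.enable(file=sys.__stderr__, all_threads=True)
    log.info("DataLoader worker %d started", worker_id)


class LoggingDataset(Dataset):
    """Hypothetical wrapper: records which index failed, and where, before re-raising."""

    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        try:
            return self.base[idx]
        except Exception:
            info = get_worker_info()  # None when loading in the main process
            where = f"worker {info.id}" if info is not None else "main process"
            log.exception("Failed to load index %s in %s", idx, where)
            raise
```

Usage: DataLoader(LoggingDataset(dataset), batch_size=32, num_workers=4, worker_init_fn=worker_init_fn). The wrapper re-raises, so training still stops; it only adds the index to the log.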
Causes & fixes
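Errors inside the dataset __getitem__ method (e.g., an out-of-range index, or corrupted data that crashes a native decoder)
Add validation and error handling inside __getitem__ so that bad indices and records raise a descriptive exception, which the DataLoader re-raises in the main process with the worker's traceback, rather than crashing the worker.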
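A sketch of index and integrity checks, assuming each record is a hypothetical in-memory (features, label) pair; in a real dataset the same checks would follow reading and decoding a file. ValidatedDataset and num_features are illustrative names. An exception raised here reaches the main process as "Caught ValueError in DataLoader worker process N." with this message attached.

```python
import torch
from torch.utils.data import Dataset


class ValidatedDataset(Dataset):
    """Hypothetical dataset over (features, label) records with explicit checks."""

    def __init__(self, records, num_features):
        self.records = records
        self.num_features = num_features

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        # Reject negative and too-large indices explicitly instead of relying on list indexing
        if not 0 <= idx < len(self.records):
            raise IndexError(f"index {idx} out of range for dataset of size {len(self.records)}")
        features, label = self.records[idx]
        # Integrity check 1: every record must have the expected number of features
        if len(features) != self.num_features:
            raise ValueError(
                f"record {idx}: expected {self.num_features} features, got {len(features)}"
            )
        x = torch.tensor(features, dtype=torch.float32)
        # Integrity check 2: reject NaN or infinite values before they reach the model
        if not torch.isfinite(x).all():
            raise ValueError(f"record {idx}: contains NaN or inf")
        return x, label
```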
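Worker crashes caused by the multiprocessing start method (e.g., state inherited through 'fork', the default on Linux) or by shared-memory limits
Run workers with the 'spawn' start method, either process-wide with torch.multiprocessing.set_start_method('spawn') or per loader with DataLoader(..., multiprocessing_context='spawn'), and put the script's entry point under if __name__ == '__main__'. If shared memory is the limit, increase it; in Docker, for example, start the container with a larger --shm-size or with --ipc=host.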
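A minimal sketch of the per-loader option, using a placeholder TensorDataset in place of a real dataset; make_loader and main are illustrative names. multiprocessing_context='spawn' affects only this DataLoader, so other code in the process keeps its default start method, and the __main__ guard is what keeps spawned workers from re-running the training loop when they re-import the script.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, num_workers=4):
    # multiprocessing_context applies to this DataLoader's workers only, whereas
    # torch.multiprocessing.set_start_method changes the default for the whole process
    return DataLoader(dataset, batch_size=32, num_workers=num_workers,
                      multiprocessing_context="spawn")


def main():
    # Placeholder data standing in for a real dataset
    dataset = TensorDataset(torch.arange(100, dtype=torch.float32))
    for (batch,) in make_loader(dataset, num_workers=2):
        assert batch.shape[0] <= 32


if __name__ == "__main__":
    # Spawned workers re-import this module as "__mp_main__"; the guard keeps them from running main()
    main()
```

The start method does not change how much shared memory batches need, so a shared-memory bus error is fixed by enlarging /dev/shm, not by switching to 'spawn'.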
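Dataset or transform code is not picklable, or relies on global state that is unsafe to share across processes
Keep dataset and transform objects picklable: 'spawn' pickles the dataset for each worker, and lambdas, locally defined functions, and open file handles cannot be pickled. Avoid module-level state such as file handles or database connections created in the parent process; open them lazily inside each worker instead. Workers are separate processes, so the risk is state copied from the parent rather than thread safety.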
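A sketch of a dataset that is safe under both 'fork' and 'spawn', assuming data stored as fixed-size binary records in one file (the path and record_size layout are hypothetical). The file is opened lazily in whichever process reads it, a handle inherited through fork is replaced, and __getstate__ keeps the open handle out of the pickle that 'spawn' sends to workers. assert_picklable is a hypothetical helper that surfaces pickling errors before any worker starts.

```python
import os
import pickle

from torch.utils.data import Dataset


class LazyFileDataset(Dataset):
    """Reads fixed-size records from a binary file; the record layout is an assumption."""

    def __init__(self, path, record_size):
        self.path = path
        self.record_size = record_size
        self._len = os.path.getsize(path) // record_size
        self._fh = None   # opened on first use, once per process
        self._pid = None  # process that opened self._fh

    def __len__(self):
        return self._len

    def _handle(self):
        # A handle inherited through fork shares its file offset with the parent,
        # so concurrent seek/read calls would interfere; reopen in each process
        if self._fh is None or self._pid != os.getpid():
            self._fh = open(self.path, "rb")
            self._pid = os.getpid()
        return self._fh

    def __getstate__(self):
        # Open file objects cannot be pickled; drop the handle so 'spawn' workers reopen it
        state = self.__dict__.copy()
        state["_fh"] = None
        state["_pid"] = None
        return state

    def __getitem__(self, idx):
        if not 0 <= idx < self._len:
            raise IndexError(idx)
        fh = self._handle()
        fh.seek(idx * self.record_size)
        return fh.read(self.record_size)


def assert_picklable(obj):
    # Raises for unpicklable members such as lambdas, local functions, or open files
    pickle.dumps(obj)
```

Calling assert_picklable(dataset) and assert_picklable(transform) once at startup turns a worker crash under 'spawn' into an immediate, readable error.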
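Insufficient system resources (RAM, shared memory, file descriptors) causing the operating system to kill worker processes
Reduce num_workers (each worker is a separate process with its own memory and its own prefetched batches), raise limits such as the open-file limit, or reduce per-sample memory use in the dataset.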
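Before lowering num_workers blindly, it can help to check the limits involved. A Unix-oriented sketch using only the standard library; loader_resource_report and capped_num_workers are hypothetical helpers, and capping workers at the CPU count is a rule of thumb, not a PyTorch requirement. The open-file limit itself can be raised in the shell with ulimit -n before launching training.

```python
import os
import shutil
import sys


def loader_resource_report(shm_path="/dev/shm"):
    """Collect limits that commonly lie behind killed DataLoader workers."""
    report = {"cpu_count": os.cpu_count()}
    if os.path.isdir(shm_path):
        # Workers hand batches to the main process through shared memory
        report["shm_free_bytes"] = shutil.disk_usage(shm_path).free
    if sys.platform != "win32":
        import resource  # Unix-only module
        # Worker processes and shared tensors consume file descriptors
        report["nofile_soft"], report["nofile_hard"] = resource.getrlimit(resource.RLIMIT_NOFILE)
    return report


def capped_num_workers(requested):
    # Hypothetical rule of thumb: never start more workers than CPU cores
    return max(0, min(requested, os.cpu_count() or 1))
```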
Code: broken vs fixed
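Broken
from torch.utils.data import DataLoader

# `dataset` and `process` stand for your own dataset object and training step
dataloader = DataLoader(dataset, batch_size=32, num_workers=4)
for batch in dataloader:
    # RuntimeError: DataLoader worker (pid 12345) exited unexpectedly
    process(batch)
Fixed
# Changing the start method fixes fork-related crashes only; OOM kills and shm exhaustion need the resource fixes above
import torch
from torch.utils.data import DataLoader

def main():
    dataloader = DataLoader(dataset, batch_size=32, num_workers=4)
    for batch in dataloader:
        process(batch)

if __name__ == '__main__':
    # With 'spawn', each worker re-imports this module; the guard stops it from re-running main()
    torch.multiprocessing.set_start_method('spawn', force=True)
    main()
Workaround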
Temporarily set num_workers=0 so that data loading runs in the main process. Exceptions then surface with a normal traceback at the failing sample, and you can step through dataset code with a debugger.
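With loading in the main process, a failing sample can be located directly. A sketch for map-style datasets; debug_loader and find_bad_indices are hypothetical helpers, and find_bad_indices indexes the dataset itself rather than going through the DataLoader so that it can keep scanning past the first failure. A native crash such as a segmentation fault will still stop the process, since nothing here runs in a separate worker.

```python
from torch.utils.data import DataLoader


def debug_loader(dataset, batch_size=32):
    # num_workers=0: batches are built in this process, so errors raise here with
    # a normal traceback and breakpoints inside __getitem__ work
    return DataLoader(dataset, batch_size=batch_size, num_workers=0)


def find_bad_indices(dataset, max_errors=10):
    """Hypothetical helper: load each sample in-process and collect the ones that fail."""
    bad = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except Exception as exc:  # keep scanning and report every failure found
            bad.append((idx, repr(exc)))
            if len(bad) >= max_errors:
                break
    return bad
```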
Prevention
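Choose the start method deliberately (use 'spawn', guarding the entry point with if __name__ == '__main__', when the dataset holds fork-unsafe state), validate dataset __getitem__ thoroughly, for example by loading every sample once with num_workers=0, and monitor memory and shared-memory use when tuning num_workers.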