ValueError
ValueError in torch.utils.data.DataLoader multiprocessing
Stack trace
ValueError: num_workers > 0 not supported on this platform or with this multiprocessing start method
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 123, in __init__
    raise ValueError("num_workers > 0 not supported on this platform or with this multiprocessing start method")
Why it happens
PyTorch's DataLoader starts worker processes to load data in parallel when num_workers > 0. On some platforms (such as Windows) or in restricted environments, or when the multiprocessing start method is incompatible, setting num_workers > 0 triggers this ValueError. The usual culprit is the 'spawn' start method, which is the only one available on Windows: each worker re-imports the main script, so DataLoader code outside a main guard runs again in every worker.
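To see which start method applies on a given machine, the standard library can be queried directly. The sketch below uses only the multiprocessing module; the helper name describe_start_method is made up for illustration and is not part of PyTorch.

```python
import multiprocessing
import sys


def describe_start_method():
    """Report the platform and the multiprocessing start methods it offers.

    allow_none=True avoids fixing the start method as a side effect, so
    "current" is None when no method has been selected yet.
    """
    return {
        "platform": sys.platform,
        "current": multiprocessing.get_start_method(allow_none=True),
        "available": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    print(describe_start_method())
```

If "available" lists only 'spawn', worker processes always re-import the main module, so the main guard is required.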
Detection
Watch for a ValueError raised while constructing a DataLoader with num_workers > 0, especially on Windows or in restricted environments such as Jupyter notebooks or certain Docker containers.
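One way to detect the failure in code is to wrap loader construction and retry without workers. This is a minimal sketch, assuming the error surfaces at construction time as in the stack trace above. The helper build_loader_with_fallback and its make_loader argument are hypothetical names, not PyTorch APIs, and the except clause also catches unrelated ValueErrors (for example, one caused by an invalid argument), so the logged message should be checked.

```python
import logging

logger = logging.getLogger(__name__)


def build_loader_with_fallback(make_loader, num_workers):
    """Call make_loader(num_workers); on ValueError, retry with 0 workers.

    make_loader is any callable that takes a worker count and returns a
    loader object (hypothetical interface, chosen for this sketch).
    """
    try:
        return make_loader(num_workers)
    except ValueError as exc:
        if num_workers == 0:
            raise  # already single-process, nothing left to fall back to
        logger.warning(
            "Loader rejected num_workers=%d (%s); retrying with 0",
            num_workers,
            exc,
        )
        return make_loader(0)


# Possible usage with PyTorch (not executed here):
# loader = build_loader_with_fallback(
#     lambda n: DataLoader(dataset, batch_size=8, num_workers=n),
#     num_workers=4,
# )
```

Passing a factory instead of a DataLoader keeps the helper independent of how the dataset and batch size are configured.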
Causes & fixes
Cause: Running DataLoader with num_workers > 0 on Windows without setting the multiprocessing start method.
Fix: Add an 'if __name__ == "__main__":' guard and set the multiprocessing start method to 'spawn' explicitly before creating the DataLoader.
Cause: Using num_workers > 0 in an environment that does not support multiprocessing (e.g., some Jupyter notebooks or restricted containers).
Fix: Set num_workers=0 to disable multiprocessing data loading in unsupported environments.
Cause: A missing or incorrectly placed __main__ guard in a script that uses a multiprocessing DataLoader.
Fix: Wrap the DataLoader code in an 'if __name__ == "__main__":' block so that spawned workers can import the script safely.
Code: broken vs fixed
# Broken: the DataLoader is created at module level, with no main guard
from torch.utils.data import DataLoader
from datasets import load_dataset

dataset = load_dataset('imdb', split='train')
dataloader = DataLoader(dataset, batch_size=8, num_workers=4)  # ValueError here
for batch in dataloader:
    print(batch)

# Fixed: main guard plus an explicit start method
import multiprocessing
from torch.utils.data import DataLoader
from datasets import load_dataset

if __name__ == "__main__":
    multiprocessing.set_start_method('spawn', force=True)  # Fix: set start method
    dataset = load_dataset('imdb', split='train')
    dataloader = DataLoader(dataset, batch_size=8, num_workers=4)  # Fixed
    for batch in dataloader:
        print(batch)
Workaround
If you cannot change the multiprocessing start method or add a main guard, set num_workers=0 as a temporary measure. Data then loads in the main process, which avoids the ValueError at the cost of slower loading.
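The workaround can be applied conditionally, so that scripts keep their workers while notebook sessions fall back to zero. This is a rough sketch: the helper pick_num_workers is a hypothetical name, and checking for a loaded ipykernel module is only a heuristic for "running under Jupyter", not an official detection method.

```python
import sys


def pick_num_workers(requested, modules=None):
    """Return 0 in notebook-like sessions, otherwise the requested count.

    modules defaults to sys.modules; it is a parameter only so the
    heuristic can be exercised without a real notebook.
    """
    loaded = sys.modules if modules is None else modules
    in_notebook = "ipykernel" in loaded  # heuristic, an assumption
    if in_notebook:
        return 0
    return max(0, requested)  # never pass a negative worker count


# Possible usage with PyTorch (not executed here):
# loader = DataLoader(dataset, batch_size=8, num_workers=pick_num_workers(4))
```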
Prevention
In scripts that use DataLoader with num_workers > 0, always put the entry point under an 'if __name__ == "__main__":' guard, and set the multiprocessing start method to 'spawn' explicitly on Windows or in restricted environments.