High severity · Intermediate · Fix: 5-10 min

ValueError

ValueError in torch.utils.data.DataLoader multiprocessing

What this error means
This ValueError is raised when a PyTorch DataLoader is created with num_workers > 0 on a platform, or under a multiprocessing start method, that cannot launch worker processes. It commonly surfaces during data loading in Hugging Face pipelines.

Stack trace

traceback
ValueError: num_workers > 0 not supported on this platform or with this multiprocessing start method
  File "/usr/local/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 123, in __init__
    raise ValueError("num_workers > 0 not supported on this platform or with this multiprocessing start method")
QUICK FIX
Set num_workers=0 in DataLoader or add 'if __name__ == "__main__":' guard and set multiprocessing start method to 'spawn' explicitly.

Why it happens

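PyTorch DataLoader uses multiprocessing to load data in parallel when num_workers > 0. On some platforms (such as Windows) or in restricted environments, or when the multiprocessing start method does not fit how the script is structured, setting num_workers > 0 triggers this ValueError. Windows supports only the 'spawn' start method, which starts each worker in a fresh interpreter that re-imports the main module, so any code not protected by an 'if __name__ == "__main__":' guard runs again in every worker.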
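To see which start method applies on a given machine, the standard library can be queried directly. The following is a minimal sketch that uses only the multiprocessing and sys modules; the helper name describe_start_methods is hypothetical and is not part of PyTorch.

```python
import multiprocessing
import sys


def describe_start_methods():
    """Report the platform and the multiprocessing start methods it offers."""
    return {
        "platform": sys.platform,
        # Every start method this interpreter supports on this platform.
        "available": multiprocessing.get_all_start_methods(),
        # None means no method has been fixed yet, so the platform default applies.
        "current": multiprocessing.get_start_method(allow_none=True),
    }


if __name__ == "__main__":
    print(describe_start_methods())
```

On Windows the "available" list contains only 'spawn', which is why the main guard matters there.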

Detection

Check for ValueError exceptions when initializing DataLoader with num_workers > 0, especially on Windows or restricted environments like Jupyter notebooks or certain Docker containers.
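One way to detect the problem at runtime is to catch the ValueError at construction time and retry without workers. This is only a sketch: it assumes the error is raised when the loader is created, as in the stack trace above, and both build_with_fallback and the make_loader callable are hypothetical names. In real code, make_loader might be something like lambda n: DataLoader(dataset, batch_size=8, num_workers=n).

```python
def build_with_fallback(make_loader, num_workers):
    """Try the requested worker count; fall back to 0 workers on ValueError.

    make_loader is a caller-supplied callable that takes a worker count
    and returns a loader (hypothetical interface, for illustration only).
    """
    try:
        return make_loader(num_workers), num_workers
    except ValueError as exc:
        if num_workers == 0:
            raise  # Not a worker-count problem; let the caller see it.
        print(f"num_workers={num_workers} rejected ({exc}); retrying with 0")
        return make_loader(0), 0


def fake_loader_factory(n):
    """Stand-in for a DataLoader factory that rejects worker processes."""
    if n > 0:
        raise ValueError("num_workers > 0 not supported on this platform")
    return "loader"


if __name__ == "__main__":
    loader, workers = build_with_fallback(fake_loader_factory, 4)
    print(loader, workers)
```

The function returns the worker count that was actually used, so the caller can log that loading fell back to the main process.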

Causes & fixes

1

Running DataLoader with num_workers > 0 on Windows without setting multiprocessing start method

✓ Fix

Add 'if __name__ == "__main__":' guard and set multiprocessing start method to 'spawn' explicitly before DataLoader creation.

2

Using num_workers > 0 in an environment that does not support multiprocessing (e.g., some Jupyter notebooks or restricted containers)

✓ Fix

Set num_workers=0 to disable multiprocessing data loading in unsupported environments.

3

Missing or misplaced __main__ guard in scripts that use a multiprocessing DataLoader

✓ Fix

Wrap DataLoader code inside 'if __name__ == "__main__":' block to ensure safe multiprocessing spawn.
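For the second cause, the worker count can be chosen at runtime instead of being hard-coded. The sketch below assumes that an interactive prompt or a loaded ipykernel module indicates an environment where workers should be disabled. That is a heuristic rather than a rule, since many notebook setups do support worker processes. The names in_interactive_session and pick_num_workers are hypothetical.

```python
import sys


def in_interactive_session():
    """Heuristic: True in a REPL or when an IPython/Jupyter kernel is loaded."""
    return hasattr(sys, "ps1") or "ipykernel" in sys.modules


def pick_num_workers(requested, interactive=None):
    """Return 0 in interactive sessions, otherwise the requested count."""
    if interactive is None:
        interactive = in_interactive_session()
    return 0 if interactive else requested


if __name__ == "__main__":
    print(pick_num_workers(4))
```

The result can then be passed as the num_workers argument when the DataLoader is created.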

Code: broken vs fixed

Broken - triggers the error
python
from torch.utils.data import DataLoader
from datasets import load_dataset

dataset = load_dataset('imdb', split='train')
dataloader = DataLoader(dataset, batch_size=8, num_workers=4)  # ValueError here
for batch in dataloader:
    print(batch)
Fixed - works correctly
python
import multiprocessing
from torch.utils.data import DataLoader
from datasets import load_dataset

if __name__ == "__main__":
    multiprocessing.set_start_method('spawn', force=True)  # Fix: set start method
    dataset = load_dataset('imdb', split='train')
    dataloader = DataLoader(dataset, batch_size=8, num_workers=4)  # Fixed
    for batch in dataloader:
        print(batch)
The fixed version adds the '__main__' guard and explicitly sets the multiprocessing start method to 'spawn', which resolves the ValueError with num_workers > 0 on Windows and similar platforms.

Workaround

If you cannot modify the multiprocessing start method or add a main guard, set num_workers=0 to disable multiprocessing and avoid the ValueError temporarily.

Prevention

Always use an 'if __name__ == "__main__":' guard in scripts that use DataLoader with num_workers > 0. On Windows or in restricted environments, also set the multiprocessing start method to 'spawn' explicitly to prevent this error.
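A reusable script skeleton can make this habit easier to keep. The sketch below uses only the standard library; ensure_start_method is a hypothetical helper, and the DataLoader code is left as a comment. Unlike calling set_start_method with force=True, it selects 'spawn' only when no method has been chosen yet, so it does not override a choice made earlier by another library.

```python
import multiprocessing


def ensure_start_method(method="spawn"):
    """Select a start method once, leaving any earlier choice untouched."""
    if multiprocessing.get_start_method(allow_none=True) is None:
        multiprocessing.set_start_method(method)
    return multiprocessing.get_start_method()


def main():
    active = ensure_start_method("spawn")
    print(f"start method: {active}")
    # Build the DataLoader and run the data loading loop here.


if __name__ == "__main__":
    main()
```

Keeping all work inside main() means that a worker process re-importing the module runs nothing beyond the function definitions.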

Python 3.7+ · torch >=1.0.0 · tested on 2.0.x
Verified 2026-04