Installing PyTorch: CPU vs CUDA version
Why this matters
Choosing the wrong installation can cost you days of training time: a model that trains in 2 hours on a GPU might take 2 days on a CPU. Before installing, find out what hardware you have and pick the matching build.
Explanation
What it is: PyTorch is a machine learning library that can run computations on your CPU (processor) or GPU (graphics card). CUDA is NVIDIA's parallel computing platform; it lets PyTorch offload heavy math to an NVIDIA GPU.
How it works mechanically: A CUDA build of PyTorch bundles the CUDA runtime libraries that let tensors live in GPU memory. It does not include the NVIDIA driver, which must be installed separately. When you call .to('cuda') on a tensor, PyTorch copies it from CPU RAM to GPU VRAM and returns the copy. Operations on GPU tensors run in parallel across thousands of cores instead of a few CPU cores.
When to use: Install the CUDA build if you have an NVIDIA GPU and plan to train neural networks. Install the CPU-only build if you have no NVIDIA GPU, or if you're just learning and want a smaller, faster installation. You can always reinstall later.
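To get a rough feel for the difference on your own machine, a timing sketch along these lines may help. The helper name `time_matmul`, the matrix size, and the repeat count are illustrative choices, not part of PyTorch. The explicit `torch.cuda.synchronize()` calls are there because GPU work is queued asynchronously, so timing without them can undercount.

```python
import time

import torch


def time_matmul(device: torch.device, size: int = 1024, repeats: int = 5) -> float:
    """Return the average seconds per (size x size) matrix multiply on `device`."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    a @ b  # warm-up run so one-time setup cost is not measured
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work to finish

    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


print(f"cpu:  {time_matmul(torch.device('cpu')):.4f} s per matmul")
if torch.cuda.is_available():
    print(f"cuda: {time_matmul(torch.device('cuda')):.4f} s per matmul")
```

The numbers depend heavily on the hardware and the matrix size, so treat them as a sanity check and not as a benchmark.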
Analogy
Installing PyTorch is like choosing between a bicycle (CPU: cheap, works everywhere, slow) and a sports car with a turbocharged engine (GPU: expensive, needs special fuel, extremely fast). You need the right fuel to run the engine; the wrong fuel does nothing.
Code
import torch

# Report which PyTorch build is installed and whether it can see a CUDA GPU.
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA capability: {torch.cuda.get_device_capability(0)}")
    device = torch.device('cuda')
else:
    print("No CUDA GPU found. Using CPU.")
    device = torch.device('cpu')

# Create a tensor on the CPU, then move it to the selected device.
tensor_cpu = torch.randn(2, 3)
tensor_on_device = tensor_cpu.to(device)
print(f"\nTensor on {device}:")
print(tensor_on_device)
print(f"Device: {tensor_on_device.device}")

Output (the random tensor values will differ on each run):
PyTorch version: 2.11.0
CUDA available: True
CUDA device: NVIDIA A100
CUDA capability: (8, 0)

Tensor on cuda:
tensor([[ 0.1234,  0.5678, -0.9012],
        [-0.3456,  0.7890,  0.1234]], device='cuda:0')
Device: cuda:0
What just happened?
The code checked whether the installed PyTorch build can reach an NVIDIA GPU through CUDA. It printed the PyTorch version, the GPU name, and its compute capability. Then it created a tensor on the CPU and moved it to the GPU with .to(device). The last print confirms the tensor now lives on cuda:0, the first GPU.
Common gotcha
Installing PyTorch with CUDA doesn't guarantee your GPU will be used. Even after installation, if you forget to call .to('cuda') or .to(device) on your model and tensors, everything stays on the CPU and runs slowly, with no warning or error. Many developers waste hours debugging slow training before realizing they're running on the CPU by accident.
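One way to catch this early is to check where the model and the batch actually live before training starts. The sketch below uses a tiny `nn.Linear` layer as a stand-in for a real model; the layer size and variable names are illustrative.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(3, 1)  # stand-in for a real model; created on the CPU
print(f"Model starts on: {next(model.parameters()).device}")  # cpu

model = model.to(device)              # moves the parameters
batch = torch.randn(4, 3).to(device)  # .to() returns a new tensor; keep the result

# Fail loudly if the model and the data ended up on different devices.
model_device = next(model.parameters()).device
assert model_device == batch.device, f"model on {model_device}, batch on {batch.device}"

output = model(batch)
print(f"Output computed on: {output.device}")
```

Note that calling `.to(device)` on a tensor without keeping the returned value leaves the original tensor where it was.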
Error recovery
CUDA out of memory
CUDA not available (but you expected it)
ImportError: cannot import name '_C'
Experienced dev note
Senior developers always run `torch.cuda.is_available()` in their setup verification code rather than assuming the answer. Make it part of your training script's startup: log the result, assert it if a GPU is required, or fall back gracefully to CPU. Many production issues stem from assuming that CUDA is available on a different machine. Also, wheels built for CUDA 11.8 and CUDA 12.1 are separate builds and are not interchangeable: a CUDA 12.1 build generally needs a newer NVIDIA driver than a CUDA 11.8 build, so an install that works on a teammate's machine can report CUDA as unavailable on yours. Always pin the PyTorch version in requirements.txt and record the CUDA version in a comment next to it.
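As one possible shape for that startup check, the sketch below logs what it finds and then either falls back to the CPU or stops, depending on a flag. `get_device` and `require_cuda` are hypothetical names chosen for illustration, not PyTorch APIs.

```python
import logging

import torch

logger = logging.getLogger(__name__)


def get_device(require_cuda: bool = False) -> torch.device:
    """Pick the training device once at startup and log the decision."""
    if torch.cuda.is_available():
        logger.info("Using GPU: %s", torch.cuda.get_device_name(0))
        return torch.device("cuda")
    if require_cuda:
        # torch.version.cuda is None for CPU-only builds.
        raise RuntimeError(
            f"CUDA was required but is unavailable "
            f"(torch {torch.__version__}, built for CUDA {torch.version.cuda})"
        )
    logger.warning("CUDA unavailable; falling back to CPU.")
    return torch.device("cpu")


logging.basicConfig(level=logging.INFO)
device = get_device(require_cuda=False)
```

Passing `require_cuda=True` suits jobs that should never run on a CPU by accident; leaving it `False` suits scripts that must also run on a laptop.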
Check your understanding
You have an NVIDIA RTX 4090 GPU installed. You run the code above and torch.cuda.is_available() returns False. What are two possible root causes, and how would you distinguish between them?
Answer hint
A correct answer identifies: (1) The NVIDIA driver is not installed or is outdated: check with nvidia-smi. (2) PyTorch was installed without CUDA support: check whether torch.version.cuda is None, or whether torch.__version__ carries a tag such as '+cpu' rather than '+cu118' or '+cu121'. The distinction: nvidia-smi succeeds only if the driver is installed, regardless of PyTorch; torch.version.cuda is set only if PyTorch was built with CUDA, regardless of the driver. torch.cuda.is_available() needs both.
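The decision logic in this hint can be written down as a small pure function. This is only a sketch: `diagnose_cuda` is a hypothetical helper, and finding `nvidia-smi` on the PATH is used as a rough proxy for the driver being installed.

```python
from typing import Optional


def diagnose_cuda(
    nvidia_smi_found: bool,
    torch_cuda_build: Optional[str],
    cuda_available: bool,
) -> str:
    """Map three observations to the most likely cause of a missing GPU."""
    if cuda_available:
        return "CUDA is available; nothing to fix."
    if torch_cuda_build is None:
        return "PyTorch is a CPU-only build; reinstall a CUDA build."
    if not nvidia_smi_found:
        return "NVIDIA driver appears to be missing; install or repair it."
    return "Driver and CUDA build are both present; the driver may be too old for this build."


# Example: a CPU-only wheel on a machine with a working driver.
print(diagnose_cuda(nvidia_smi_found=True, torch_cuda_build=None, cuda_available=False))
```

With PyTorch installed, the three inputs could come from `shutil.which("nvidia-smi") is not None`, `torch.version.cuda`, and `torch.cuda.is_available()`.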