CUDA version matching: the critical step
Why this matters
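A single version mismatch can make your code run on the CPU instead of the GPU without raising an error. If your script picks its device with torch.cuda.is_available(), it quietly falls back to CPU, training can become 50-100x slower depending on the model and hardware, and you may not notice until production. Developers often debug model logic when the real problem is silent CPU execution.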
Explanation
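What it is: CUDA is NVIDIA's parallel computing platform. A PyTorch build compiled for CUDA 12.1 cannot use a GPU whose driver only supports CUDA 11.8, because that driver is too old to run the CUDA 12 runtime the build depends on.
How it works: When you pip install torch, pip downloads a prebuilt wheel built against a specific CUDA version (e.g., cu121). The wheel ships with (or pulls in as pip dependencies) its own CUDA runtime libraries, so what your system must provide is an NVIDIA driver new enough for that runtime; nvidia-smi reports the newest CUDA version your driver supports. Two separate conditions decide whether the GPU is usable: the driver's supported CUDA version must be at least the version the wheel was built for, and the wheel must contain kernels for your GPU's compute capability (a fixed hardware property, e.g., 8.6 for an RTX 3090). If the driver is too old, torch.cuda.is_available() returns False, often with only a warning that is easy to miss, and code that falls back to CPU keeps running there silently.
When to use this: Before starting PyTorch GPU development, run the check in the code below to verify that your driver supports the CUDA version your PyTorch build was compiled for. Repeat it whenever you upgrade the driver or reinstall PyTorch; in production code, a cheap startup assertion is enough.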
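The version rule above can be sketched in plain Python. This is a minimal sketch, assuming you already have torch.version.cuda as a string (e.g., "12.1") and the header text printed by nvidia-smi. The helper names (parse_version, driver_cuda_from_smi, driver_can_run) are hypothetical, and the comparison is the conservative "driver version at least the build version" rule; it ignores CUDA's minor-version compatibility, which can let a newer runtime run on a slightly older driver of the same major version.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def parse_version(text: str) -> Version:
    """Turn a string such as '12.1' (the format of torch.version.cuda) into (12, 1)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def driver_cuda_from_smi(smi_output: str) -> Optional[Version]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header, if present."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_can_run(compiled: Version, driver_max: Version) -> bool:
    """Conservative rule: the newest CUDA version the driver supports must be
    at least the version the PyTorch wheel was built against."""
    return driver_max >= compiled  # tuple comparison: major first, then minor


header = "| NVIDIA-SMI 555.42.02    Driver Version: 555.42.02    CUDA Version: 12.5     |"
driver_max = driver_cuda_from_smi(header)
print(driver_max)                                        # (12, 5)
print(driver_can_run(parse_version("12.1"), driver_max))  # True
print(driver_can_run(parse_version("12.1"), (11, 8)))     # False
```

Comparing (major, minor) tuples keeps the rule readable: a cu121 build against an 11.8 driver fails on the major version alone.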
Analogy
Think of CUDA versions like electrical outlet standards. Your PyTorch build is a plug designed for 220V (CUDA 12.1). Your GPU driver provides a 110V outlet (CUDA 11.8). The plug might physically fit, but the appliance won't run on that supply. Nothing explodes: if your code has a backup power source (a CPU fallback), it quietly switches to it, and everything keeps working, only much more slowly.
Code
import subprocess

import torch

print("=== PyTorch CUDA Configuration ===")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    # CUDA version this PyTorch build was compiled against (e.g. "12.1")
    print(f"CUDA version (compiled): {torch.version.cuda}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
    print(f"GPU compute capability: {torch.cuda.get_device_capability(0)}")
    # nvidia-smi's header line shows the driver version and the newest CUDA version it supports
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
    print("\n=== nvidia-smi output (driver check) ===")
    print(next((line for line in result.stdout.splitlines() if "Driver Version" in line), result.stdout))
else:
    print("⚠️ CUDA is not available.")
    # torch.version.cuda is None for CPU-only PyTorch builds
    print(f"CUDA version (compiled): {torch.version.cuda}")
    try:
        result = subprocess.run(['nvidia-smi'], capture_output=True, text=True, timeout=5)
        if result.returncode == 0:
            print("nvidia-smi works, but PyTorch cannot access the GPU.")
            print("Install a PyTorch build whose CUDA version your driver supports.")
        else:
            print("nvidia-smi failed: the NVIDIA driver is not loaded or not working.")
    except FileNotFoundError:
        print("nvidia-smi not found. GPU driver not installed.")

Example output (illustrative; values depend on your machine):
=== PyTorch CUDA Configuration ===
PyTorch version: 2.11.1+cu121
CUDA available: True
CUDA version (compiled): 12.1
GPU name: NVIDIA RTX 4090
GPU compute capability: (8, 9)

=== nvidia-smi output (driver check) ===
| NVIDIA-SMI 555.42.02    Driver Version: 555.42.02    CUDA Version: 12.5 |
What just happened?
The code checked whether PyTorch can see and use your GPU by calling torch.cuda.is_available(). When it returned True, the code printed the CUDA version PyTorch was compiled against (torch.version.cuda), your GPU's name, and its compute capability, then ran nvidia-smi to show the driver version and the newest CUDA version that driver supports. If the driver's CUDA version equals or exceeds the compiled one, the GPU is usable. If the driver is too old (for example, an 11.8 driver with a cu121 build), is_available() returns False, and any code that falls back to "cpu" keeps every tensor on the CPU.
Common gotcha
The most common mistake: seeing CUDA available: False and assuming the GPU is broken, when the real cause is that the installed PyTorch build doesn't match the driver (or is a CPU-only build, in which case torch.version.cuda is None). Developers then reinstall PyTorch for the wrong CUDA version, reinstall the same driver, or reboot: none of which help. The fix: read the CUDA version your driver supports from nvidia-smi, then either install a PyTorch build for that CUDA version or an older one, or upgrade the driver so it supports the build you already have.
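The decision described in this gotcha, plus the CPU-only case, fits in a small triage function. This is a hypothetical helper (diagnose is not a PyTorch API), assuming you pass it torch.version.cuda and the "CUDA Version" string from nvidia-smi, or None if nvidia-smi failed. It uses the same conservative comparison as above and is a rough guide, not an exhaustive diagnosis.

```python
from typing import Optional, Tuple


def _ver(text: str) -> Tuple[int, int]:
    """'12.1' -> (12, 1)"""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def diagnose(compiled_cuda: Optional[str], driver_cuda: Optional[str]) -> str:
    """Suggest a likely cause when torch.cuda.is_available() is False.

    compiled_cuda: value of torch.version.cuda (None for CPU-only builds).
    driver_cuda:   'CUDA Version' reported by nvidia-smi, or None if it failed.
    """
    if compiled_cuda is None:
        return "CPU-only PyTorch build: install a CUDA-enabled wheel"
    if driver_cuda is None:
        return "nvidia-smi failed: install or repair the NVIDIA driver"
    if _ver(driver_cuda) < _ver(compiled_cuda):
        return (
            f"driver supports CUDA {driver_cuda} but PyTorch needs {compiled_cuda}: "
            f"upgrade the driver or install a wheel built for CUDA <= {driver_cuda}"
        )
    # Versions are fine, so the cause is elsewhere (hypothetical checklist).
    return "versions look compatible: check compute capability, containers, CUDA_VISIBLE_DEVICES"


print(diagnose("12.1", "11.8"))
print(diagnose(None, "12.5"))
```

In practice you would call it only after torch.cuda.is_available() has returned False, passing torch.version.cuda as the first argument.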
Error recovery
RuntimeError: "CUDA out of memory"
CUDA available: False with GPU present
TypeError: 'NoneType' object is not subscriptable from torch.cuda.get_device_capability()
Experienced dev note
The silent failure is the trap. Your code runs without error on CPU, so it looks correct during development, and you may not catch it until you benchmark in production and see a large slowdown. Check torch.cuda.is_available() once at startup and log the result. Better: fail explicitly with assert torch.cuda.is_available(), "CUDA required for this model" rather than silently accepting CPU (or raise an exception instead, since assert statements are skipped when Python runs with -O). Also inspect torch.cuda.get_device_properties(0) for the GPU's compute capability. It doesn't replace the driver check, but it determines whether your PyTorch build ships kernels for your GPU, which the driver's version number alone cannot tell you.
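A startup check along these lines could look like the sketch below. It is an assumption-laden sketch, not PyTorch's own logic: require_cuda and has_compatible_kernels are hypothetical names, and require_cuda takes the torch module as a parameter purely so it can be exercised without a GPU (in real code you would call require_cuda(torch) right after import torch). The torch.cuda calls it relies on (is_available, get_device_capability, get_arch_list, get_device_name) are real. The kernel check follows CUDA's binary-compatibility rule (a cubin for sm_XY runs on GPUs with the same major version X and a minor version of at least Y) and ignores PTX entries, so it can be stricter than PyTorch itself.

```python
import re
from types import SimpleNamespace
from typing import List, Tuple


def has_compatible_kernels(capability: Tuple[int, int], arch_list: List[str]) -> bool:
    """True if arch_list has an sm_XY entry with X == GPU major and Y <= GPU minor.
    PTX ('compute_XY') and arch-specific ('sm_90a') entries are ignored."""
    major, minor = capability
    for arch in arch_list:
        m = re.fullmatch(r"sm_(\d+)(\d)", arch)  # 'sm_86' -> ('8', '6'), 'sm_100' -> ('10', '0')
        if m and int(m.group(1)) == major and int(m.group(2)) <= minor:
            return True
    return False


def require_cuda(torch_module) -> str:
    """Fail fast at startup instead of silently training on CPU."""
    cuda = torch_module.cuda
    if not cuda.is_available():
        raise RuntimeError("CUDA required for this model, but torch.cuda.is_available() is False")
    capability = cuda.get_device_capability(0)
    if not has_compatible_kernels(capability, cuda.get_arch_list()):
        raise RuntimeError(f"No kernels for compute capability {capability} in {cuda.get_arch_list()}")
    return f"Using {cuda.get_device_name(0)}, capability {capability}, CUDA {torch_module.version.cuda}"


# Offline demo with a stand-in object (hypothetical values, no GPU needed).
fake_torch = SimpleNamespace(
    version=SimpleNamespace(cuda="12.1"),
    cuda=SimpleNamespace(
        is_available=lambda: True,
        get_device_capability=lambda idx: (8, 9),
        get_arch_list=lambda: ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"],
        get_device_name=lambda idx: "NVIDIA RTX 4090",
    ),
)
print(require_cuda(fake_torch))
```

Raising RuntimeError rather than using assert keeps the check active even under python -O, and logging the returned string gives you a record of which GPU each run actually used.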
Check your understanding
You have PyTorch 2.11.1+cu121 installed. Your GPU driver reports CUDA 11.8 in nvidia-smi. Will torch.cuda.is_available() return True or False, and why? If False, what single command would fix it?
Answer hint
The answer requires understanding that PyTorch's compiled CUDA version (cu121) must be supported by the driver's CUDA version (11.8). The question tests whether you know the compatibility rule (driver CUDA ≥ compiled CUDA) and the fix (reinstall PyTorch for cu118, or alternatively upgrade the driver). A correct answer identifies the mismatch, predicts the False result, and gives the correct pip install command.