Code Beginner easy · 4 min

CUDA version matching: the critical step

What you will learn

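The CUDA version your PyTorch build was compiled against must be supported by your NVIDIA driver. If it is not, torch.cuda.is_available() returns False, and code that chooses its device automatically falls back to CPU without raising an error.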

Why this matters

A single version mismatch can mean your code runs on CPU instead of GPU without throwing an error: training can be 50-100x slower for typical deep learning workloads, and you may not notice until production. Developers often debug model logic when the real problem is silent CPU execution.

Skip if: you are developing on a CPU-only machine (no NVIDIA GPU), using a cloud platform (Lambda Labs, Colab, Kaggle) that pre-installs a matching driver and PyTorch build, or running Docker with a pre-built CUDA+PyTorch image. In those cases the versions are already matched for you.

Explanation

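What it is: CUDA is NVIDIA's parallel computing platform. Two different numbers are involved, and they are easy to confuse. The CUDA version describes the software: the toolkit a PyTorch binary was built against (e.g., 12.1) and the newest CUDA version your driver supports. The compute capability describes the hardware: the GPU architecture generation (e.g., 8.6 for RTX 3090). A PyTorch build compiled for CUDA 12.1 cannot use a GPU whose driver only supports up to CUDA 11.8, because the CUDA runtime bundled with PyTorch needs a newer driver than the one installed.

How it works: When you pip install torch, pip downloads a binary built for a specific CUDA version (e.g., cu121), with that CUDA runtime bundled in. When PyTorch first touches CUDA, it tries to initialize the runtime against your installed driver. If the driver is too old for the bundled runtime, initialization fails, torch.cuda.is_available() returns False, and any code that picks its device from that result runs on CPU. Code that requests 'cuda' explicitly raises an error instead. As a rule of thumb, the driver's CUDA version must be equal to or newer than the version PyTorch was compiled with.

When to use this: Before any PyTorch GPU development, run the check in the Code section to verify that your driver supports your installed PyTorch build. This is a one-time setup step, not something you repeat in production code.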
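The compatibility rule can be sketched as a plain version comparison. This is a simplified illustration, not PyTorch's actual check: the names parse_version and driver_supports are hypothetical, and the sketch ignores NVIDIA's minor-version compatibility, under which a driver can sometimes run a slightly newer runtime from the same major release.

```python
def parse_version(text):
    """Turn a version string such as '12.1' into a comparable tuple (12, 1)."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(compiled_cuda, driver_cuda):
    """Return True if the driver's maximum CUDA version covers the compiled one.

    compiled_cuda: what torch.version.cuda reports, e.g. '12.1'
    driver_cuda:   the 'CUDA Version' shown in the nvidia-smi header, e.g. '11.8'
    """
    return parse_version(driver_cuda) >= parse_version(compiled_cuda)


print(driver_supports("12.1", "11.8"))  # False: driver is too old for a cu121 build
print(driver_supports("11.8", "12.1"))  # True: a newer driver runs older builds
print(driver_supports("12.1", "12.1"))  # True: exact match
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as '12.10' sorting before '12.9'.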

Analogy

Think of CUDA versions like electrical outlet standards. Your PyTorch binary is a plug designed for 220V (CUDA 12.1). Your GPU driver provides a 110V outlet (CUDA 11.8). Physically, the plug might fit, but it won't work correctly. A device that expects 220V but gets 110V will either not work or work very slowly: it won't explode, it just silently underperforms.

Code

python

import torch
import subprocess

print("=== PyTorch CUDA Configuration ===")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version (compiled): {torch.version.cuda}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
    print(f"GPU compute capability: {torch.cuda.get_device_capability(0)}")
    
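    # The nvidia-smi header reports the driver version and the newest CUDA version it supports
    result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)
    print("\n=== nvidia-smi output (driver check) ===")
    # Join the first three lines so they print as text rather than as a Python list
    print('\n'.join(result.stdout.split('\n')[0:3]))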
else:
    print("⚠️  CUDA is not available. Check driver installation.")
    try:
        result = subprocess.run(['nvidia-smi'], capture_output=True, text=True, timeout=5)
        if result.returncode == 0:
            print("nvidia-smi works, but PyTorch cannot access GPU.")
            print("Reinstall PyTorch with matching CUDA version.")
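        else:
            # nvidia-smi exists but exited with an error, so the driver is installed but unhealthy
            print("nvidia-smi ran but reported an error. Check the GPU driver installation.")
    except FileNotFoundError:
        print("nvidia-smi not found. GPU driver not installed.")
    except subprocess.TimeoutExpired:
        print("nvidia-smi timed out. The GPU driver may be unresponsive.")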

Output

=== PyTorch CUDA Configuration ===
PyTorch version: 2.11.1+cu121
CUDA available: True
CUDA version (compiled): 12.1
GPU name: NVIDIA RTX 4090
GPU compute capability: (8, 9)

=== nvidia-smi output (driver check) ===
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02                 |
| CUDA Version: 12.1                                                          |

What just happened?

The code checked whether PyTorch can see and use your GPU by calling <code>torch.cuda.is_available()</code>. If True, it printed the CUDA version that PyTorch was compiled with (<code>torch.version.cuda</code>), your GPU's name, and its compute capability. Then it ran <code>nvidia-smi</code>, whose header shows the driver version and the newest CUDA version that driver supports. If the driver's CUDA version is equal to or newer than the compiled one, the GPU is usable. If the driver's version is older, <code>is_available()</code> returns False and tensors stay on CPU unless you request the GPU explicitly, which raises an error.
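Reading the two version numbers out of the nvidia-smi header by eye is error-prone, so here is a minimal sketch that extracts them with regular expressions. The function name read_smi_header is hypothetical, and the sample text simply mirrors the layout in the Output section; real nvidia-smi headers vary between driver releases.

```python
import re


def read_smi_header(smi_output):
    """Extract the driver version and the driver's CUDA version from nvidia-smi text.

    Returns a (driver_version, cuda_version) tuple of strings, or None if
    either field is missing from the text.
    """
    driver = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    if driver is None or cuda is None:
        return None
    return driver.group(1), cuda.group(1)


# Sample header in the same layout as the Output section.
sample = (
    "| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02                 |\n"
    "| CUDA Version: 12.1                                                          |\n"
)
print(read_smi_header(sample))         # ('555.42.02', '12.1')
print(read_smi_header("no gpu here"))  # None
```

To use it on a live system, pass in the stdout captured from subprocess.run(['nvidia-smi'], capture_output=True, text=True).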

Common gotcha

The most common mistake: seeing CUDA available: False and assuming the GPU is broken, when actually the installed PyTorch build targets a CUDA version the driver does not support. Developers then reinstall the same PyTorch build, reinstall the same driver version, or reboot: none of which help. The fix is one of two things: identify the CUDA version your driver supports (from nvidia-smi) and reinstall a PyTorch build compiled for that version or an older one, or update the driver to one that supports your current build.

Error recovery

RuntimeError: "CUDA out of memory"

Your GPU is being used (good), but the tensor doesn't fit. Reduce batch size or model size. This error only appears if CUDA is actually available.

CUDA available: False with GPU present

The PyTorch binary was built for a CUDA version the driver does not support (or a CPU-only build is installed). Uninstall the old build first: 'pip uninstall torch torchvision torchaudio -y'. Then run 'pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121', replacing cu121 with a published tag no newer than the CUDA version nvidia-smi reports.
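Choosing the tag can be sketched as "newest published tag that is not newer than the driver's CUDA version". The helper names and the AVAILABLE_TAGS list below are assumptions for illustration: only cu118 and cu121 are taken from this lesson, and the tags actually published for your PyTorch version should be checked on the PyTorch install page.

```python
# Hypothetical list of published wheel tags; replace with the real ones
# for the PyTorch version you intend to install.
AVAILABLE_TAGS = ["cu118", "cu121"]


def tag_to_version(tag):
    """Convert a wheel tag such as 'cu121' to a tuple (12, 1).

    Assumes the last digit is the minor version, as in cu118 and cu121.
    """
    digits = tag[2:]  # drop the 'cu' prefix
    return int(digits[:-1]), int(digits[-1])


def pick_wheel_tag(driver_cuda, tags=AVAILABLE_TAGS):
    """Return the newest tag whose CUDA version is <= the driver's, or None."""
    major, minor = (int(part) for part in driver_cuda.split("."))
    usable = [t for t in tags if tag_to_version(t) <= (major, minor)]
    return max(usable, key=tag_to_version) if usable else None


def install_command(tag):
    """Build the pip command from this section for a given wheel tag."""
    return (
        "pip install torch torchvision torchaudio "
        f"--index-url https://download.pytorch.org/whl/{tag}"
    )


print(pick_wheel_tag("11.8"))  # cu118
print(pick_wheel_tag("12.4"))  # cu121
print(pick_wheel_tag("11.2"))  # None: the driver is older than every listed tag
print(install_command(pick_wheel_tag("11.8")))
```

A None result means no listed build fits the driver, in which case updating the driver is the remaining option.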

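Exception from torch.cuda.get_device_capability()

CUDA is not available, so GPU-specific functions cannot initialize. get_device_capability() does not return None in this case; it raises an exception (for example, an AssertionError reading "Torch not compiled with CUDA enabled" on a CPU-only build). Always check torch.cuda.is_available() before calling GPU-specific functions.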

Experienced dev note

The silent failure is the trap. If your script picks its device from torch.cuda.is_available(), it will run without error on CPU, making it seem correct during development. You may not catch it until you benchmark in production and notice a 50x slowdown. Check torch.cuda.is_available() once at startup and log the result. Better: fail explicitly with assert torch.cuda.is_available(), "CUDA required for this model" rather than silently accepting CPU (raise a RuntimeError instead if your code may run under python -O, which strips asserts). Also, use torch.cuda.get_device_properties(0) to inspect compute capability: it tells you whether your GPU architecture is covered by a given PyTorch build, which is a separate question from whether the driver is new enough.
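A fail-fast device choice might look like the sketch below. The function name choose_device and its require_gpu parameter are hypothetical; the function takes the availability flag as an argument so it can be exercised without a GPU, and in real code you would pass torch.cuda.is_available().

```python
def choose_device(cuda_available, require_gpu=True):
    """Return 'cuda' or 'cpu', refusing to fall back silently when a GPU is required.

    cuda_available: pass torch.cuda.is_available() here.
    require_gpu:    if True, raise instead of returning 'cpu'.
    """
    if cuda_available:
        return "cuda"
    if require_gpu:
        # Unlike assert, this check still runs under `python -O`.
        raise RuntimeError("CUDA required for this model, but it is not available")
    return "cpu"


# In a training script (assumes torch is installed):
#   import torch
#   device = choose_device(torch.cuda.is_available())
#   model = model.to(device)
print(choose_device(True))                      # cuda
print(choose_device(False, require_gpu=False))  # cpu
```

Making CPU fallback an explicit opt-in (require_gpu=False) means a misconfigured machine stops at startup instead of training slowly for hours.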

Check your understanding

You have PyTorch 2.11.1+cu121 installed. Your GPU driver reports CUDA 11.8 in nvidia-smi. Will torch.cuda.is_available() return True or False, and why? If False, what single command would fix it?

Show answer hint

The answer requires understanding that PyTorch's compiled CUDA (cu121) must be compatible with the driver's CUDA version (11.8). The question tests whether you know the compatibility rule (driver CUDA ≥ compiled CUDA) and the fix (reinstall PyTorch for cu118). A correct answer identifies the mismatch, predicts the False result, and gives the correct pip install command.

VERSION PyTorch 2.6.x (Nov 2024) and earlier used discrete CUDA wheels (cu118, cu121). PyTorch 2.11.x (Mar 2026) maintains this pattern but with better error messages. The core CUDA version matching requirement has not changed since PyTorch 1.9.0, but the default pip index behavior improved significantly in 2.0.0+ to provide clearer guidance.

Moving a tensor to GPU device explicitly using <code>.to('cuda')</code> or <code>.cuda()</code>: this is where you actually control which device runs your computation.
