Code Beginner easy · 6 min

The PyTorch ecosystem: Lightning, HuggingFace, etc.

What you will learn

PyTorch has a rich ecosystem of libraries that handle boilerplate and accelerate common workflows like training loops and model loading.

Why this matters

In production, you'll rarely write raw PyTorch training loops. Frameworks like Lightning, Hugging Face Transformers, and Torchvision solve real problems: reproducibility, multi-GPU setup, checkpoint management, and access to pre-trained models. Knowing what each one solves keeps you from reinventing the wheel and keeps your code maintainable.

Skip if: You're building a custom research prototype that requires fine-grained control over every training step, or you're working in a highly specialized domain (like physics simulations) where the abstraction gets in the way. In those cases, raw PyTorch is usually the better fit.

Explanation

The PyTorch ecosystem is a collection of libraries built on top of PyTorch that handle common tasks: PyTorch Lightning abstracts training loops and multi-GPU logic; Hugging Face Transformers provides pre-trained NLP models and utilities; Torchvision handles computer vision datasets and models.

Mechanically, these libraries wrap PyTorch tensors and nn.Module in higher-level abstractions. Lightning replaces your training loop with a Trainer class that you configure; Hugging Face wraps model initialization and tokenization into a unified API; Torchvision provides dataset loaders and pre-trained weights out of the box.

When to use each: Use Lightning for any project where you're building a standard supervised model and want reproducible, scalable training. Use Hugging Face Transformers for NLP or multimodal tasks, so you don't train BERT from scratch yourself. Use Torchvision for image classification, object detection, or semantic segmentation tasks with standard architectures.

Analogy

PyTorch is the engine. The ecosystem libraries are pre-built cars (Lightning), pre-assembled dashboards (Hugging Face), and wheels + tires (Torchvision) that let you assemble something functional without fabricating every part.

Code

python

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning import Trainer

# Example 1: Raw PyTorch training loop (what Lightning abstracts)
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)
    
    def forward(self, x):
        return self.linear(x)

model = SimpleModel()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

X_train = torch.randn(100, 10)
y_train = torch.randint(0, 2, (100,))
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=32)

for epoch in range(2):
    for batch_x, batch_y in train_loader:
        optimizer.zero_grad()
        logits = model(batch_x)
        loss = loss_fn(logits, batch_y)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1} loss: {loss.item():.4f}")

print("\n--- Now with Lightning (cleaner) ---\n")

# Example 2: Same thing with PyTorch Lightning
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)
        self.loss_fn = nn.CrossEntropyLoss()
    
    def forward(self, x):
        return self.linear(x)
    
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = self.loss_fn(logits, y)
        return loss
    
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)

lit_model = LitModel()
trainer = Trainer(max_epochs=2, enable_progress_bar=False, logger=False)
trainer.fit(lit_model, train_loader)

print("\n--- Hugging Face example (loading pre-trained) ---\n")

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model_hf = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

text = "This is a great day!"
encoded = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model_hf(**encoded)
    logits = outputs.logits

print(f"Input text: {text}")
print(f"Logits shape: {logits.shape}")
print(f"Logits: {logits}")

Output (loss and logit values vary between runs; Lightning's startup messages are omitted)

Epoch 1 loss: 0.6931
Epoch 2 loss: 0.6598

--- Now with Lightning (cleaner) ---

--- Hugging Face example (loading pre-trained) ---

Input text: This is a great day!
Logits shape: torch.Size([1, 2])
Logits: tensor([[-0.2841,  0.2459]])

What just happened?

The code showed three approaches side by side: (1) raw PyTorch with a manual training loop, where you manage the optimizer, loss, and epoch iteration explicitly; (2) PyTorch Lightning wrapping the same logic, where you define only training_step() and configure_optimizers() and Lightning handles the loop and device placement (and logging and checkpointing when they are enabled); (3) Hugging Face Transformers loading a pre-trained DistilBERT model and tokenizer in two lines, then running inference with no training setup at all. Note that the classification head added on top of distilbert-base-uncased is newly initialized, so the logits are not meaningful until you fine-tune. Each ecosystem tool removes a layer of boilerplate.

Common gotcha

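Developers often assume Lightning is slower or less flexible than raw PyTorch. In practice the overhead is usually small, and features like multi-GPU training and mixed precision are a Trainer argument away instead of code you have to write and debug yourself. The real gotcha: Lightning expects you to follow its conventions (training_step, validation_step, test_step). If you work around those hooks, for example by calling backward() and optimizer.step() yourself while automatic optimization is on, your code becomes fragile. Stick to the pattern.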

Error recovery

ModuleNotFoundError: No module named 'pytorch_lightning'

Install it with: pip install pytorch-lightning

ModuleNotFoundError: No module named 'transformers'

Install it with: pip install transformers

RuntimeError: Expected all tensors to be on the same device

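In Lightning, trainer.fit() moves the model and each batch to the right device for you. In raw PyTorch, explicitly call model.to(device) and move every tensor in the batch, for example batch_x.to(device) and batch_y.to(device). Hugging Face does not do this automatically: the tokenizer returns CPU tensors even with return_tensors='pt', so move both the model and the encoded inputs (encoded.to(device)) before the forward pass.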

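IndexError: Target 4 is out of bounds

This typically appears during fine-tuning when num_labels was omitted or set too low. AutoModelForSequenceClassification then falls back to the default in the model's config (usually 2), so a label such as 4 has no matching output. Always pass num_labels when loading a pre-trained model for fine-tuning or downstream tasks. num_labels is the number of output classes for your specific task.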
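For the device-mismatch error above, a minimal raw-PyTorch sketch of the fix could look like this. The tensors and the encoded dictionary are made-up stand-ins (the dictionary only mimics the shape of tokenizer output), and the code falls back to CPU when no GPU is available.

```python
import torch
import torch.nn as nn

# Pick the GPU if there is one, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)  # parameters now live on `device`

# A batch as it would come out of a DataLoader: created on the CPU.
batch_x = torch.randn(4, 10)
batch_y = torch.randint(0, 2, (4,))

# Move every tensor in the batch, not just the model.
batch_x, batch_y = batch_x.to(device), batch_y.to(device)
loss = nn.CrossEntropyLoss()(model(batch_x), batch_y)

# Dictionary-style inputs (like tokenizer output) need the same treatment.
encoded = {
    "input_ids": torch.tensor([[101, 2023, 102]]),
    "attention_mask": torch.ones(1, 3, dtype=torch.long),
}
encoded = {name: tensor.to(device) for name, tensor in encoded.items()}

print(loss.device.type == device.type)  # True
```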

Experienced dev note

Here's what saves time in production: Lightning's Trainer handles distributed training across GPUs/TPUs. Set accelerator and devices on the Trainer and add strategy='ddp' without touching your training_step(). Hugging Face Transformers models come pre-trained on large corpora (the original BERT on roughly 3.3B words), so you're not training from random weights for NLP. A senior dev working on NLP rarely trains a transformer from scratch; they load a pre-trained checkpoint and fine-tune it. Torchvision's pre-trained ResNet50 has weights from ImageNet: again, start ahead. The ecosystem exists because the wheel was invented and optimized. Use it.

Check your understanding

You have a text classification task with 5 classes. Would you use Lightning, Hugging Face, or raw PyTorch for this? Explain why, and what boilerplate each approach removes.

Answer hint

A correct answer identifies that Hugging Face Transformers + Lightning together is the production choice: Transformers gives you the pre-trained model and tokenizer (days of training avoided), Lightning gives you the training loop and multi-GPU setup (dozens of lines of boilerplate avoided). Raw PyTorch is only if you're building a custom architecture that neither framework supports.
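To make the five-class case concrete, the sketch below builds a tiny, randomly initialized DistilBERT classifier from a config instead of downloading a checkpoint. The config sizes and the random token IDs are illustrative assumptions; in a real project you would load a checkpoint with AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=5) and feed it tokenizer output.

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Deliberately tiny config so the example runs in a moment without a download.
config = DistilBertConfig(
    vocab_size=100,
    max_position_embeddings=32,
    n_layers=1,
    n_heads=2,
    dim=32,
    hidden_dim=64,
    num_labels=5,  # one output per class in the task
)
classifier = DistilBertForSequenceClassification(config)
classifier.eval()

# Stand-ins for tokenizer output: 3 sequences of 8 token IDs each.
input_ids = torch.randint(0, 100, (3, 8))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0, 4, 2])  # class indices in the range 0..4

with torch.no_grad():
    outputs = classifier(input_ids=input_ids, attention_mask=attention_mask, labels=labels)

print(outputs.logits.shape)  # torch.Size([3, 5])
print(outputs.loss.item())   # cross-entropy loss, computed because labels were passed
```

Because the model returns the loss when labels are supplied, wrapping it in a LightningModule mostly comes down to returning outputs.loss from training_step.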

VERSION PyTorch Lightning 2.0+ works with models wrapped in torch.compile(), but it does not compile your model for you: call torch.compile() on the module yourself before passing it to the Trainer. Hugging Face Transformers 4.30+ models can be used with torch.compile() and torch.amp.autocast(). PyTorch 2.11.x (current as of March 2026) is compatible with all three libraries; check each library's release notes for its supported PyTorch range before upgrading.

Next, learn how to build a custom nn.Module from scratch to understand the class structure that Lightning and Hugging Face wrap around.
