Code Beginner easy · 5 min

Pipelines: the fastest path to inference

What you will learn

Use pipeline() to load a pretrained model and run inference in two lines of code without managing tensors or devices yourself.

Why this matters

Pipelines abstract away tokenization, model loading, tensor handling, and device management: the four biggest friction points for developers new to transformers. You can build a working sentiment classifier or text generator in under a minute instead of debugging tensor shapes for 20 minutes.

Skip if: you need to batch-process 10,000+ texts in production, where the overhead of the pipeline wrapper becomes a bottleneck. Also skip pipelines if you need custom attention outputs, intermediate layer activations, or fine-grained control over generation parameters.

Explanation

A pipeline is a preset workflow that handles model loading, input preprocessing, inference, and output postprocessing automatically. You pass raw text; it returns a parsed result. Mechanically, when you call pipeline('sentiment-analysis'), transformers downloads a default pretrained model (e.g., distilbert-base-uncased-finetuned-sst-2-english), wraps it in a task-specific processor, and binds it to your device (CPU/GPU). When you call the pipeline with text, it tokenizes the input, runs the forward pass, and decodes predictions into human-readable format. Use it for prototyping, demos, simple single-text inference, or when model choice doesn't matter and you want the fastest development path.

Analogy

A pipeline is like ordering food at a fast-casual restaurant: you specify what you want (task name), they handle all the kitchen complexity (tokenization, model loading, tensor conversion), and you get the result in seconds. If you need a custom recipe (fine-tuned model, custom preprocessing), you go to a chef (write custom code).

Code

python

import torch
from transformers import pipeline

# Pin the model explicitly and pick the device: GPU 0 if CUDA is available, otherwise CPU (-1).
classifier = pipeline(
    'sentiment-analysis',
    model='distilbert-base-uncased-finetuned-sst-2-english',
    device=0 if torch.cuda.is_available() else -1
)

# Single text: the pipeline returns a list containing one {'label', 'score'} dict.
result = classifier('I absolutely love this course! The explanations are so clear.')
print(f'Sentiment: {result[0]["label"]}')
print(f'Confidence: {result[0]["score"]:.4f}')

# List of texts: one prediction dict per input, in the same order.
results_batch = classifier([
    'This is terrible.',
    'This is wonderful.'
])
for i, res in enumerate(results_batch):
    print(f'Text {i}: {res["label"]} ({res["score"]:.4f})')

Output

Sentiment: POSITIVE
Confidence: 0.9998
Text 0: NEGATIVE (0.9997)
Text 1: POSITIVE (1.0000)

What just happened?

The pipeline downloaded a pretrained DistilBERT model fine-tuned for sentiment analysis (or loaded it from cache). It tokenized the input text, converted it to tensors (padding is only needed when texts of different lengths share a batch), ran the forward pass on the specified device (GPU if available, CPU otherwise), and then decoded the logits into a label ('POSITIVE' or 'NEGATIVE') and a normalized probability score. For the list input, it applied the same process to each text and returned a list of predictions in the same order.
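The decoding step described above can be sketched by hand. This is a minimal illustration, not the library's actual implementation: softmax, postprocess, and classify_manually are hypothetical helper names, and the sketch assumes a single-label classifier whose scores are normalized with a softmax.

```python
import math


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label):
    """Pick the most probable class and return it in pipeline-style form."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def classify_manually(text, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Tokenize, run the forward pass, and decode, without using pipeline()."""
    # Heavy imports stay inside the function so the helpers above work on their own.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")  # raw text -> tensors
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits[0].tolist()
    return postprocess(logits, model.config.id2label)
```

Calling classify_manually('This is wonderful.') should give a dict shaped like one element of the pipeline's output, which makes it a quick way to see which steps the wrapper performs for you.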

Common gotcha

Developers assume pipeline() without a model argument uses a sensible default. It does, but that default model downloads ~250MB on first run and may not be the best for your task. Always pin the model name explicitly so you know exactly what's running, teammates can reproduce your results, and you don't accidentally download a different model next month when Hugging Face updates the defaults.
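One way to make the pin hard to forget is to keep it in a single shared place. The sketch below is an assumption about project layout, not a library convention: PINNED and load_classifier are hypothetical names, and the revision value is a placeholder that you would replace with a commit hash from the model page to also freeze the weights.

```python
# Shared settings so every teammate loads the same model.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    # Placeholder: "main" follows the latest commit. Swap in a commit hash
    # from the model page on the Hub for a true pin.
    "revision": "main",
}


def load_classifier(device=-1):
    """Build the pipeline from the pinned settings (device=-1 means CPU)."""
    # Imported here so the settings can be read without loading transformers.
    from transformers import pipeline

    return pipeline(
        PINNED["task"],
        model=PINNED["model"],
        revision=PINNED["revision"],
        device=device,
    )
```

Everyone then calls load_classifier() instead of writing the model name inline, so a model change happens in one reviewed place.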

Error recovery

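KeyError: "Unknown task sentimnt-analysis, available tasks are [...]"

You misspelled the task (the exact wording of the message may vary by version). Valid tasks in transformers 5.5.x include 'sentiment-analysis', 'text-classification', 'token-classification', 'question-answering', 'text-generation', 'summarization', 'translation_en_to_de', etc. Check the pipeline docs, or read the error message itself, which lists the tasks available in your installed version. Calling pipeline() with no arguments does not print the list; it raises an error because neither a task nor a model was given.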
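A small guard can catch a misspelled task before the pipeline is built. This is a sketch using only the standard library: KNOWN_TASKS is copied from the list in this lesson and is not exhaustive, and suggest_task is a hypothetical helper name.

```python
import difflib

# Task names taken from this lesson; the installed library supports more.
KNOWN_TASKS = [
    "sentiment-analysis",
    "text-classification",
    "token-classification",
    "question-answering",
    "text-generation",
    "summarization",
    "translation_en_to_de",
]


def suggest_task(name, known=KNOWN_TASKS):
    """Return the name if it is known, else the closest known task, else None."""
    if name in known:
        return name
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

For example, suggest_task('sentimnt-analysis') returns 'sentiment-analysis', which you can surface in your own error message instead of the raw traceback.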

RuntimeError: CUDA out of memory

The model doesn't fit on your GPU. Reduce the batch size (pass a list with fewer items), use a smaller model (e.g., 'distilbert' instead of 'bert-base'), or set device=-1 to run on CPU. For production, quantization (BitsAndBytesConfig) is covered in a later lesson.
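The CPU fallback can also be automated. The sketch below assumes the out-of-memory failure surfaces as a RuntimeError whose message contains "out of memory", as in the heading above. run_with_cpu_fallback is a hypothetical helper, and the two runners would be pipelines you build yourself with device=0 and device=-1.

```python
def run_with_cpu_fallback(gpu_runner, make_cpu_runner, texts):
    """Try the GPU runner first; on an out-of-memory error, retry on CPU.

    gpu_runner: callable taking a list of texts (e.g. a pipeline on device=0).
    make_cpu_runner: zero-argument callable that builds the CPU runner, so the
        CPU copy of the model is only loaded when it is actually needed.
    """
    try:
        return gpu_runner(texts)
    except RuntimeError as err:
        # Only swallow memory errors; anything else is a real bug to surface.
        if "out of memory" not in str(err).lower():
            raise
        return make_cpu_runner()(texts)
```

The fallback trades speed for completion, so it suits prototypes and demos better than latency-sensitive services.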

FileNotFoundError: [Errno 2] No such file or directory

The model name doesn't exist on the Hugging Face Hub or your internet connection is down. Check the model name's spelling at huggingface.co/models. You can also verify it in isolation with: from transformers import AutoModel; AutoModel.from_pretrained('your-model-name')

OSError: Can't load tokenizer for 'my-custom-model'

The model you specified doesn't have a tokenizer config on the Hub, or it's a private repo and you're not authenticated. Verify the model exists and is public, or run huggingface-cli login and try again.

Experienced dev note

Without pipelines, you would write roughly 30 lines of code to load a model, manage device placement, handle mixed precision, and batch tokenize. Pipelines in 5.5.x hide all of that. The trap: you don't learn what's underneath, so when you need performance tuning or custom preprocessing, you're lost. Use pipelines to ship fast, but spend 30 minutes reading a custom inference example (next lesson) so you understand the layers being abstracted. You'll debug faster and make smarter performance decisions later.

Check your understanding

Why would pinning the model name in pipeline() matter more in a team setting than in a personal notebook, and what specific problem does it solve?

Answer hint

A correct answer explains that without pinning, team members who first run the code at different times might download different default models (if Hugging Face updates the defaults), so the same code produces different results on different machines. It also mentions reproducibility: an explicit model name lets teammates run exactly the same model without surprises.

Version note: In transformers < 5.0, pipeline() without a device argument would default to CPU even if CUDA was available. In 5.5.x, pipelines auto-detect CUDA. Always pass device=0 or device=-1 explicitly to avoid silent device mismatches between local and production environments.

Next, learn how to load models and tokenizers separately so you understand what pipelines hide and can batch-process large datasets efficiently.
