Code Beginner easy · 4 min

Verifying with a pipeline call

What you will learn

Use the high-level pipeline() API to quickly verify that a model works before diving into tokenizer and model code.

Why this matters

Before writing code for token IDs, attention masks, and device management, you need to know that the model actually loads and produces sensible output. pipeline() handles that complexity for you in about three lines.

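Skip if: Do not use pipeline() as the serving layer for production services handling hundreds of requests per second. By default the returned object processes inputs one at a time unless you pass a list together with batch_size, and calling pipeline() itself on every request (rather than reusing the object it returns) loads the model again each time. For high throughput, use direct model calls with batched tensors instead.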

Explanation

What it is: The pipeline() function is a Hugging Face convenience wrapper that combines tokenization, model loading, and inference into a single call. You pass raw text, get predictions back: no tensor manipulation required.

How it works: When you call pipeline(task_name, model=model_id), Hugging Face automatically selects the right tokenizer and model class for that task, loads them onto a device, and returns a callable object. Which device is chosen when you do not specify one depends on your transformers version, so pass device= explicitly if it matters. You then pass a string (or list of strings) to that object, and it internally tokenizes the text, feeds tensors to the model, and returns parsed outputs.
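The three internal steps described above can be approximated by hand. This is a rough sketch rather than the library's actual implementation: `postprocess` and `explicit_classify` are hypothetical helper names, and the sketch assumes a sequence-classification checkpoint whose config provides an `id2label` mapping.

```python
import math


def postprocess(logits, id2label):
    """Turn the raw logits for one input into a pipeline-style result dict."""
    # Numerically stable softmax over the class logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def explicit_classify(text, model_id):
    """Approximate one text-classification pipeline call (hypothetical helper)."""
    # Imported lazily so postprocess() can be used without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")  # 1. tokenize into tensors
    with torch.no_grad():
        logits = model(**inputs).logits[0]         # 2. forward pass
    # 3. parse logits into a label/score dictionary
    return [postprocess(logits.tolist(), model.config.id2label)]
```

Calling `explicit_classify(...)` with a sentiment checkpoint should give a result of the same shape as the pipeline, which is a useful cross-check when you later move to explicit tokenizer and model code.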

When to use it: Use pipeline() for quick verification that a model card works as advertised, for interactive demos, or for simple single-inference scenarios. Once you need batching, custom tokenization, or tight performance control, switch to explicit AutoTokenizer + AutoModel calls.

Analogy

Think of pipeline() as a pre-packaged meal: everything you need is assembled and ready. Using an explicit tokenizer and model is like buying the ingredients and cooking the meal yourself: more control, but more work.

Code

python

import torch
from transformers import pipeline

# Sentiment model fine-tuned on SST-2, hosted on the Hugging Face Hub.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

# Build the pipeline once; this downloads (first run only) and loads the model.
classifier = pipeline(
    task="text-classification",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1,  # first GPU if present, else CPU
)

# Each call tokenizes the text, runs the model, and returns a list of dicts.
text = "This movie is absolutely fantastic!"
result = classifier(text)
print(f"Input: '{text}'")
print(f"Prediction: {result}")

text2 = "I hated every second of it."
result2 = classifier(text2)
print(f"\nInput: '{text2}'")
print(f"Prediction: {result2}")

Output

Input: 'This movie is absolutely fantastic!'
Prediction: [{'label': 'POSITIVE', 'score': 0.9998761415481567}]

Input: 'I hated every second of it.'
Prediction: [{'label': 'NEGATIVE', 'score': 0.9997040033340454}]

What just happened?

You imported the pipeline function, specified a sentiment classification model from the Hugging Face Hub, created a pipeline object that auto-downloaded and cached the tokenizer and model, and then called it twice with different sentences. Each call internally tokenized the text, converted tokens to tensors, ran inference through the model, and returned parsed label/score dictionaries. The device flag told the pipeline to use GPU if available (device=0), otherwise CPU (device=-1).
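Because every call returns a list of label/score dictionaries, it can help to flatten the results when several texts are passed at once. A minimal sketch, assuming `classifier` is the object built above (or anything callable with the same output shape): `classify_many` is a hypothetical helper, and `batch_size` is the batching argument accepted by the pipeline call.

```python
def classify_many(classifier, texts, batch_size=8):
    """Classify several texts in one call and pair each text with its top label."""
    texts = list(texts)
    # A list input yields one result dict per text, in the same order.
    results = classifier(texts, batch_size=batch_size)
    return [
        (text, r["label"], round(r["score"], 4))
        for text, r in zip(texts, results)
    ]
```

For example, `classify_many(classifier, ["Great plot.", "Dull and slow."])` would return one `(text, label, score)` tuple per sentence.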

Common gotcha

Developers often call pipeline() itself inside a loop, thinking each call is cheap. It is not: every pipeline() call loads the entire model into memory, and the first one also downloads the weights if they are not already cached. Always create the pipeline object once and reuse it. Also, do not assume which device you get when device= is omitted: older releases default to CPU even if a GPU is available, so pass device=0 to use the first GPU explicitly.
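One way to enforce "create once, reuse" is a small cache keyed by task and model. This is a sketch under stated assumptions: `PipelineCache` and `hf_factory` are hypothetical names, not part of transformers, and the factory is passed in so the cache can also be exercised with a stand-in object.

```python
class PipelineCache:
    """Build each (task, model) pipeline at most once per process."""

    def __init__(self, factory):
        self._factory = factory  # callable(task, model_id) -> pipeline-like object
        self._built = {}

    def get(self, task, model_id):
        key = (task, model_id)
        if key not in self._built:
            # Expensive step: loads weights (and downloads them on first use).
            self._built[key] = self._factory(task, model_id)
        return self._built[key]


def hf_factory(task, model_id):
    """Default factory; imports transformers only when actually called."""
    from transformers import pipeline

    return pipeline(task, model=model_id)
```

With `cache = PipelineCache(hf_factory)`, calling `cache.get("text-classification", model_name)` inside a loop returns the same object every time instead of reloading the model.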

Error recovery

OSError: Can't load ...

The model name is misspelled or does not exist on the Hugging Face Hub. Check the exact model name at huggingface.co/models and ensure you have internet access for the first download.

RuntimeError: CUDA out of memory

The model is too large for your GPU. Either use a smaller model (e.g., 'distilbert-base-uncased' instead of 'bert-large-uncased'), or set device=-1 to run on CPU.
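The CPU fallback can also be automated. The sketch below assumes the out-of-memory error is raised while the pipeline is being built and that its message contains "out of memory"; `build_with_cpu_fallback` is a hypothetical helper, and `factory` is any callable that takes a device value and returns a pipeline.

```python
def build_with_cpu_fallback(factory, device=0):
    """Try the requested device first; retry on CPU if the GPU runs out of memory."""
    try:
        return factory(device), device
    except RuntimeError as err:
        # torch's CUDA out-of-memory error is a RuntimeError subclass;
        # anything else is re-raised unchanged.
        if "out of memory" not in str(err).lower():
            raise
        return factory(-1), -1
```

A factory might be `lambda d: pipeline("text-classification", model=model_name, device=d)`. Note that this only covers failures while loading; an out-of-memory error during a later inference call would need its own handling.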

TypeError: pipeline() got an unexpected keyword argument 'device'

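The device argument has been part of pipeline() for a long time, including throughout the 4.x releases, so this error usually points to a very old install or to a different function named pipeline shadowing the one from transformers. Check the installed version, upgrade with 'pip install --upgrade transformers', and confirm that pipeline is imported from transformers.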
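Before concluding that a version mismatch is the cause, it is worth checking what is actually installed in the active environment. A small sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("transformers:", installed_version("transformers"))
print("torch:", installed_version("torch"))
```

If either line prints `None`, the package is missing from the interpreter you are running, which often means the wrong virtual environment is active.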

Experienced dev note

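Whatever version you run (this lesson targets transformers 5.5.x), pass device=0 or device_map='auto' explicitly rather than relying on defaults, which have changed between releases; device_map requires the accelerate package. If you need bfloat16 inference, request the dtype when you build the pipeline (the keyword is torch_dtype in 4.x releases, and recent releases name it dtype, so check the documentation for your version) or load the model in that dtype before wrapping it in a pipeline. Finally, do not count on pickling a pipeline object that holds a loaded model: if you are running on distributed workers, create the pipeline inside each worker rather than building it once and shipping it to them.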
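The per-worker advice can be sketched with a process-pool initializer. The names below (`init_worker`, `classify`, and the module-level `_worker_classifier`) are hypothetical; `factory` is assumed to be a picklable, module-level function that builds and returns the pipeline.

```python
_worker_classifier = None  # one pipeline per worker process


def init_worker(factory):
    """Pool initializer: build the pipeline inside the worker, not in the parent."""
    global _worker_classifier
    _worker_classifier = factory()


def classify(text):
    """Task function run in a worker; reuses that worker's own pipeline."""
    return _worker_classifier(text)
```

With the standard library this would be wired up as `multiprocessing.Pool(processes=2, initializer=init_worker, initargs=(make_classifier,))` followed by `pool.map(classify, texts)`, where `make_classifier` is your own function that calls pipeline(). Only the factory function and the input strings cross process boundaries, never the loaded model.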

Check your understanding

Why does calling the classifier a second time with different text not trigger a reload or re-download of the model, and where is the model kept between calls?

Answer hint

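The key insight is that there are two levels of reuse. Within one program, the classifier object holds the loaded model in memory, so each call only runs inference. Across programs, pipeline() downloads the files once when the pipeline is created (not when it is first called) and stores them in the cache under HF_HOME (typically ~/.cache/huggingface/hub/), so later instantiations read from disk instead of the network. This is why the code example creates one classifier object and calls it twice.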
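To see where the cached files would live on your machine, the lookup order can be approximated in a few lines. This is a simplified sketch of the documented environment variables (HF_HUB_CACHE, HF_HOME, XDG_CACHE_HOME), not the library's own resolution code; `resolve_hub_cache` is a hypothetical helper.

```python
import os
from pathlib import Path


def resolve_hub_cache(env=None):
    """Approximate where downloaded model files are stored on disk."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        # An explicit hub cache location takes priority.
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    # Fall back to the user's cache directory.
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base).expanduser() / "huggingface" / "hub"


print(resolve_hub_cache())
```

Listing that directory after running the example should show a folder for the downloaded model.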

VERSION The device parameter is not new in 5.x: pipeline() accepted device= throughout the 4.x releases, and device_map='auto' is also available when the accelerate package is installed. Default device selection and dtype handling have changed between releases, so if you want half-precision inference, set the dtype explicitly (torch_dtype in 4.x; recent releases name it dtype) rather than relying on automatic casting.

Now that you know models work with pipeline(), learn how to swap the underlying tokenizer and model for explicit control by using AutoTokenizer and AutoModelForSequenceClassification together.
