Code Intermediate medium · 6 min

pipeline("question-answering")

What you will learn

Use Hugging Face's question-answering pipeline to extract answers from context text without training a custom model.

Why this matters

QA pipelines are production-critical for building chatbots, document search systems, and knowledge retrieval tools. This is the fastest path from zero to working QA without model training, and understanding its internals helps you debug when answers are wrong.

Skip if: You need fine-tuned, entity-specific extraction, custom span logic, or very high throughput (on the order of 1000+ documents per second). In those cases, use the tokenizer and model directly, or put a sparse retriever and a reranker in front of the reader model.

Explanation

What it is: The pipeline("question-answering") is a high-level wrapper that combines tokenization, forward pass, and answer span extraction into a single function call. It takes a question and context, returns a dictionary with the extracted answer text, character positions, and confidence score.

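How it works mechanically: The pipeline loads a pre-trained extractive QA model. This lesson passes deepset/roberta-base-squad2 explicitly; if you omit model, the library falls back to its own default checkpoint (a distilled DistilBERT SQuAD model in recent releases). It then tokenizes the question and context together as one sequence while recording each token's character offsets, runs inference to score every token as a possible start or end of the answer span, and maps the best start/end token positions back to the original context string to extract the actual text.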
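The span-selection and offset-mapping steps described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the scores are made up, whitespace-separated words stand in for subword tokens, and best_span and max_answer_len are hypothetical names chosen for this sketch.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_len: int = 15) -> Tuple[int, int, float]:
    """Return (start_token, end_token, score) with the highest combined score."""
    best = (0, 0, float("-inf"))
    for s, s_score in enumerate(start_scores):
        # The end token may not precede the start token or exceed the length cap.
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best[2]:
                best = (s, e, score)
    return best


context = "It takes approximately 365.25 days for Earth to orbit."

# Toy tokenization (assumption): split on spaces and record each word's
# (char_start, char_end) offsets, similar in spirit to an offset mapping.
offsets = []
pos = 0
for word in context.split(" "):
    offsets.append((pos, pos + len(word)))
    pos += len(word) + 1

# Made-up per-token scores standing in for the model's start/end outputs.
start_scores = [0.1, 0.2, 4.0, 1.5, 0.3, 0.1, 0.1, 0.1, 0.1]
end_scores = [0.1, 0.1, 0.5, 1.0, 3.5, 0.2, 0.1, 0.1, 0.3]

start_tok, end_tok, _ = best_span(start_scores, end_scores)

# Map token positions back to character indices in the original string.
char_start = offsets[start_tok][0]
char_end = offsets[end_tok][1]
answer = context[char_start:char_end]
print(answer)
```

The real pipeline works on subword tokens and normalized scores and handles more edge cases, but the shape of the computation is the same: pick a valid start/end pair, then translate it to character positions.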

When to use it: Use this for rapid prototyping, single-document QA, or when you need a working system in minutes. If you're processing large batches or need custom logic (reranking, filtering), batch the pipeline calls or drop to the model layer directly.

Analogy

It's like asking a librarian to find an answer in a book: you give them the question and the book text, and they return exactly which passage contains the answer and how confident they are, without you needing to understand how they search.

Code

python

import torch
from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="deepset/roberta-base-squad2",
    device=0 if torch.cuda.is_available() else -1
)

context = """The Earth orbits the Sun. It takes approximately 365.25 days 
for Earth to complete one full orbit. This period is called a year."""

question = "How long does it take Earth to orbit the Sun?"

result = qa_pipeline(question=question, context=context)

print(f"Answer: {result['answer']}")
print(f"Confidence: {result['score']:.4f}")
print(f"Character span: {result['start']} to {result['end']}")
print(f"\nFull result dictionary:")
print(result)

Output

Answer: approximately 365.25 days
Confidence: 0.9876
Character span: 35 to 60

Full result dictionary:
{'score': 0.9876432418823242, 'start': 35, 'end': 60, 'answer': 'approximately 365.25 days'}

What just happened?

The pipeline loaded a RoBERTa model fine-tuned on SQuAD 2.0, tokenized the question and context together as a single sequence with special separators, ran the model to predict token positions [start_idx, end_idx] where the answer begins and ends, then mapped those token positions back to character indices in the original context string and extracted the substring.

Common gotcha

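The start and end values are character indices into the context string, not token indices. Developers often try to use them with tokenized output and get misalignment. Also, by default the pipeline always returns its best-scoring span, even when the context contains no valid answer, so check score before trusting the result. A low score (below roughly 0.5 is a common rule of thumb) often means no answer was found. With a SQuAD 2.0 model such as this one, you can also pass handle_impossible_answer=True so the pipeline is allowed to return an empty answer.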
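A quick way to see the difference between character and token indices is to slice the context with both. The sketch below uses a hand-built dictionary in the same shape as a pipeline result (it is not real model output, and the score is invented) so that it runs without downloading a model.

```python
context = ("The Earth orbits the Sun. It takes approximately 365.25 days "
           "for Earth to complete one full orbit. This period is called a year.")

# Hand-built stand-in for a pipeline result (assumption: illustrative values only).
answer = "approximately 365.25 days"
start = context.find(answer)
result = {"score": 0.98, "start": start, "end": start + len(answer), "answer": answer}

# Correct: start/end are character indices, so they slice the string directly.
span_text = context[result["start"]:result["end"]]
print(span_text == result["answer"])

# Incorrect: treating the same numbers as token (here, word) indices.
words = context.split()
wrong = words[result["start"]:result["end"]]
print(wrong)  # not the answer: the indices do not refer to word positions
```

If you need token positions, re-tokenize with offsets and look up which tokens cover the character span rather than reusing start and end directly.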

Error recovery

RuntimeError: CUDA out of memory

The model is too large for your GPU memory. Add torch_dtype=torch.float16 to the pipeline() call, or set device=-1 to use CPU instead.

ValueError: Tokenizer does not have a pad_token

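Padding was requested from a tokenizer that defines no pad token. This mostly affects decoder-style tokenizers such as GPT-2's; the RoBERTa tokenizer used in this lesson already has one. If you swap in a checkpoint whose tokenizer lacks a pad token, set tokenizer.pad_token = tokenizer.eos_token after loading it. AutoTokenizer does not add one for you.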

TypeError: pipeline() got an unexpected keyword argument 'device'

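The accepted forms of device depend on your installed transformers version. Integer indices (device=0 for the first GPU, device=-1 for CPU) work broadly; string forms such as device='cuda' are accepted by recent releases but not by some older ones. Check transformers.__version__ and fall back to the integer form if you hit this error. Alternatively, pass device_map='auto' (which requires accelerate), but do not combine it with device.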

AssertionError: Tokens must correspond to the context

Your context string was modified after tokenization (extra spaces, encoding issues). Ensure the context passed to the pipeline is identical to the source text for span extraction to work.

Experienced dev note

The pipeline is convenient but opaque: if answers are wrong, you can't easily debug without dropping to the model + tokenizer layer. For production systems, always log the score and implement a threshold tuned on your own data (e.g., reject answers with score < 0.3). If you have more than a handful of questions, pass them as one list, qa_pipeline([{"question": q, "context": c} for q, c in pairs], batch_size=8), instead of looping. Batching is not enabled unless you set batch_size, and the speedup depends on your hardware and on how uniform the sequence lengths are, so measure it. Finally, deepset/roberta-base-squad2 is a general-purpose checkpoint. If your domain is specialized (medical, legal, code), consider fine-tuning or using a domain-specific checkpoint from the Hugging Face Hub.
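The thresholding and batching advice above can be sketched as two small helpers. The names answer_or_fallback, build_batch, and FALLBACK are hypothetical, the 0.3 threshold is only the example value from this note, and the results list is a stub in the shape of pipeline output so the sketch runs without a model.

```python
from typing import Dict, List, Tuple

FALLBACK = "I couldn't find a clear answer."


def answer_or_fallback(result: Dict, threshold: float = 0.3) -> str:
    """Return the extracted answer, or a fallback when confidence is too low."""
    if result["score"] < threshold or not result["answer"].strip():
        return FALLBACK
    return result["answer"]


def build_batch(pairs: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (question, context) pairs into the list-of-dicts input format."""
    return [{"question": q, "context": c} for q, c in pairs]


pairs = [
    ("How long is a year?", "It takes approximately 365.25 days to orbit."),
    ("Who wrote the report?", "It takes approximately 365.25 days to orbit."),
]
batch = build_batch(pairs)

# With a real pipeline you would call something like:
#   results = qa_pipeline(batch, batch_size=8)
# Stubbed results (assumption: invented scores) keep this sketch self-contained.
results = [
    {"score": 0.91, "start": 9, "end": 34, "answer": "approximately 365.25 days"},
    {"score": 0.04, "start": 0, "end": 2, "answer": "It"},
]

for item, result in zip(batch, results):
    print(item["question"], "->", answer_or_fallback(result))
```

Keeping the threshold as a parameter makes it easy to tune per application and to log rejected answers alongside their scores.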

Check your understanding

If your model returns score=0.42 for a question and you're using this in a chatbot, should you show that answer to the user, and why? What would you check beyond just the score value?

Answer hint

A correct answer covers: (1) recognizing that 0.42 is borderline and there's no universally correct threshold (context-dependent), (2) understanding that you should also verify the answer makes semantic sense or validate against a gold standard, and (3) knowing that in production you'd implement a fallback response (e.g., 'I couldn't find a clear answer') when confidence is too low.

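VERSION The accepted forms of the device argument have varied across transformers releases: integer indices (device=0 for GPU, device=-1 for CPU) work broadly, while string forms such as device='cuda' depend on the installed version. If you omit the model argument, pipeline() falls back to a default checkpoint and warns that this is not recommended in production. Pass model explicitly so a change in the default does not silently change your results.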

Next, explore batch processing with the question-answering pipeline using list inputs and handling multiple contexts simultaneously for real-world document retrieval systems.
