How-to · Beginner · 3 min read

How to use Hugging Face InferenceClient

Quick answer
Use InferenceClient from the huggingface_hub Python package to call Hugging Face hosted models. Initialize it with your API token from os.environ, then call task methods such as client.text_generation() to get model outputs.
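
A minimal sketch, assuming your token is exported as HUGGINGFACEHUB_API_TOKEN (setup steps below):

python
import os
from huggingface_hub import InferenceClient

# text_generation() returns the generated string directly
client = InferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
print(client.text_generation("Hello, world!", model="gpt2"))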

PREREQUISITES

  • Python 3.8+
  • Hugging Face account with API token
  • pip install "huggingface-hub>=0.17.0" (0.17.0 adds AsyncInferenceClient, used below)

Setup

Install the huggingface-hub package and set your Hugging Face API token as an environment variable.

  • Install package: pip install huggingface-hub
  • Set environment variable: export HUGGINGFACEHUB_API_TOKEN='your_token_here' on Linux/macOS or setx HUGGINGFACEHUB_API_TOKEN "your_token_here" on Windows.
bash
pip install huggingface-hub
export HUGGINGFACEHUB_API_TOKEN='your_token_here'
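
Before moving on, you can confirm the variable is visible to Python; this quick check assumes the same variable name as above:

python
import os

# True only if the token was exported in the shell that launched Python
print("Token found:", "HUGGINGFACEHUB_API_TOKEN" in os.environ)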

Step by step

Use InferenceClient to call a Hugging Face hosted model for text generation. The example below sends a short prompt to the gpt2 model.

python
import os
from huggingface_hub import InferenceClient

# Initialize client with API token from environment
client = InferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])

# Call the text-generation task on the 'gpt2' model
response = client.text_generation(
    "Hello, world!",
    model="gpt2",
)

# text_generation() returns the generated string by default
print("Generated text:", response)
output
Generated text: Hello, world! I am a language model developed by OpenAI, and I can generate text based on the input you provide.

This output is illustrative; the exact completion depends on the model and decoding settings.

Common variations

You can use InferenceClient for other tasks such as image generation, audio transcription, or chat completions by calling the corresponding methods, for example client.text_to_image() or client.automatic_speech_recognition(). For non-blocking calls, use AsyncInferenceClient with asyncio, as shown below.

python
import asyncio
import os

from huggingface_hub import AsyncInferenceClient

async def async_text_generation():
    # AsyncInferenceClient mirrors InferenceClient, but its task methods are awaitable
    client = AsyncInferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
    response = await client.text_generation(
        "Async call example",
        model="gpt2",
    )
    print("Async generated text:", response)

asyncio.run(async_text_generation())
output
Async generated text: Async call example is a demonstration of asynchronous inference using Hugging Face InferenceClient.
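
Image generation follows the same pattern; a minimal sketch, assuming Pillow is installed and using one example text-to-image model from the Hub:

python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])

# text_to_image() returns a PIL.Image.Image
image = client.text_to_image(
    "A watercolor painting of a lighthouse at dawn",
    model="stabilityai/stable-diffusion-2-1",  # example model ID; any hosted text-to-image model works
)
image.save("lighthouse.png")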

Troubleshooting

  • If you get an authentication error, verify that HUGGINGFACEHUB_API_TOKEN is set in the environment of the process running your script (see the snippet after this list).
  • For model not found errors, check the model ID spelling and availability on Hugging Face Hub.
  • Network errors may require checking your internet connection or proxy settings.
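
To surface these failures with a clearer message, you can catch HfHubHTTPError, which huggingface_hub raises for failed HTTP calls; a minimal sketch:

python
import os

from huggingface_hub import InferenceClient
from huggingface_hub.utils import HfHubHTTPError

client = InferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
try:
    print(client.text_generation("ping", model="gpt2"))
except HfHubHTTPError as err:
    # Covers authentication and model-not-found errors from the Inference API
    print("Inference call failed:", err)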

Key Takeaways

  • Use InferenceClient from huggingface_hub for easy access to Hugging Face hosted models.
  • Always load your API token securely from os.environ to authenticate requests.
  • The client supports multiple tasks like text generation, image generation, and audio transcription with simple method calls.
  • Async methods are available for non-blocking inference in Python applications.
  • Check environment variables and model IDs carefully to avoid common errors.
Verified 2026-04 · gpt2