How-to · Beginner · 3 min read

How to use Hugging Face InferenceClient

Quick answer
Use InferenceClient from the huggingface_hub Python package to call Hugging Face hosted models. Initialize it with your API token from os.environ, then call task methods such as client.text_generation() to get model outputs.
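
A minimal sketch, assuming your token is exported as HUGGINGFACEHUB_API_TOKEN (setup steps below):

python
import os
from huggingface_hub import InferenceClient

# text_generation() returns the generated string directly
client = InferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
print(client.text_generation("Hello, world!", model="gpt2"))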

PREREQUISITES

  • Python 3.8+
  • Hugging Face account with API token
  • pip install "huggingface-hub>=0.17.0" (0.17.0 adds AsyncInferenceClient, used below)

Setup

Install the huggingface-hub package and set your Hugging Face API token as an environment variable.

  • Install package: pip install huggingface-hub
  • Set environment variable: export HUGGINGFACEHUB_API_TOKEN='your_token_here' on Linux/macOS or setx HUGGINGFACEHUB_API_TOKEN "your_token_here" on Windows.
bash
pip install huggingface-hub
export HUGGINGFACEHUB_API_TOKEN='your_token_here'
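
Before moving on, you can confirm the variable is visible to Python; this quick check assumes the same variable name as above:

python
import os

# True only if the token was exported in the shell that launched Python
print("Token found:", "HUGGINGFACEHUB_API_TOKEN" in os.environ)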

Step by step

Use InferenceClient to call a Hugging Face hosted model for text generation. The example below sends a short prompt to the gpt2 model.

python
import os
from huggingface_hub import InferenceClient

# Initialize client with API token from environment
client = InferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])

# Call the text-generation task on the 'gpt2' model
response = client.text_generation(
    "Hello, world!",
    model="gpt2",
)

# text_generation() returns the generated string by default
print("Generated text:", response)
output
Generated text: Hello, world! I am a language model developed by OpenAI, and I can generate text based on the input you provide.

This output is illustrative; the exact completion depends on the model and decoding settings.

Common variations

You can use InferenceClient for other tasks such as image generation, audio transcription, or chat completions by calling the corresponding methods, for example client.text_to_image() or client.automatic_speech_recognition(). For non-blocking calls, use AsyncInferenceClient with asyncio, as shown below.

python
import asyncio
import os

from huggingface_hub import AsyncInferenceClient

async def async_text_generation():
    # AsyncInferenceClient mirrors InferenceClient, but its task methods are awaitable
    client = AsyncInferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
    response = await client.text_generation(
        "Async call example",
        model="gpt2",
    )
    print("Async generated text:", response)

asyncio.run(async_text_generation())
output
Async generated text: Async call example is a demonstration of asynchronous inference using Hugging Face InferenceClient.
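
Image generation follows the same pattern; a minimal sketch, assuming Pillow is installed and using one example text-to-image model from the Hub:

python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])

# text_to_image() returns a PIL.Image.Image
image = client.text_to_image(
    "A watercolor painting of a lighthouse at dawn",
    model="stabilityai/stable-diffusion-2-1",  # example model ID; any hosted text-to-image model works
)
image.save("lighthouse.png")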

Troubleshooting

  • If you get an authentication error, verify that HUGGINGFACEHUB_API_TOKEN is set in the environment of the process running your script (see the snippet after this list).
  • For model not found errors, check the model ID spelling and availability on Hugging Face Hub.
  • Network errors may require checking your internet connection or proxy settings.
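
To surface these failures with a clearer message, you can catch HfHubHTTPError, which huggingface_hub raises for failed HTTP calls; a minimal sketch:

python
import os

from huggingface_hub import InferenceClient
from huggingface_hub.utils import HfHubHTTPError

client = InferenceClient(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
try:
    print(client.text_generation("ping", model="gpt2"))
except HfHubHTTPError as err:
    # Covers authentication and model-not-found errors from the Inference API
    print("Inference call failed:", err)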

Key Takeaways

  • Use InferenceClient from huggingface_hub for easy access to Hugging Face hosted models.
  • Always load your API token securely from os.environ to authenticate requests.
  • The client supports multiple tasks like text generation, image generation, and audio transcription with simple method calls.
  • Async methods are available for non-blocking inference in Python applications.
  • Check environment variables and model IDs carefully to avoid common errors.
Verified 2026-04 · gpt2