How-to · Beginner · 3 min read

How to use Hugging Face Inference API

Quick answer
Use the Hugging Face Inference API by sending HTTP POST requests with your model and input data, authenticated via an API token. In Python, use the requests library to call the API endpoint https://api-inference.huggingface.co/models/{model_id} with your input text in JSON format.

PREREQUISITES

  • Python 3.8+
  • Hugging Face account with an API token
  • pip install requests

Setup

Install the requests library and set your Hugging Face API token as an environment variable for secure authentication.

bash
pip install requests
export HF_API_TOKEN="hf_..."  # paste your Hugging Face API token here

Step by step

This example calls the Hugging Face Inference API to generate text completions with the gpt2 model. The code reads your token from the HF_API_TOKEN environment variable, so make sure it is set before running.

python
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": f"Bearer {os.environ['HF_API_TOKEN']}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    data = {"inputs": "The future of AI is"}
    output = query(data)
    print(output)
output (generated text varies between runs)
[{'generated_text': 'The future of AI is very promising, with many exciting developments ahead.'}]

Common variations

You can target a different model by changing API_URL to another model ID, such as facebook/bart-large-cnn for summarization. For asynchronous or concurrent calls, use an async HTTP client such as httpx. The API supports many tasks, including text generation, summarization, and translation.

python
import asyncio
import os

import httpx

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"
headers = {"Authorization": f"Bearer {os.environ['HF_API_TOKEN']}"}

async def async_query(payload):
    async with httpx.AsyncClient() as client:
        response = await client.post(API_URL, headers=headers, json=payload)
        response.raise_for_status()
        return response.json()

if __name__ == "__main__":
    data = {"inputs": "The quick brown fox jumps over the lazy dog."}
    result = asyncio.run(async_query(data))
    print(result)
output
[{'summary_text': 'The quick brown fox jumps over the lazy dog.'}]
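Most tasks also accept a "parameters" object alongside "inputs"; for summarization, keys such as min_length and max_length bound the summary length. Here is a small sketch of building such a payload (the build_payload helper and its default values are illustrative choices, not part of the API):

```python
import os

import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"
headers = {"Authorization": f"Bearer {os.environ.get('HF_API_TOKEN', '')}"}

def build_payload(text, min_length=10, max_length=60):
    # "parameters" tunes generation; supported keys depend on the task and model
    return {
        "inputs": text,
        "parameters": {"min_length": min_length, "max_length": max_length},
    }

# Usage:
# requests.post(API_URL, headers=headers, json=build_payload("Long article text...")).json()
```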

Troubleshooting

  • If you receive a 401 Unauthorized error, verify your API token is correct and set in HF_API_TOKEN.
  • A 503 Service Unavailable error may indicate the model is loading; retry after a few seconds.
  • For rate limiting errors, reduce request frequency or upgrade your Hugging Face plan.
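When a model is cold, the API answers 503 until it finishes loading, so a short retry loop usually resolves it. A minimal sketch with exponential backoff (the backoff_delays helper and the retry count are illustrative choices, not part of the API):

```python
import os
import time

import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"

def backoff_delays(retries, base=2.0):
    # Exponential backoff schedule: 2, 4, 8, ... seconds
    return [base ** (i + 1) for i in range(retries)]

def query_with_retry(payload, retries=3):
    headers = {"Authorization": f"Bearer {os.environ.get('HF_API_TOKEN', '')}"}
    response = None
    for delay in backoff_delays(retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        if response.status_code != 503:
            break
        time.sleep(delay)  # model still loading; wait and try again
    response.raise_for_status()
    return response.json()

# Usage: query_with_retry({"inputs": "The future of AI is"})
```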

Key Takeaways

  • Use the Hugging Face Inference API by sending POST requests with your API token in the Authorization header.
  • Change the model by modifying the API endpoint URL to target different Hugging Face models.
  • Handle common HTTP errors like 401 and 503 by checking your token and retrying if the model is loading.
Verified 2026-04 · gpt2, facebook/bart-large-cnn