How-to · Beginner · 3 min read

How to use Fireworks AI with LiteLLM

Quick answer
Use the openai Python SDK with base_url set to Fireworks AI's OpenAI-compatible endpoint and your API key read from the environment. Instantiate the OpenAI client, then call chat.completions.create with a Fireworks AI model name and your messages. LiteLLM can reach the same models through its unified completion interface.

PREREQUISITES

  • Python 3.8+
  • Fireworks AI API key
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirect)
  • LiteLLM installed and configured

Setup

Install the openai Python package (v1+) to access Fireworks AI via its OpenAI-compatible API endpoint. Set your Fireworks AI API key as an environment variable. Ensure LiteLLM is installed and ready to use for lightweight model orchestration.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
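The variable name FIREWORKS_API_KEY below matches what the code in this guide reads from os.environ; a minimal shell setup sketch (the key value is a placeholder):

```shell
# Make your Fireworks AI key available to the SDK (replace the placeholder):
export FIREWORKS_API_KEY="your-api-key"

# If LiteLLM is not yet installed:
# pip install litellm
```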

Step by step

This example shows how to create a Fireworks AI client using the OpenAI SDK with the Fireworks API base URL, then send a chat completion request. LiteLLM can target the same models through its own completion interface.

python
import os
from openai import OpenAI

# Initialize Fireworks AI client with OpenAI-compatible SDK
client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

# Define chat messages
messages = [
    {"role": "user", "content": "Hello, how can Fireworks AI help me today?"}
]

# Call chat completion with Fireworks AI model
response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=messages
)

# Extract and print the assistant's reply
print(response.choices[0].message.content)
output
Hello! I can assist you with a wide range of tasks including answering questions, generating text, and more. How can I help you today?
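Since this guide is about LiteLLM, the same request can go through LiteLLM's completion function. As a sketch, assuming LiteLLM's provider convention: it routes to Fireworks AI when the model name carries the fireworks_ai/ prefix and reads the key from FIREWORKS_AI_API_KEY (note the extra _AI). The guard keeps the example from failing when no key is set:

```python
import os

# Assumption: LiteLLM routes to Fireworks AI when the model name is
# prefixed with "fireworks_ai/" and reads FIREWORKS_AI_API_KEY.
model = "fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct"
messages = [
    {"role": "user", "content": "Hello, how can Fireworks AI help me today?"}
]

if os.environ.get("FIREWORKS_AI_API_KEY"):
    from litellm import completion  # pip install litellm
    response = completion(model=model, messages=messages)
    print(response.choices[0].message.content)
else:
    print("Set FIREWORKS_AI_API_KEY to run this example.")
```

LiteLLM returns an OpenAI-style response object, so the response.choices[0].message.content access pattern stays the same as in the SDK example above.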

Common variations

The OpenAI SDK also supports asynchronous calls: use the AsyncOpenAI client with async and await. To enable streaming responses, set stream=True in chat.completions.create. You can also switch models by changing the model parameter to any other Fireworks AI model.

python
import asyncio
import os
from openai import AsyncOpenAI

async def async_chat():
    # Use the async client so the coroutine can await the API call
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1"
    )

    messages = [{"role": "user", "content": "Stream a response from Fireworks AI."}]

    # Async streaming call: returns an async iterator of chunks
    stream = await client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
        messages=messages,
        stream=True
    )

    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

asyncio.run(async_chat())
output
Streaming response text from Fireworks AI displayed token by token in real time...
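Streaming also works with the synchronous client: pass stream=True and iterate the chunks the call returns. A minimal sketch, wrapped in a hypothetical stream_reply helper and guarded so no API call is made without a key:

```python
import os

def stream_reply(prompt: str) -> str:
    """Stream a completion from Fireworks AI, printing tokens as they
    arrive, and return the full assembled text."""
    from openai import OpenAI
    client = OpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1",
    )
    # With stream=True the sync client returns a plain iterator of chunks
    stream = client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        parts.append(delta)
    return "".join(parts)

if os.environ.get("FIREWORKS_API_KEY"):
    stream_reply("Stream a short reply.")
```

Collecting the deltas into a list and joining once at the end avoids repeated string concatenation while still printing tokens as they arrive.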

Troubleshooting

  • If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
  • If the model is not found, confirm you are using a valid Fireworks AI model name starting with accounts/fireworks/models/.
  • For network errors, check your internet connection and that https://api.fireworks.ai/inference/v1 is reachable.
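The bullets above map onto exception types the OpenAI SDK (v1+) raises; a sketch of translating them into the hints from this list, using a hypothetical safe_chat helper:

```python
import os

def safe_chat(prompt: str) -> str:
    """Call Fireworks AI, translating common SDK errors into readable hints."""
    from openai import (
        OpenAI,
        AuthenticationError,
        NotFoundError,
        APIConnectionError,
    )
    client = OpenAI(
        api_key=os.environ.get("FIREWORKS_API_KEY", ""),
        base_url="https://api.fireworks.ai/inference/v1",
    )
    try:
        response = client.chat.completions.create(
            model="accounts/fireworks/models/llama-v3p3-70b-instruct",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    except AuthenticationError:
        return "Auth failed: check your FIREWORKS_API_KEY environment variable."
    except NotFoundError:
        return "Model not found: use a name starting with accounts/fireworks/models/."
    except APIConnectionError:
        return "Network error: check that https://api.fireworks.ai/inference/v1 is reachable."
```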

Key Takeaways

  • Use the OpenAI SDK with base_url set to Fireworks AI's API endpoint for integration.
  • Always load your Fireworks AI API key from environment variables for security.
  • LiteLLM can route calls to Fireworks AI through its unified completion interface.
  • Use async and streaming options in the SDK for responsive applications.
  • Verify model names and network connectivity to avoid common errors.
Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct