How-to · Beginner · 3 min read

How to use Fireworks AI with LiteLLM

Quick answer
Use the openai Python SDK with base_url set to Fireworks AI's OpenAI-compatible endpoint and your API key read from the environment. Instantiate the OpenAI client, then call chat.completions.create with a Fireworks AI model name and your messages. LiteLLM can reach the same models through its unified completion interface.

PREREQUISITES

  • Python 3.8+
  • Fireworks AI API key
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirect)
  • LiteLLM installed and configured

Setup

Install the openai Python package (v1+) to access Fireworks AI via its OpenAI-compatible API endpoint. Set your Fireworks AI API key as an environment variable. Ensure LiteLLM is installed and ready to use for lightweight model orchestration.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
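The variable name FIREWORKS_API_KEY below matches what the code in this guide reads from os.environ; a minimal shell setup sketch (the key value is a placeholder):

```shell
# Make your Fireworks AI key available to the SDK (replace the placeholder):
export FIREWORKS_API_KEY="your-api-key"

# If LiteLLM is not yet installed:
# pip install litellm
```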

Step by step

This example shows how to create a Fireworks AI client using the OpenAI SDK with the Fireworks API base URL, then send a chat completion request. LiteLLM can target the same models through its own completion interface.

python
import os
from openai import OpenAI

# Initialize Fireworks AI client with OpenAI-compatible SDK
client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

# Define chat messages
messages = [
    {"role": "user", "content": "Hello, how can Fireworks AI help me today?"}
]

# Call chat completion with Fireworks AI model
response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=messages
)

# Extract and print the assistant's reply
print(response.choices[0].message.content)
output
Hello! I can assist you with a wide range of tasks including answering questions, generating text, and more. How can I help you today?
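Since this guide is about LiteLLM, the same request can go through LiteLLM's completion function. As a sketch, assuming LiteLLM's provider convention: it routes to Fireworks AI when the model name carries the fireworks_ai/ prefix and reads the key from FIREWORKS_AI_API_KEY (note the extra _AI). The guard keeps the example from failing when no key is set:

```python
import os

# Assumption: LiteLLM routes to Fireworks AI when the model name is
# prefixed with "fireworks_ai/" and reads FIREWORKS_AI_API_KEY.
model = "fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct"
messages = [
    {"role": "user", "content": "Hello, how can Fireworks AI help me today?"}
]

if os.environ.get("FIREWORKS_AI_API_KEY"):
    from litellm import completion  # pip install litellm
    response = completion(model=model, messages=messages)
    print(response.choices[0].message.content)
else:
    print("Set FIREWORKS_AI_API_KEY to run this example.")
```

LiteLLM returns an OpenAI-style response object, so the response.choices[0].message.content access pattern stays the same as in the SDK example above.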

Common variations

The OpenAI SDK also supports asynchronous calls: use the AsyncOpenAI client with async and await. To enable streaming responses, set stream=True in chat.completions.create. You can also switch models by changing the model parameter to any other Fireworks AI model.

python
import asyncio
import os
from openai import AsyncOpenAI

async def async_chat():
    # Use the async client so the coroutine can await the API call
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1"
    )

    messages = [{"role": "user", "content": "Stream a response from Fireworks AI."}]

    # Async streaming call: returns an async iterator of chunks
    stream = await client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
        messages=messages,
        stream=True
    )

    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)

asyncio.run(async_chat())
output
Streaming response text from Fireworks AI displayed token by token in real time...
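Streaming also works with the synchronous client: pass stream=True and iterate the chunks the call returns. A minimal sketch, wrapped in a hypothetical stream_reply helper and guarded so no API call is made without a key:

```python
import os

def stream_reply(prompt: str) -> str:
    """Stream a completion from Fireworks AI, printing tokens as they
    arrive, and return the full assembled text."""
    from openai import OpenAI
    client = OpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1",
    )
    # With stream=True the sync client returns a plain iterator of chunks
    stream = client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
        parts.append(delta)
    return "".join(parts)

if os.environ.get("FIREWORKS_API_KEY"):
    stream_reply("Stream a short reply.")
```

Collecting the deltas into a list and joining once at the end avoids repeated string concatenation while still printing tokens as they arrive.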

Troubleshooting

  • If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
  • If the model is not found, confirm you are using a valid Fireworks AI model name starting with accounts/fireworks/models/.
  • For network errors, check your internet connection and that https://api.fireworks.ai/inference/v1 is reachable.
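The bullets above map onto exception types the OpenAI SDK (v1+) raises; a sketch of translating them into the hints from this list, using a hypothetical safe_chat helper:

```python
import os

def safe_chat(prompt: str) -> str:
    """Call Fireworks AI, translating common SDK errors into readable hints."""
    from openai import (
        OpenAI,
        AuthenticationError,
        NotFoundError,
        APIConnectionError,
    )
    client = OpenAI(
        api_key=os.environ.get("FIREWORKS_API_KEY", ""),
        base_url="https://api.fireworks.ai/inference/v1",
    )
    try:
        response = client.chat.completions.create(
            model="accounts/fireworks/models/llama-v3p3-70b-instruct",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    except AuthenticationError:
        return "Auth failed: check your FIREWORKS_API_KEY environment variable."
    except NotFoundError:
        return "Model not found: use a name starting with accounts/fireworks/models/."
    except APIConnectionError:
        return "Network error: check that https://api.fireworks.ai/inference/v1 is reachable."
```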

Key Takeaways

  • Use the OpenAI SDK with base_url set to Fireworks AI's API endpoint for integration.
  • Always load your Fireworks AI API key from environment variables for security.
  • LiteLLM can route calls to Fireworks AI through its unified completion interface.
  • Use async and streaming options in the SDK for responsive applications.
  • Verify model names and network connectivity to avoid common errors.
Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct