How-to · Beginner · 3 min read

How to use Fireworks AI with OpenAI SDK

Quick answer
Use the OpenAI Python SDK with the base_url set to Fireworks AI's API endpoint and your Fireworks API key. Call client.chat.completions.create with a full Fireworks model name such as accounts/fireworks/models/llama-v3p3-70b-instruct to generate completions.

PREREQUISITES

  • Python 3.8+
  • Fireworks AI API key
  • pip install "openai>=1.0" (quoted so the shell does not interpret >= as a redirect)

Setup

Install the official openai Python package (v1 or later) and set your Fireworks AI API key as an environment variable. Use the Fireworks AI OpenAI-compatible endpoint as the base_url.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
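With the package installed, export your Fireworks API key so the examples below can read it from the environment (the key value here is a placeholder):

```shell
export FIREWORKS_API_KEY="your-api-key-here"
```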

Step by step

This example shows how to create a chat completion request using Fireworks AI with the OpenAI SDK. The code reads your API key from the FIREWORKS_API_KEY environment variable, so set it before running.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Hello, Fireworks AI!"}]
)

print(response.choices[0].message.content)
output
Hello, Fireworks AI! How can I assist you today?

Common variations

  • Use other Fireworks AI models by changing the model parameter, e.g., accounts/fireworks/models/deepseek-r1.
  • For asynchronous calls, use the SDK's AsyncOpenAI client with asyncio and await.
  • Enable streaming by passing stream=True to chat.completions.create and iterating over the response.
python
import asyncio
import os

from openai import AsyncOpenAI

async def main():
    # Use the async client so that awaiting the request and iterating
    # the stream with `async for` both work.
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1"
    )

    stream = await client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
        messages=[{"role": "user", "content": "Stream a response."}],
        stream=True
    )

    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
output
Streaming response text here...

Troubleshooting

  • If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
  • Ensure the base_url is exactly https://api.fireworks.ai/inference/v1.
  • If the model is not found, confirm you are using a valid Fireworks AI model name starting with accounts/fireworks/models/.
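As a quick sanity check for the last point, a hypothetical helper (not part of the SDK or the Fireworks API) that verifies a model string uses the full Fireworks naming format:

```python
FIREWORKS_PREFIX = "accounts/fireworks/models/"

def looks_like_fireworks_model(name: str) -> bool:
    """Return True if `name` has the full accounts/fireworks/models/... format."""
    return name.startswith(FIREWORKS_PREFIX) and len(name) > len(FIREWORKS_PREFIX)

print(looks_like_fireworks_model("accounts/fireworks/models/llama-v3p3-70b-instruct"))  # True
print(looks_like_fireworks_model("llama-v3p3-70b-instruct"))  # False
```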

Key Takeaways

  • Use the OpenAI SDK with Fireworks AI by setting the base_url to the Fireworks AI endpoint.
  • Specify Fireworks model names fully, e.g., accounts/fireworks/models/llama-v3p3-70b-instruct.
  • Set your Fireworks API key in the environment variable FIREWORKS_API_KEY.
  • Streaming and async calls are supported with the OpenAI SDK pattern.
  • Check model names and API key if you encounter errors.
Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct, accounts/fireworks/models/deepseek-r1