How-to · Beginner · 3 min read

How to use Fireworks AI with OpenAI SDK

Quick answer
Use the OpenAI Python SDK with the base_url set to Fireworks AI's API endpoint and your Fireworks API key. Call client.chat.completions.create with a full Fireworks model name such as accounts/fireworks/models/llama-v3p3-70b-instruct to generate completions.

PREREQUISITES

  • Python 3.8+
  • Fireworks AI API key
  • pip install "openai>=1.0" (quoted so the shell does not interpret >= as a redirect)

Setup

Install the official openai Python package (v1 or later) and set your Fireworks AI API key as an environment variable. Use the Fireworks AI OpenAI-compatible endpoint as the base_url.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
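With the package installed, export your Fireworks API key so the examples below can read it from the environment (the key value here is a placeholder):

```shell
export FIREWORKS_API_KEY="your-api-key-here"
```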

Step by step

This example shows how to create a chat completion request using Fireworks AI with the OpenAI SDK. The code reads your API key from the FIREWORKS_API_KEY environment variable, so set it before running.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Hello, Fireworks AI!"}]
)

print(response.choices[0].message.content)
output
Hello, Fireworks AI! How can I assist you today?

Common variations

  • Use other Fireworks AI models by changing the model parameter, e.g., accounts/fireworks/models/deepseek-r1.
  • For asynchronous calls, use the SDK's AsyncOpenAI client with asyncio and await.
  • Enable streaming by passing stream=True to chat.completions.create and iterating over the response.
python
import asyncio
import os

from openai import AsyncOpenAI

async def main():
    # Use the async client so that awaiting the request and iterating
    # the stream with `async for` both work.
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1"
    )

    stream = await client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
        messages=[{"role": "user", "content": "Stream a response."}],
        stream=True
    )

    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())
output
Streaming response text here...

Troubleshooting

  • If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
  • Ensure the base_url is exactly https://api.fireworks.ai/inference/v1.
  • If the model is not found, confirm you are using a valid Fireworks AI model name starting with accounts/fireworks/models/.
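As a quick sanity check for the last point, a hypothetical helper (not part of the SDK or the Fireworks API) that verifies a model string uses the full Fireworks naming format:

```python
FIREWORKS_PREFIX = "accounts/fireworks/models/"

def looks_like_fireworks_model(name: str) -> bool:
    """Return True if `name` has the full accounts/fireworks/models/... format."""
    return name.startswith(FIREWORKS_PREFIX) and len(name) > len(FIREWORKS_PREFIX)

print(looks_like_fireworks_model("accounts/fireworks/models/llama-v3p3-70b-instruct"))  # True
print(looks_like_fireworks_model("llama-v3p3-70b-instruct"))  # False
```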

Key Takeaways

  • Use the OpenAI SDK with Fireworks AI by setting the base_url to the Fireworks AI endpoint.
  • Specify Fireworks model names fully, e.g., accounts/fireworks/models/llama-v3p3-70b-instruct.
  • Set your Fireworks API key in the environment variable FIREWORKS_API_KEY.
  • Streaming and async calls are supported with the OpenAI SDK pattern.
  • Check model names and API key if you encounter errors.
Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct, accounts/fireworks/models/deepseek-r1