
Fireworks AI pricing

Quick answer

Fireworks AI pricing is usage-based and typically charged per token processed through the API. You can call Fireworks AI models with the OpenAI SDK by pointing it at the Fireworks AI base URL and passing your Fireworks AI API key. Rates vary by model and can change, so check the official Fireworks AI website for exact figures. Do not count on a free tier: confirm any free credits or trial allowances on the pricing page and budget for API usage costs.
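To make per-token billing concrete, here is a minimal sketch of the arithmetic. The function name and the rate are placeholders chosen for illustration, not actual Fireworks AI prices, and real pricing may differ between input and output tokens or between models.

```python
def estimate_cost_usd(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Estimate spend for a token count at a flat per-million-token rate."""
    if total_tokens < 0 or usd_per_million_tokens < 0:
        raise ValueError("token count and rate must be non-negative")
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical rate for illustration only; look up the real one on the pricing page.
PLACEHOLDER_RATE = 1.00  # USD per 1M tokens (assumed, not an actual Fireworks AI price)

print(f"${estimate_cost_usd(250_000, PLACEHOLDER_RATE):.4f}")  # prints $0.2500
```

Swap in the published rate for the model you plan to use before relying on the result.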

PREREQUISITES

Python 3.8+
Fireworks AI API key
pip install "openai>=1.0"

Setup

Install the openai Python package to interact with Fireworks AI's OpenAI-compatible API. Set your Fireworks AI API key as an environment variable for secure authentication.

Install package: pip install openai
Set environment variable: export FIREWORKS_API_KEY='your_api_key' (Linux/macOS) or set FIREWORKS_API_KEY=your_api_key (Windows)

bash

pip install openai

output

Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
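Before sending a request, it can help to confirm that Python actually sees the key. The following is a small sketch; `load_api_key` is a hypothetical helper introduced here, not part of the OpenAI SDK or of Fireworks AI's tooling.

```python
import os


def load_api_key(env=os.environ, name="FIREWORKS_API_KEY"):
    """Return the API key from the environment, or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples")
    return key


if __name__ == "__main__":
    try:
        load_api_key()
        print("API key found")
    except RuntimeError as err:
        print(err)
```

Failing early like this gives a clearer message than the authentication error returned by the API.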

Step by step

Use the OpenAI SDK with the Fireworks AI base URL to call the API. The client reads your key from the FIREWORKS_API_KEY environment variable through os.environ["FIREWORKS_API_KEY"], so the key never needs to appear in the code. This example sends a chat completion request to the Fireworks AI Llama 3.3 70B Instruct model.

python

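import os
from openai import OpenAI

# Point the OpenAI client at the Fireworks AI endpoint and read the key from the environment.
client = OpenAI(api_key=os.environ["FIREWORKS_API_KEY"],
                base_url="https://api.fireworks.ai/inference/v1")

# Send a single-turn chat completion request.
response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Hello, Fireworks AI pricing details?"}]
)

# Print the text of the first (and only) choice.
print(response.choices[0].message.content)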

output (illustrative; the model's wording will vary)

Fireworks AI pricing is usage-based, charged per token processed. For exact rates, visit https://fireworks.ai/pricing.
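Because billing is per token, it is useful to track how many tokens each call consumes. OpenAI-compatible chat completion responses normally carry a usage object with prompt_tokens and completion_tokens. The sketch below assumes Fireworks AI populates it the same way, and uses stand-in objects instead of live responses so it runs without an API key. `UsageTally` is a hypothetical helper, not an SDK class.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTally:
    """Running total of tokens across several API responses."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, usage) -> None:
        # `usage` is any object exposing prompt_tokens and completion_tokens,
        # e.g. response.usage from a chat completion (assumed to be present).
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


tally = UsageTally()
# Stand-ins for response.usage; the numbers are made up for the demonstration.
for fake_usage in (SimpleNamespace(prompt_tokens=12, completion_tokens=48),
                   SimpleNamespace(prompt_tokens=20, completion_tokens=100)):
    tally.add(fake_usage)

print(tally.total_tokens)  # prints 180
```

In a real script you would call tally.add(response.usage) after each request, checking first that response.usage is not None.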

Common variations

You can switch models by changing the model parameter to another Fireworks AI model, such as accounts/fireworks/models/deepseek-r1. For asynchronous calls, use the SDK's async client, AsyncOpenAI, together with Python's asyncio. Streaming responses are also supported by setting stream=True in the request.

python

import os
import asyncio
from openai import AsyncOpenAI

async def async_chat():
    # The async client is required for the `await` and `async for` used below.
    client = AsyncOpenAI(api_key=os.environ["FIREWORKS_API_KEY"],
                         base_url="https://api.fireworks.ai/inference/v1")

    stream = await client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
        messages=[{"role": "user", "content": "Stream Fireworks AI pricing info."}],
        stream=True
    )

    async for chunk in stream:
        # Some chunks may carry no choices or no text; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()  # end the streamed text with a newline

asyncio.run(async_chat())

output (illustrative; the model's wording will vary)

Fireworks AI pricing is usage-based, charged per token processed. For exact rates, visit https://fireworks.ai/pricing.

Troubleshooting

If you receive authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
For HTTP 429 rate limit errors, reduce request frequency or check your Fireworks AI plan limits.
If the model is not found, confirm the model name matches Fireworks AI's current offerings.
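For the HTTP 429 case, one common approach is to retry with exponential backoff rather than failing immediately. The helper below is a generic sketch, not something Fireworks AI or the OpenAI SDK prescribes. `call_with_backoff` and its parameters are hypothetical names, and the exception type to retry on is passed in by the caller.

```python
import random
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on `retry_on` with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt (1x, 2x, 4x, ...) plus random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

With the OpenAI SDK v1, you could pass retry_on=openai.RateLimitError and wrap the client.chat.completions.create call in a lambda. The client also accepts a max_retries argument if you prefer its built-in retry behavior.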

Key Takeaways

Fireworks AI pricing is usage-based and charged per token processed via API calls.
Use the OpenAI SDK with the Fireworks AI base URL and your API key for integration.
Check Fireworks AI's official pricing page regularly as rates and models may change.

Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct, accounts/fireworks/models/deepseek-r1
