How-to · Beginner · 3 min read

Fireworks AI batch inference

Quick answer
Use the OpenAI-compatible Python SDK with Fireworks AI by setting base_url to Fireworks' endpoint and passing your API key as api_key. Each call to chat.completions.create takes one conversation in the messages parameter, so for batch inference you loop over a list of conversations and make one call per conversation.

PREREQUISITES

  • Python 3.8+
  • Fireworks AI API key
  • pip install "openai>=1.0"

Setup

Install the openai Python package (v1+) and set your Fireworks AI API key as an environment variable. Use the Fireworks AI OpenAI-compatible endpoint for API calls.
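The environment-variable step can be done from the shell before launching Python; a minimal sketch for macOS/Linux (the key value below is a placeholder, not a real credential):

```shell
# Store your Fireworks AI key in the environment (placeholder value shown)
export FIREWORKS_API_KEY="fw_your_key_here"

# Confirm the variable is visible to Python
python -c 'import os; print("key set:", "FIREWORKS_API_KEY" in os.environ)'
```

On Windows, use `setx FIREWORKS_API_KEY ...` in a new terminal instead of export.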

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

This example shows how to perform batch inference by sending multiple user messages in a single API call to Fireworks AI using the OpenAI-compatible SDK.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

# Prepare the batch as a list of conversations (each a list of message dicts)
batch_messages = [
    [{"role": "user", "content": "Hello, how are you?"}],
    [{"role": "user", "content": "Explain RAG in AI."}],
    [{"role": "user", "content": "Write a Python function to add two numbers."}]
]

# The chat completions endpoint handles one conversation per request, so loop over the batch
for i, messages in enumerate(batch_messages, 1):
    response = client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
        messages=messages
    )
    print(f"Response {i}:")
    print(response.choices[0].message.content)
    print("---")
output
Response 1:
Hello! I'm doing well, thank you for asking. How can I assist you today?
---
Response 2:
RAG (Retrieval-Augmented Generation) is a technique that combines retrieval of relevant documents with generative models to produce more accurate and context-aware responses.
---
Response 3:
def add_numbers(a, b):
    return a + b
---
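The loop above sends requests one at a time. A common refinement is to issue them concurrently from a thread pool. The sketch below separates the batching logic from the API call so it can be tested without network access; run_batch and create_fn are illustrative names, not part of the Fireworks or OpenAI APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(create_fn, batch_messages, max_workers=4):
    """Apply create_fn to each conversation concurrently.

    create_fn takes one messages list and returns a result;
    results come back in the same order as batch_messages.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though calls overlap in time
        return list(pool.map(create_fn, batch_messages))

# With a real client you would pass something like:
# create_fn = lambda msgs: client.chat.completions.create(
#     model="accounts/fireworks/models/llama-v3p3-70b-instruct",
#     messages=msgs,
# ).choices[0].message.content
```

Before relying on this pattern, verify that the SDK client version you use is safe to share across threads.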

Common variations

You can run the same batch asynchronously with the SDK's AsyncOpenAI client, or switch models by changing the model parameter. Model names hosted by Fireworks AI start with accounts/fireworks/models/.

python
import asyncio
import os
from openai import AsyncOpenAI

async def async_batch_inference():
    # AsyncOpenAI (not OpenAI) is required so that create() can be awaited
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1"
    )

    batch_messages = [
        [{"role": "user", "content": "Summarize AI trends."}],
        [{"role": "user", "content": "Generate a haiku about spring."}]
    ]

    # Launch all requests concurrently; gather preserves input order
    tasks = [
        client.chat.completions.create(
            model="accounts/fireworks/models/llama-v3p3-70b-instruct",
            messages=messages
        )
        for messages in batch_messages
    ]
    responses = await asyncio.gather(*tasks)

    for i, response in enumerate(responses, 1):
        print(f"Async response {i}:")
        print(response.choices[0].message.content)
        print("---")

asyncio.run(async_batch_inference())
output
Async response 1:
AI trends include advances in large language models, multimodal AI, and increased adoption of generative AI in industry.
---
Async response 2:
Gentle spring breezes
Whisper through blooming flowers
Nature's soft embrace
---

Troubleshooting

  • If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
  • For model not found errors, confirm the model name starts with accounts/fireworks/models/.
  • If responses are empty, check your message format matches the expected chat message list.
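To catch malformed inputs before they reach the API, a small validation helper can check the chat-message shape up front (validate_messages is an illustrative name, not an SDK function):

```python
def validate_messages(messages):
    """Return True if messages is a non-empty list of chat dicts,
    each with string 'role' and 'content' values."""
    if not isinstance(messages, list) or not messages:
        return False
    for m in messages:
        if not isinstance(m, dict):
            return False
        if not isinstance(m.get("role"), str) or not isinstance(m.get("content"), str):
            return False
    return True
```

Calling this on each conversation before the API loop turns a confusing empty response into an immediate, local error.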

Key Takeaways

  • Use the OpenAI-compatible openai SDK with Fireworks AI by setting base_url to the Fireworks AI endpoint.
  • Batch inference is done by looping over multiple message lists and calling chat.completions.create for each.
  • Fireworks AI model names always start with accounts/fireworks/models/.
  • Async batch calls are supported by awaiting chat.completions.create in an async context.
  • Always set your API key in the FIREWORKS_API_KEY environment variable for secure authentication.
Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct