How to use Llama on Fireworks AI
Quick answer
Use the openai Python SDK with your Fireworks AI API key, and set the base_url to Fireworks' OpenAI-compatible endpoint. Call client.chat.completions.create with the Llama model name accounts/fireworks/models/llama-v3p3-70b-instruct and your chat messages to get completions.

Prerequisites
- Python 3.8+
- A Fireworks AI API key
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your Fireworks AI API key as an environment variable. Use the Fireworks AI OpenAI-compatible endpoint for API calls.
pip install "openai>=1.0"

output

Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
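Before running the examples below, export your key in the shell so the Python code can read it from the environment (the value shown is a placeholder; use your real key from the Fireworks dashboard):

```shell
# Make the key available to the Python examples via os.environ.
# "your-fireworks-api-key" is a placeholder, not a real key.
export FIREWORKS_API_KEY="your-fireworks-api-key"
```

To persist it across sessions, add the same line to your shell profile (e.g. ~/.bashrc or ~/.zshrc).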
Step by step
Use the OpenAI SDK with your Fireworks API key and specify the Fireworks base URL. Call the chat.completions.create method with the Llama model and your messages.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1",
)

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)

print(response.choices[0].message.content)

output
Hello! I'm your Llama model on Fireworks AI, ready to assist you.
Common variations
You can use other Fireworks Llama models by changing the model parameter. For asynchronous calls, use the SDK's AsyncOpenAI client. Streaming responses are also supported: pass stream=True to chat.completions.create and iterate over the returned chunks.
import asyncio
import os

from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1",
    )
    # In openai>=1.0 the async client uses the same create method, awaited.
    response = await client.chat.completions.create(
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
        messages=[{"role": "user", "content": "Tell me a joke."}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

output
Why did the computer show up at work late? Because it had a hard drive!
Troubleshooting
- If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
- If the model is not found, confirm you are using the exact model name accounts/fireworks/models/llama-v3p3-70b-instruct.
- For network issues, check your internet connection and the Fireworks API status.
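A quick local sanity check for the first two issues, before making any API call (a minimal sketch; it only inspects your environment and the model-name format, and does not contact Fireworks):

```python
import os

# 1) Is the key present and free of stray whitespace?
key = os.environ.get("FIREWORKS_API_KEY")
if key is None:
    print("FIREWORKS_API_KEY is not set")
elif key != key.strip():
    print("FIREWORKS_API_KEY has leading or trailing whitespace")
else:
    print("FIREWORKS_API_KEY is set")

# 2) Does the model name have the expected accounts/<owner>/models/<model> shape?
MODEL = "accounts/fireworks/models/llama-v3p3-70b-instruct"
parts = MODEL.split("/")
assert len(parts) == 4 and parts[0] == "accounts" and parts[2] == "models"
```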
Key Takeaways
- Use the OpenAI SDK with Fireworks AI by setting the base_url to Fireworks endpoint.
- Specify the full Fireworks Llama model name when calling chat completions.
- Set your Fireworks API key in the environment variable FIREWORKS_API_KEY.
- Both synchronous and asynchronous calls are supported; streaming is available via stream=True.
- Verify model names and API keys to avoid common errors.