How-to · Beginner · 3 min read

How to use Llama via Fireworks API

Quick answer
Use the openai Python SDK with base_url="https://api.fireworks.ai/inference/v1" and your Fireworks API key to call Llama models such as accounts/fireworks/models/llama-v3p3-70b-instruct. Pass your messages to client.chat.completions.create() to get a Llama response.

PREREQUISITES

  • Python 3.8+
  • Fireworks API key
  • pip install "openai>=1.0" (quote the specifier so the shell does not treat >= as redirection)

Setup

Install the openai Python package and set your Fireworks API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export OPENAI_API_KEY="your_fireworks_api_key" (Linux/macOS) or setx OPENAI_API_KEY "your_fireworks_api_key" (Windows; open a new terminal afterwards so the variable takes effect)
bash
pip install openai
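Before writing any model code, you can confirm the key is actually visible to Python. A minimal sanity check (the variable name matches the export/setx step above; the helper function is just for illustration):

```python
import os

def have_api_key(name="OPENAI_API_KEY"):
    """Return True if the named environment variable is set and non-empty."""
    return bool(os.environ.get(name))

if have_api_key():
    print("OPENAI_API_KEY is set")
else:
    print("OPENAI_API_KEY is missing; re-check the export/setx step above")
```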

Step by step

This example shows how to call the Fireworks Llama model accounts/fireworks/models/llama-v3p3-70b-instruct using the OpenAI-compatible SDK.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

messages = [
    {"role": "user", "content": "Explain the benefits of using Llama models."}
]

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=messages
)

print(response.choices[0].message.content)
output
Llama models offer efficient and powerful language understanding capabilities, enabling advanced natural language processing tasks with high accuracy and scalability.
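Chat completions are stateless, so a multi-turn conversation works by resending the full history on every call. A minimal sketch (the add_turn helper and the sample assistant reply are illustrative, not part of the SDK):

```python
# Start the history with a system prompt plus the first user turn.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain the benefits of using Llama models."},
]

def add_turn(history, role, content):
    """Append one turn to the conversation history and return it."""
    history.append({"role": role, "content": content})
    return history

# After each API call, record the assistant's reply, then the next question,
# and pass the whole list back to client.chat.completions.create().
add_turn(messages, "assistant", "Llama models are open-weight and efficient...")
add_turn(messages, "user", "How do they compare on cost?")

print([m["role"] for m in messages])
```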

Common variations

You can use different Llama variants by changing the model parameter. For streaming responses, pass stream=True to client.chat.completions.create(). For async usage, the openai SDK ships an AsyncOpenAI client that takes the same api_key and base_url arguments; no separate wrapper is needed.

python
response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=messages,
    stream=True
)

for chunk in response:
    # The final chunk's delta may have content=None, so guard before printing.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
output
Llama models offer efficient and powerful language understanding capabilities, enabling advanced natural language processing tasks with high accuracy and scalability.
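Hosted APIs occasionally return transient rate-limit or network errors, so a retry with exponential backoff is a common pattern. A sketch under assumptions: with_retries and the flaky stand-in below are illustrative helpers, and in real code you would pass a lambda wrapping client.chat.completions.create() and narrow the except clause to the SDK's error types:

```python
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Run call(), retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # narrow to the SDK's error types in real code
            if attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x... the base delay between attempts.
            time.sleep(base_delay * (2 ** attempt))

# Demonstration with a stand-in call that fails twice, then succeeds.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(with_retries(flaky, attempts=3, base_delay=0.01))
```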

Troubleshooting

  • If you get authentication errors, verify that OPENAI_API_KEY is set in the current shell and contains your Fireworks API key (not an OpenAI key).
  • If the model is not found, confirm the model ID accounts/fireworks/models/llama-v3p3-70b-instruct is correct and available in your Fireworks account.
  • For network issues, check your internet connection and firewall settings.
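When debugging authentication errors, avoid printing the full key into logs. One way to check which key is loaded is a masked preview (the masking format here is just a convention, not part of the SDK):

```python
import os

def mask_key(key):
    """Return a safe preview of an API key: first and last 4 characters only."""
    if not key or len(key) < 8:
        return "<missing or too short>"
    return key[:4] + "..." + key[-4:]

print("OPENAI_API_KEY:", mask_key(os.environ.get("OPENAI_API_KEY")))
```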

Key Takeaways

  • Use the OpenAI Python SDK with Fireworks API by setting base_url to Fireworks endpoint.
  • Specify the full Fireworks Llama model ID in the model parameter for chat completions.
  • Set your Fireworks API key in the OPENAI_API_KEY environment variable for authentication.
Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct