How-to · Beginner · 3 min read

How to use Llama via Fireworks API

Quick answer
Use the openai Python SDK with base_url="https://api.fireworks.ai/inference/v1" and your Fireworks API key to call Llama models such as accounts/fireworks/models/llama-v3p3-70b-instruct. Pass your messages to client.chat.completions.create() to get a Llama response.

PREREQUISITES

  • Python 3.8+
  • Fireworks API key
  • pip install "openai>=1.0" (quote the specifier so the shell does not treat >= as redirection)

Setup

Install the openai Python package and set your Fireworks API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export OPENAI_API_KEY="your_fireworks_api_key" (Linux/macOS) or setx OPENAI_API_KEY "your_fireworks_api_key" (Windows; open a new terminal afterwards so the variable takes effect)
bash
pip install openai
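Before writing any model code, you can confirm the key is actually visible to Python. A minimal sanity check (the variable name matches the export/setx step above; the helper function is just for illustration):

```python
import os

def have_api_key(name="OPENAI_API_KEY"):
    """Return True if the named environment variable is set and non-empty."""
    return bool(os.environ.get(name))

if have_api_key():
    print("OPENAI_API_KEY is set")
else:
    print("OPENAI_API_KEY is missing; re-check the export/setx step above")
```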

Step by step

This example shows how to call the Fireworks Llama model accounts/fireworks/models/llama-v3p3-70b-instruct using the OpenAI-compatible SDK.

python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

messages = [
    {"role": "user", "content": "Explain the benefits of using Llama models."}
]

response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=messages
)

print(response.choices[0].message.content)
output
Llama models offer efficient and powerful language understanding capabilities, enabling advanced natural language processing tasks with high accuracy and scalability.
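Chat completions are stateless, so a multi-turn conversation works by resending the full history on every call. A minimal sketch (the add_turn helper and the sample assistant reply are illustrative, not part of the SDK):

```python
# Start the history with a system prompt plus the first user turn.
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain the benefits of using Llama models."},
]

def add_turn(history, role, content):
    """Append one turn to the conversation history and return it."""
    history.append({"role": role, "content": content})
    return history

# After each API call, record the assistant's reply, then the next question,
# and pass the whole list back to client.chat.completions.create().
add_turn(messages, "assistant", "Llama models are open-weight and efficient...")
add_turn(messages, "user", "How do they compare on cost?")

print([m["role"] for m in messages])
```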

Common variations

You can use different Llama variants by changing the model parameter. For streaming responses, pass stream=True to client.chat.completions.create(). For async usage, the openai SDK ships an AsyncOpenAI client that takes the same api_key and base_url arguments; no separate wrapper is needed.

python
response = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=messages,
    stream=True
)

for chunk in response:
    # The final chunk's delta may have content=None, so guard before printing.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
output
Llama models offer efficient and powerful language understanding capabilities, enabling advanced natural language processing tasks with high accuracy and scalability.
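Hosted APIs occasionally return transient rate-limit or network errors, so a retry with exponential backoff is a common pattern. A sketch under assumptions: with_retries and the flaky stand-in below are illustrative helpers, and in real code you would pass a lambda wrapping client.chat.completions.create() and narrow the except clause to the SDK's error types:

```python
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Run call(), retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # narrow to the SDK's error types in real code
            if attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x... the base delay between attempts.
            time.sleep(base_delay * (2 ** attempt))

# Demonstration with a stand-in call that fails twice, then succeeds.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient error")
    return "ok"

print(with_retries(flaky, attempts=3, base_delay=0.01))
```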

Troubleshooting

  • If you get authentication errors, verify that OPENAI_API_KEY is set in the current shell and contains your Fireworks API key (not an OpenAI key).
  • If the model is not found, confirm the model ID accounts/fireworks/models/llama-v3p3-70b-instruct is correct and available in your Fireworks account.
  • For network issues, check your internet connection and firewall settings.
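When debugging authentication errors, avoid printing the full key into logs. One way to check which key is loaded is a masked preview (the masking format here is just a convention, not part of the SDK):

```python
import os

def mask_key(key):
    """Return a safe preview of an API key: first and last 4 characters only."""
    if not key or len(key) < 8:
        return "<missing or too short>"
    return key[:4] + "..." + key[-4:]

print("OPENAI_API_KEY:", mask_key(os.environ.get("OPENAI_API_KEY")))
```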

Key Takeaways

  • Use the OpenAI Python SDK with Fireworks API by setting base_url to Fireworks endpoint.
  • Specify the full Fireworks Llama model ID in the model parameter for chat completions.
  • Set your Fireworks API key in the OPENAI_API_KEY environment variable for authentication.
Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct