How-to · Intermediate · 3 min read

Fireworks AI LoRA fine-tuning

Quick answer
Fireworks AI supports LoRA fine-tuning through its OpenAI-compatible API: upload a JSONL training file, create a fine-tuning job with the fine_tuning.jobs.create endpoint, then run inference against the resulting LoRA model. Use the OpenAI SDK with your Fireworks API key and the Fireworks base URL.

PREREQUISITES

  • Python 3.8+
  • Fireworks AI API key
  • pip install "openai>=1.0"

Setup

Install the official openai Python package (v1+), set your Fireworks AI API key as an environment variable, and point the OpenAI client at the Fireworks AI OpenAI-compatible endpoint.

bash
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

This example shows how to upload a LoRA fine-tuning training file, create a fine-tuning job on Fireworks AI, monitor the job status, and then use the fine-tuned model for chat completions.
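The upload step below assumes a lora_training.jsonl file in the chat-messages format. As a minimal sketch, such a file can be generated like this (the example contents are hypothetical; check the Fireworks dataset documentation for exact field requirements):

```python
import json

# Hypothetical two-example dataset in the chat-messages JSONL format:
# each line is one JSON object with a "messages" list of role/content pairs.
examples = [
    {"messages": [
        {"role": "user", "content": "What does LoRA stand for?"},
        {"role": "assistant", "content": "Low-Rank Adaptation."},
    ]},
    {"messages": [
        {"role": "user", "content": "Why use LoRA?"},
        {"role": "assistant", "content": "It trains small adapter matrices instead of all model weights."},
    ]},
]

with open("lora_training.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```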

python
import os
import time
from openai import OpenAI

# Initialize client with Fireworks AI API key and base URL
client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

# Step 1: Upload training file (JSONL format with messages)
with open("lora_training.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")
print(f"Uploaded training file ID: {training_file.id}")

# Step 2: Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="accounts/fireworks/models/llama-v3p3-70b-instruct"
    # Optionally specify LoRA config in training data or metadata
)
print(f"Created fine-tuning job ID: {job.id}")

# Step 3: Poll job status until done
while True:
    status = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Job status: {status.status}")
    if status.status in ["succeeded", "failed"]:
        break
    time.sleep(10)

if status.status == "succeeded":
    fine_tuned_model = status.fine_tuned_model
    print(f"Fine-tuned model ready: {fine_tuned_model}")

    # Step 4: Use fine-tuned model for chat
    response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[{"role": "user", "content": "Explain LoRA fine-tuning."}]
    )
    print("Response:", response.choices[0].message.content)
else:
    print("Fine-tuning job failed.")
output
Uploaded training file ID: file-abc123
Created fine-tuning job ID: job-xyz789
Job status: running
Job status: running
Job status: succeeded
Fine-tuned model ready: accounts/fireworks/models/llama-v3p3-70b-instruct-lora-001
Response: LoRA fine-tuning adapts large models efficiently by training low-rank adapters, reducing compute and memory costs.
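The fixed 10-second sleep in the polling loop above works, but for long-running jobs an exponential backoff reduces API calls. A generic helper sketch (not part of the Fireworks SDK; the interval values are illustrative):

```python
import time

def poll_with_backoff(fetch_status, initial=5.0, factor=2.0, max_delay=60.0):
    """Call fetch_status() until it returns a terminal state,
    doubling the sleep between polls up to max_delay seconds."""
    delay = initial
    while True:
        status = fetch_status()
        if status in ("succeeded", "failed"):
            return status
        time.sleep(delay)
        delay = min(delay * factor, max_delay)
```

Pass a closure such as `lambda: client.fine_tuning.jobs.retrieve(job.id).status` as `fetch_status`.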

Common variations

  • Use the AsyncOpenAI client with asyncio for non-blocking fine-tuning job polling.
  • Change the base URL or model to other Fireworks AI models supporting LoRA.
  • Incorporate streaming chat completions by setting stream=True in chat.completions.create.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_fine_tune():
    # Use the async client so uploads, job calls, and streaming are all awaitable
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1"
    )

    # Upload training file
    with open("lora_training.jsonl", "rb") as f:
        training_file = await client.files.create(file=f, purpose="fine-tune")

    job = await client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="accounts/fireworks/models/llama-v3p3-70b-instruct"
    )

    # Non-blocking polling: other tasks can run during asyncio.sleep
    while True:
        status = await client.fine_tuning.jobs.retrieve(job.id)
        print(f"Job status: {status.status}")
        if status.status in ["succeeded", "failed"]:
            break
        await asyncio.sleep(10)

    if status.status == "succeeded":
        fine_tuned_model = status.fine_tuned_model
        print(f"Fine-tuned model ready: {fine_tuned_model}")

        # Streaming chat: with stream=True the async client yields chunks
        stream = await client.chat.completions.create(
            model=fine_tuned_model,
            messages=[{"role": "user", "content": "Tell me about LoRA."}],
            stream=True
        )
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(async_fine_tune())
output
Job status: running
Job status: running
Job status: succeeded
Fine-tuned model ready: accounts/fireworks/models/llama-v3p3-70b-instruct-lora-001
LoRA fine-tuning adapts large models efficiently by training low-rank adapters, reducing compute and memory costs.

Troubleshooting

  • If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
  • For file upload failures, ensure your training file is valid JSONL with proper message format.
  • If fine-tuning job fails, check the job logs or metadata for errors in training data or model compatibility.
  • Timeouts during polling can be handled by increasing sleep intervals or using async polling.
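For the file-upload failures mentioned above, it can help to pre-validate the training file locally before uploading. A sketch of such a check (the schema Fireworks actually enforces may be stricter; treat this as a first-pass filter):

```python
import json

def validate_training_file(path):
    """Return a list of per-line problems found in a chat-format JSONL file."""
    problems = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                problems.append(f"line {lineno}: blank line")
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                problems.append(f"line {lineno}: missing or empty 'messages' list")
            elif any("role" not in m or "content" not in m for m in messages):
                problems.append(f"line {lineno}: message lacks 'role' or 'content'")
    return problems
```

An empty result means the file at least parses and follows the basic messages shape.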

Key Takeaways

  • Use the OpenAI SDK with Fireworks AI base_url for LoRA fine-tuning workflows.
  • Upload training data in JSONL format and create fine-tuning jobs via fine_tuning.jobs.create.
  • Poll job status until completion before using the fine-tuned model for inference.
  • Async and streaming completions improve responsiveness in production applications.
  • Validate API keys and training data format to avoid common errors.
Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct, accounts/fireworks/models/llama-v3p3-70b-instruct-lora-001