How-to · Intermediate · 3 min read

Fireworks AI LoRA fine-tuning

Quick answer
Fireworks AI supports LoRA fine-tuning through its OpenAI-compatible API: upload a JSONL training file, create a fine-tuning job with the fine_tuning.jobs.create endpoint, then run inference against the resulting LoRA model. Use the OpenAI SDK with your Fireworks API key and the Fireworks base URL.

PREREQUISITES

  • Python 3.8+
  • Fireworks AI API key
  • pip install "openai>=1.0"

Setup

Install the official openai Python package (v1+), set your Fireworks AI API key as an environment variable, and point the OpenAI client at the Fireworks AI OpenAI-compatible endpoint.

bash
pip install "openai>=1.0"
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

This example shows how to upload a LoRA fine-tuning training file, create a fine-tuning job on Fireworks AI, monitor the job status, and then use the fine-tuned model for chat completions.
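The upload step below assumes a lora_training.jsonl file in the chat-messages format. As a minimal sketch, such a file can be generated like this (the example contents are hypothetical; check the Fireworks dataset documentation for exact field requirements):

```python
import json

# Hypothetical two-example dataset in the chat-messages JSONL format:
# each line is one JSON object with a "messages" list of role/content pairs.
examples = [
    {"messages": [
        {"role": "user", "content": "What does LoRA stand for?"},
        {"role": "assistant", "content": "Low-Rank Adaptation."},
    ]},
    {"messages": [
        {"role": "user", "content": "Why use LoRA?"},
        {"role": "assistant", "content": "It trains small adapter matrices instead of all model weights."},
    ]},
]

with open("lora_training.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```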

python
import os
import time
from openai import OpenAI

# Initialize client with Fireworks AI API key and base URL
client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1"
)

# Step 1: Upload training file (JSONL format with messages)
with open("lora_training.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")
print(f"Uploaded training file ID: {training_file.id}")

# Step 2: Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="accounts/fireworks/models/llama-v3p3-70b-instruct"
    # Optionally specify LoRA config in training data or metadata
)
print(f"Created fine-tuning job ID: {job.id}")

# Step 3: Poll job status until done
while True:
    status = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Job status: {status.status}")
    if status.status in ["succeeded", "failed"]:
        break
    time.sleep(10)

if status.status == "succeeded":
    fine_tuned_model = status.fine_tuned_model
    print(f"Fine-tuned model ready: {fine_tuned_model}")

    # Step 4: Use fine-tuned model for chat
    response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[{"role": "user", "content": "Explain LoRA fine-tuning."}]
    )
    print("Response:", response.choices[0].message.content)
else:
    print("Fine-tuning job failed.")
output
Uploaded training file ID: file-abc123
Created fine-tuning job ID: job-xyz789
Job status: running
Job status: running
Job status: succeeded
Fine-tuned model ready: accounts/fireworks/models/llama-v3p3-70b-instruct-lora-001
Response: LoRA fine-tuning adapts large models efficiently by training low-rank adapters, reducing compute and memory costs.
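The fixed 10-second sleep in the polling loop above works, but for long-running jobs an exponential backoff reduces API calls. A generic helper sketch (not part of the Fireworks SDK; the interval values are illustrative):

```python
import time

def poll_with_backoff(fetch_status, initial=5.0, factor=2.0, max_delay=60.0):
    """Call fetch_status() until it returns a terminal state,
    doubling the sleep between polls up to max_delay seconds."""
    delay = initial
    while True:
        status = fetch_status()
        if status in ("succeeded", "failed"):
            return status
        time.sleep(delay)
        delay = min(delay * factor, max_delay)
```

Pass a closure such as `lambda: client.fine_tuning.jobs.retrieve(job.id).status` as `fetch_status`.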

Common variations

  • Use the AsyncOpenAI client with asyncio for non-blocking fine-tuning job polling.
  • Change the base URL or model to other Fireworks AI models supporting LoRA.
  • Incorporate streaming chat completions by setting stream=True in chat.completions.create.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_fine_tune():
    # Use the async client so uploads, job calls, and streaming are all awaitable
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1"
    )

    # Upload training file
    with open("lora_training.jsonl", "rb") as f:
        training_file = await client.files.create(file=f, purpose="fine-tune")

    job = await client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="accounts/fireworks/models/llama-v3p3-70b-instruct"
    )

    # Non-blocking polling: other tasks can run during asyncio.sleep
    while True:
        status = await client.fine_tuning.jobs.retrieve(job.id)
        print(f"Job status: {status.status}")
        if status.status in ["succeeded", "failed"]:
            break
        await asyncio.sleep(10)

    if status.status == "succeeded":
        fine_tuned_model = status.fine_tuned_model
        print(f"Fine-tuned model ready: {fine_tuned_model}")

        # Streaming chat: with stream=True the async client yields chunks
        stream = await client.chat.completions.create(
            model=fine_tuned_model,
            messages=[{"role": "user", "content": "Tell me about LoRA."}],
            stream=True
        )
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(async_fine_tune())
output
Job status: running
Job status: running
Job status: succeeded
Fine-tuned model ready: accounts/fireworks/models/llama-v3p3-70b-instruct-lora-001
LoRA fine-tuning adapts large models efficiently by training low-rank adapters, reducing compute and memory costs.

Troubleshooting

  • If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
  • For file upload failures, ensure your training file is valid JSONL with proper message format.
  • If fine-tuning job fails, check the job logs or metadata for errors in training data or model compatibility.
  • Timeouts during polling can be handled by increasing sleep intervals or using async polling.
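For the file-upload failures mentioned above, it can help to pre-validate the training file locally before uploading. A sketch of such a check (the schema Fireworks actually enforces may be stricter; treat this as a first-pass filter):

```python
import json

def validate_training_file(path):
    """Return a list of per-line problems found in a chat-format JSONL file."""
    problems = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                problems.append(f"line {lineno}: blank line")
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                problems.append(f"line {lineno}: missing or empty 'messages' list")
            elif any("role" not in m or "content" not in m for m in messages):
                problems.append(f"line {lineno}: message lacks 'role' or 'content'")
    return problems
```

An empty result means the file at least parses and follows the basic messages shape.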

Key Takeaways

  • Use the OpenAI SDK with Fireworks AI base_url for LoRA fine-tuning workflows.
  • Upload training data in JSONL format and create fine-tuning jobs via fine_tuning.jobs.create.
  • Poll job status until completion before using the fine-tuned model for inference.
  • Async and streaming completions improve responsiveness in production applications.
  • Validate API keys and training data format to avoid common errors.
Verified 2026-04 · accounts/fireworks/models/llama-v3p3-70b-instruct, accounts/fireworks/models/llama-v3p3-70b-instruct-lora-001