Fireworks AI LoRA fine-tuning
Quick answer
Fireworks AI supports LoRA fine-tuning via its OpenAI-compatible API by uploading a training file and creating a fine-tuning job using the fine_tuning.jobs.create endpoint. Use the OpenAI SDK with your Fireworks API key and specify the LoRA fine-tuned model for inference.
Prerequisites
- Python 3.8+
- Fireworks AI API key
- pip install "openai>=1.0"
Setup
Install the official openai Python package (v1+) and set your Fireworks AI API key as an environment variable. Use the Fireworks AI OpenAI-compatible endpoint with the OpenAI client.
pip install "openai>=1.0"

Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x

Note: quote the requirement specifier (">=") so the shell does not interpret it as output redirection.
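Before running any of the examples, export your API key so the client can read it from the environment (the key value shown is a placeholder, not a real key):

```shell
# Make the key available to child processes such as the Python scripts below.
# "fw-your-api-key-here" is a placeholder; use your actual Fireworks AI key.
export FIREWORKS_API_KEY="fw-your-api-key-here"
```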
Step by step
This example shows how to upload a LoRA fine-tuning training file, create a fine-tuning job on Fireworks AI, monitor the job status, and then use the fine-tuned model for chat completions.
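The training file is newline-delimited JSON, one chat conversation per line in the OpenAI messages format. A minimal sketch for generating such a file (the example conversations are invented for illustration):

```python
import json

# Hypothetical training records: each line of the JSONL file is one
# conversation in the OpenAI chat "messages" format.
examples = [
    {"messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "LoRA trains small low-rank adapter matrices instead of all model weights."},
    ]},
    {"messages": [
        {"role": "user", "content": "Why use LoRA?"},
        {"role": "assistant", "content": "It cuts memory and compute costs while preserving base-model quality."},
    ]},
]

# Write one JSON object per line (JSONL)
with open("lora_training.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```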
import os
import time

from openai import OpenAI

# Initialize the client with the Fireworks AI API key and base URL
client = OpenAI(
    api_key=os.environ["FIREWORKS_API_KEY"],
    base_url="https://api.fireworks.ai/inference/v1",
)

# Step 1: Upload the training file (JSONL format with chat messages)
with open("lora_training.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")
print(f"Uploaded training file ID: {training_file.id}")

# Step 2: Create the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    # Optionally specify LoRA config in training data or metadata
)
print(f"Created fine-tuning job ID: {job.id}")

# Step 3: Poll the job status until it reaches a terminal state
while True:
    status = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Job status: {status.status}")
    if status.status in ["succeeded", "failed"]:
        break
    time.sleep(10)

if status.status == "succeeded":
    fine_tuned_model = status.fine_tuned_model
    print(f"Fine-tuned model ready: {fine_tuned_model}")

    # Step 4: Use the fine-tuned model for chat completions
    response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[{"role": "user", "content": "Explain LoRA fine-tuning."}],
    )
    print("Response:", response.choices[0].message.content)
else:
    print("Fine-tuning job failed.")

Output:
Uploaded training file ID: file-abc123
Created fine-tuning job ID: job-xyz789
Job status: running
Job status: running
Job status: succeeded
Fine-tuned model ready: accounts/fireworks/models/llama-v3p3-70b-instruct-lora-001
Response: LoRA fine-tuning adapts large models efficiently by training low-rank adapters, reducing compute and memory costs.
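The polling loop in Step 3 can be factored into a reusable helper with an explicit timeout. This is a sketch under my own naming: `wait_for_job` and the injected `retrieve` callable are not part of any SDK; in real use you would pass `client.fine_tuning.jobs.retrieve`:

```python
import time

def wait_for_job(retrieve, job_id, poll_seconds=10, timeout=3600):
    """Poll a fine-tuning job until it reaches a terminal state.

    `retrieve` is any callable with the shape of
    client.fine_tuning.jobs.retrieve; injecting it keeps the helper
    testable without network access.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = retrieve(job_id)
        if status.status in ("succeeded", "failed", "cancelled"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

Raising on timeout, rather than looping forever, makes stuck jobs visible to the caller instead of hanging the process.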
Common variations
- Use async calls with asyncio and await for non-blocking fine-tuning job polling.
- Change the base URL or model to other Fireworks AI models supporting LoRA.
- Incorporate streaming chat completions by setting stream=True in chat.completions.create.
import asyncio
import os

from openai import AsyncOpenAI

async def async_fine_tune():
    # AsyncOpenAI makes every API call awaitable and returns async streams
    client = AsyncOpenAI(
        api_key=os.environ["FIREWORKS_API_KEY"],
        base_url="https://api.fireworks.ai/inference/v1",
    )

    # Upload the training file (the file handle itself is read synchronously)
    with open("lora_training.jsonl", "rb") as f:
        training_file = await client.files.create(file=f, purpose="fine-tune")

    job = await client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    )

    # Non-blocking polling: asyncio.sleep yields control to other tasks
    while True:
        status = await client.fine_tuning.jobs.retrieve(job.id)
        print(f"Job status: {status.status}")
        if status.status in ["succeeded", "failed"]:
            break
        await asyncio.sleep(10)

    if status.status == "succeeded":
        fine_tuned_model = status.fine_tuned_model
        print(f"Fine-tuned model ready: {fine_tuned_model}")

        # Streaming chat example: print tokens as they arrive
        stream = await client.chat.completions.create(
            model=fine_tuned_model,
            messages=[{"role": "user", "content": "Tell me about LoRA."}],
            stream=True,
        )
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(async_fine_tune())

Output:
Job status: running
Job status: running
Job status: succeeded
Fine-tuned model ready: accounts/fireworks/models/llama-v3p3-70b-instruct-lora-001
LoRA fine-tuning adapts large models efficiently by training low-rank adapters, reducing compute and memory costs.
Troubleshooting
- If you get authentication errors, verify your FIREWORKS_API_KEY environment variable is set correctly.
- For file upload failures, ensure your training file is valid JSONL with the proper message format.
- If the fine-tuning job fails, check the job logs or metadata for errors in training data or model compatibility.
- Timeouts during polling can be handled by increasing sleep intervals or using async polling.
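To catch malformed training data before uploading, a small local check helps. This is a sketch: `validate_jsonl` is a hypothetical helper of my own, not part of any SDK, and the role check assumes the standard system/user/assistant chat roles:

```python
import json

def validate_jsonl(path):
    """Return a list of (line_number, error) problems found in a JSONL training file."""
    problems = []
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # blank lines are skipped, not flagged
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((i, f"invalid JSON: {e}"))
                continue
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                problems.append((i, "missing or empty 'messages' list"))
                continue
            for m in messages:
                if m.get("role") not in ("system", "user", "assistant"):
                    problems.append((i, f"unexpected role: {m.get('role')!r}"))
    return problems
```

An empty return value means the file parsed cleanly; otherwise each tuple points at the offending line so it can be fixed before the upload step.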
Key takeaways
- Use the OpenAI SDK with Fireworks AI base_url for LoRA fine-tuning workflows.
- Upload training data in JSONL format and create fine-tuning jobs via fine_tuning.jobs.create.
- Poll job status until completion before using the fine-tuned model for inference.
- Async and streaming completions improve responsiveness in production applications.
- Validate API keys and training data format to avoid common errors.