
How to do RLHF with OpenAI

Quick answer
OpenAI's API does not expose a full RLHF loop (reward model plus PPO), but you can capture the human-feedback signal with its fine-tuning API: supervised fine-tuning on human-approved completions, or preference (DPO) fine-tuning on chosen/rejected pairs. Prepare a JSONL dataset of human-labeled prompts and completions, upload it via client.files.create, then create a fine-tuning job with client.fine_tuning.jobs.create. After training, query the fine-tuned model with client.chat.completions.create for improved responses.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with billing enabled (fine-tuning is a paid feature, not available on the free tier)
  • pip install "openai>=1.0" (quote the specifier so the shell does not treat > as a redirect)

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

Prepare your training data as a JSONL file in which each line holds a messages array with system, user, and assistant roles; for an RLHF-style workflow, the assistant turns should be completions that human reviewers rated highly. Upload the file, create a fine-tuning job, wait for completion, then query the fine-tuned model.

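As a sketch of what that file looks like, the snippet below writes two hypothetical human-approved examples in the chat-format JSONL the fine-tuning endpoint expects (the file name and example content are illustrative):

```python
import json

# Two illustrative human-approved examples in chat JSONL format;
# each line of the file is one complete training record.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is RLHF?"},
        {"role": "assistant", "content": "Reinforcement Learning from Human "
                                         "Feedback: training a model with "
                                         "human preference signals."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Name one RLHF stage."},
        {"role": "assistant", "content": "Supervised fine-tuning on human "
                                         "demonstrations."},
    ]},
]

with open("rlhf_training.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

In a real dataset you would want at least a few dozen such records drawn from reviewer-approved conversations.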
python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Step 1: Upload training file (JSONL format with RLHF data)
with open("rlhf_training.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

print(f"Uploaded file ID: {training_file.id}")

# Step 2: Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18"
)

print(f"Fine-tuning job ID: {job.id}")

# Step 3: Poll job status until done (simplified example)
import time
while True:
    status = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Status: {status.status}")
    if status.status in ["succeeded", "failed", "cancelled"]:
        break
    time.sleep(30)

if status.status == "succeeded":
    fine_tuned_model = status.fine_tuned_model
    print(f"Fine-tuned model ready: {fine_tuned_model}")

    # Step 4: Query fine-tuned model
    response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[{"role": "user", "content": "Explain RLHF."}]
    )
    print("Response:", response.choices[0].message.content)
else:
    print("Fine-tuning failed.")
output
Uploaded file ID: file-abc123xyz
Fine-tuning job ID: ftjob-xyz789abc
Status: running
Status: running
Status: succeeded
Fine-tuned model ready: ft:gpt-4o-mini-2024-07-18:my-org::abc123
Response: Reinforcement Learning from Human Feedback (RLHF) improves model behavior by training it on human-labeled examples and feedback, enhancing alignment and quality.

Common variations

You can poll asynchronously with asyncio, swap the base model for gpt-4o or another fine-tunable model, and customize training hyperparameters such as n_epochs or batch_size. Streaming does not apply to the fine-tuning job itself, but you can stream responses when querying the fine-tuned model.

python
import asyncio
import os
from openai import AsyncOpenAI

# AsyncOpenAI exposes awaitable versions of every API call, so polling
# and streaming do not block the event loop.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def poll_job(job_id):
    while True:
        status = await client.fine_tuning.jobs.retrieve(job_id)
        print(f"Status: {status.status}")
        if status.status in ["succeeded", "failed", "cancelled"]:
            return status
        await asyncio.sleep(30)

async def main():
    # Assume the training file was already uploaded (see above)
    job = await client.fine_tuning.jobs.create(
        training_file="file-abc123xyz",
        model="gpt-4o"
    )
    print(f"Job ID: {job.id}")

    status = await poll_job(job.id)
    if status.status == "succeeded":
        # Stream the fine-tuned model's reply token by token
        stream = await client.chat.completions.create(
            model=status.fine_tuned_model,
            messages=[{"role": "user", "content": "What is RLHF?"}],
            stream=True
        )
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
    else:
        print("Fine-tuning failed.")

if __name__ == "__main__":
    asyncio.run(main())
output
Job ID: ftjob-xyz789abc
Status: running
Status: running
Status: succeeded
Reinforcement Learning from Human Feedback (RLHF) improves model behavior by training it on human-labeled examples and feedback, enhancing alignment and quality.
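
Closer in spirit to the preference-modeling step of RLHF is OpenAI's preference (DPO) fine-tuning, which trains on chosen/rejected completion pairs rather than single demonstrations. The sketch below writes one such pair, assuming the documented preference JSONL shape (an input plus preferred_output and non_preferred_output assistant messages) and the method parameter of client.fine_tuning.jobs.create; the pair content and file name are illustrative:

```python
import json

# One illustrative preference pair: DPO training nudges the model toward
# the preferred_output and away from the non_preferred_output for the
# same input.
pair = {
    "input": {"messages": [
        {"role": "user", "content": "Summarize RLHF in one sentence."}
    ]},
    "preferred_output": [
        {"role": "assistant", "content": "RLHF aligns a model by optimizing "
                                         "it against human preference "
                                         "judgments."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "RLHF is a thing models do."}
    ],
}

with open("rlhf_preferences.jsonl", "w") as f:
    f.write(json.dumps(pair) + "\n")

# The job itself would then be created with (not run here):
# client.fine_tuning.jobs.create(
#     training_file=uploaded_file.id,
#     model="gpt-4o-mini-2024-07-18",
#     method={"type": "dpo", "dpo": {"hyperparameters": {"beta": 0.1}}},
# )
```

A common pattern is to run supervised fine-tuning first and then a DPO pass on preference pairs, mirroring the SFT-then-preference stages of an RLHF pipeline.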

Troubleshooting

  • If you see Invalid file format, ensure your training data is valid JSONL with proper messages arrays including system, user, and assistant roles.
  • If the fine-tuning job fails, inspect the error field from client.fine_tuning.jobs.retrieve and the event log from client.fine_tuning.jobs.list_events for details.
  • API rate limits can cause errors; implement retries with exponential backoff.
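
The backoff advice above can be sketched as a small stdlib-only helper; in real code you would catch openai.RateLimitError specifically rather than a blanket Exception, and the delay values here are illustrative:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # in practice: except openai.RateLimitError
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Example: a call that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # → ok
```

Wrapping each API call this way (uploads, job creation, polling) keeps transient 429s from killing a long-running fine-tuning script.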

Key Takeaways

  • Prepare human-feedback training data as JSONL with system, user, and assistant messages for fine-tuning.
  • Use client.files.create to upload data and client.fine_tuning.jobs.create to start training.
  • Poll the fine-tuning job status until completion before querying the fine-tuned model.
  • Customize training parameters and use async polling or streaming when querying the fine-tuned model.
  • Validate data format and monitor job errors to troubleshoot fine-tuning issues.
Verified 2026-04 · gpt-4o-mini-2024-07-18, gpt-4o