
How to do RLHF with OpenAI

Quick answer
OpenAI's API does not expose a full RLHF loop (reward model plus PPO), but you can capture the human-feedback signal with its fine-tuning API: supervised fine-tuning on human-approved completions, or preference (DPO) fine-tuning on chosen/rejected pairs. Prepare a JSONL dataset of human-labeled prompts and completions, upload it via client.files.create, then create a fine-tuning job with client.fine_tuning.jobs.create. After training, query the fine-tuned model with client.chat.completions.create for improved responses.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with billing enabled (fine-tuning is a paid feature, not available on the free tier)
  • pip install "openai>=1.0" (quote the specifier so the shell does not treat > as a redirect)

Setup

Install the official OpenAI Python SDK and set your API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

Prepare your training data as a JSONL file in which each line holds a messages array with system, user, and assistant roles; for an RLHF-style workflow, the assistant turns should be completions that human reviewers rated highly. Upload the file, create a fine-tuning job, wait for completion, then query the fine-tuned model.

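As a sketch of what that file looks like, the snippet below writes two hypothetical human-approved examples in the chat-format JSONL the fine-tuning endpoint expects (the file name and example content are illustrative):

```python
import json

# Two illustrative human-approved examples in chat JSONL format;
# each line of the file is one complete training record.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is RLHF?"},
        {"role": "assistant", "content": "Reinforcement Learning from Human "
                                         "Feedback: training a model with "
                                         "human preference signals."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Name one RLHF stage."},
        {"role": "assistant", "content": "Supervised fine-tuning on human "
                                         "demonstrations."},
    ]},
]

with open("rlhf_training.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

In a real dataset you would want at least a few dozen such records drawn from reviewer-approved conversations.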
python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Step 1: Upload training file (JSONL format with RLHF data)
with open("rlhf_training.jsonl", "rb") as f:
    training_file = client.files.create(file=f, purpose="fine-tune")

print(f"Uploaded file ID: {training_file.id}")

# Step 2: Create fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18"
)

print(f"Fine-tuning job ID: {job.id}")

# Step 3: Poll job status until done (simplified example)
import time
while True:
    status = client.fine_tuning.jobs.retrieve(job.id)
    print(f"Status: {status.status}")
    if status.status in ["succeeded", "failed", "cancelled"]:
        break
    time.sleep(30)

if status.status == "succeeded":
    fine_tuned_model = status.fine_tuned_model
    print(f"Fine-tuned model ready: {fine_tuned_model}")

    # Step 4: Query fine-tuned model
    response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[{"role": "user", "content": "Explain RLHF."}]
    )
    print("Response:", response.choices[0].message.content)
else:
    print("Fine-tuning failed.")
output
Uploaded file ID: file-abc123xyz
Fine-tuning job ID: ftjob-xyz789abc
Status: running
Status: running
Status: succeeded
Fine-tuned model ready: ft:gpt-4o-mini-2024-07-18:my-org::abc123
Response: Reinforcement Learning from Human Feedback (RLHF) improves model behavior by training it on human-labeled examples and feedback, enhancing alignment and quality.

Common variations

You can poll asynchronously with asyncio, swap the base model for gpt-4o or another fine-tunable model, and customize training hyperparameters such as n_epochs or batch_size. Streaming does not apply to the fine-tuning job itself, but you can stream responses when querying the fine-tuned model.

python
import asyncio
import os
from openai import AsyncOpenAI

# AsyncOpenAI exposes awaitable versions of every API call, so polling
# and streaming do not block the event loop.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def poll_job(job_id):
    while True:
        status = await client.fine_tuning.jobs.retrieve(job_id)
        print(f"Status: {status.status}")
        if status.status in ["succeeded", "failed", "cancelled"]:
            return status
        await asyncio.sleep(30)

async def main():
    # Assume the training file was already uploaded (see above)
    job = await client.fine_tuning.jobs.create(
        training_file="file-abc123xyz",
        model="gpt-4o"
    )
    print(f"Job ID: {job.id}")

    status = await poll_job(job.id)
    if status.status == "succeeded":
        # Stream the fine-tuned model's reply token by token
        stream = await client.chat.completions.create(
            model=status.fine_tuned_model,
            messages=[{"role": "user", "content": "What is RLHF?"}],
            stream=True
        )
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
    else:
        print("Fine-tuning failed.")

if __name__ == "__main__":
    asyncio.run(main())
output
Job ID: ftjob-xyz789abc
Status: running
Status: running
Status: succeeded
Reinforcement Learning from Human Feedback (RLHF) improves model behavior by training it on human-labeled examples and feedback, enhancing alignment and quality.
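
Closer in spirit to the preference-modeling step of RLHF is OpenAI's preference (DPO) fine-tuning, which trains on chosen/rejected completion pairs rather than single demonstrations. The sketch below writes one such pair, assuming the documented preference JSONL shape (an input plus preferred_output and non_preferred_output assistant messages) and the method parameter of client.fine_tuning.jobs.create; the pair content and file name are illustrative:

```python
import json

# One illustrative preference pair: DPO training nudges the model toward
# the preferred_output and away from the non_preferred_output for the
# same input.
pair = {
    "input": {"messages": [
        {"role": "user", "content": "Summarize RLHF in one sentence."}
    ]},
    "preferred_output": [
        {"role": "assistant", "content": "RLHF aligns a model by optimizing "
                                         "it against human preference "
                                         "judgments."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "RLHF is a thing models do."}
    ],
}

with open("rlhf_preferences.jsonl", "w") as f:
    f.write(json.dumps(pair) + "\n")

# The job itself would then be created with (not run here):
# client.fine_tuning.jobs.create(
#     training_file=uploaded_file.id,
#     model="gpt-4o-mini-2024-07-18",
#     method={"type": "dpo", "dpo": {"hyperparameters": {"beta": 0.1}}},
# )
```

A common pattern is to run supervised fine-tuning first and then a DPO pass on preference pairs, mirroring the SFT-then-preference stages of an RLHF pipeline.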

Troubleshooting

  • If you see Invalid file format, ensure your training data is valid JSONL with proper messages arrays including system, user, and assistant roles.
  • If the fine-tuning job fails, inspect the error field from client.fine_tuning.jobs.retrieve and the event log from client.fine_tuning.jobs.list_events for details.
  • API rate limits can cause errors; implement retries with exponential backoff.
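
The backoff advice above can be sketched as a small stdlib-only helper; in real code you would catch openai.RateLimitError specifically rather than a blanket Exception, and the delay values here are illustrative:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:  # in practice: except openai.RateLimitError
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Example: a call that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # → ok
```

Wrapping each API call this way (uploads, job creation, polling) keeps transient 429s from killing a long-running fine-tuning script.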

Key Takeaways

  • Prepare human-feedback training data as JSONL with system, user, and assistant messages for fine-tuning.
  • Use client.files.create to upload data and client.fine_tuning.jobs.create to start training.
  • Poll the fine-tuning job status until completion before querying the fine-tuned model.
  • Customize training parameters and use async polling or streaming when querying the fine-tuned model.
  • Validate data format and monitor job errors to troubleshoot fine-tuning issues.
Verified 2026-04 · gpt-4o-mini-2024-07-18, gpt-4o