How to evaluate a fine-tuned model
Quick answer
Use the OpenAI SDK v1 to call your fine-tuned model by specifying its name in
the model parameter when creating a chat completion. Evaluate the output by sending test prompts and comparing the responses to expected results, either programmatically or manually.
Prerequisites
- Python 3.8+
- An OpenAI API key with access to your fine-tuned model
- pip install "openai>=1.0"
Setup
Install the latest OpenAI Python SDK and set your API key as an environment variable.
pip install "openai>=1.0"
Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
Step by step
Use the OpenAI Python SDK to send test prompts to your fine-tuned model and print the responses for evaluation.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Replace with your fine-tuned model ID
fine_tuned_model = "ft:gpt-4o-mini-2024-07-18-abc123"

# Example test prompts
test_prompts = [
    "Explain the benefits of fine-tuning.",
    "Summarize the following text: OpenAI provides powerful APIs.",
    "What is RAG in AI?",
]

for prompt in test_prompts:
    response = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"Prompt: {prompt}")
    print(f"Response: {response.choices[0].message.content}\n")
Output:
Prompt: Explain the benefits of fine-tuning.
Response: Fine-tuning allows a base model to specialize on specific tasks or domains, improving accuracy and relevance.

Prompt: Summarize the following text: OpenAI provides powerful APIs.
Response: OpenAI offers APIs that enable developers to integrate advanced AI capabilities into their applications.

Prompt: What is RAG in AI?
Response: RAG stands for Retrieval-Augmented Generation, a technique combining retrieval of documents with generative models for better answers.
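The loop above prints responses for manual review. For a programmatic check, one minimal sketch is to score each response by keyword overlap with an expected answer; the keyword lists below are illustrative assumptions, not an official evaluation metric.

```python
# A minimal sketch of programmatic evaluation: score each response by the
# fraction of expected keywords it contains. The keywords are illustrative
# assumptions, not part of any official metric.
def keyword_score(response, expected_keywords):
    """Return the fraction of expected keywords found in the response."""
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)

# Score the first example response from the output above.
response_text = ("Fine-tuning allows a base model to specialize on specific "
                 "tasks or domains, improving accuracy and relevance.")
score = keyword_score(response_text, ["fine-tuning", "specialize", "accuracy"])
print(f"Keyword score: {score:.2f}")  # 1.00 for this example
```

Averaging such scores over the whole test set gives a rough quality number you can track across fine-tuning runs.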
Common variations
You can evaluate your fine-tuned model asynchronously or with streaming output. Also, test with different prompt formats or use other OpenAI models for comparison.
import asyncio
import os

from openai import AsyncOpenAI

async def async_evaluate():
    # Streaming with await/async for requires the async client (AsyncOpenAI)
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    fine_tuned_model = "ft:gpt-4o-mini-2024-07-18-abc123"
    prompt = "Describe the process of fine-tuning a model."
    stream = await client.chat.completions.create(
        model=fine_tuned_model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    print("Streaming response:")
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)
    print()

asyncio.run(async_evaluate())
Output:
Streaming response:
Fine-tuning a model involves training a pre-trained base model on your specific dataset to adapt it to your task, improving performance and relevance.
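For comparison against other models, one sketch is to run the same prompt through the fine-tuned model and its base model side by side. The helper below takes any client created as in the Setup section; "gpt-4o-mini" as the base model is an assumption here, so substitute the base your fine-tune was trained from.

```python
# A sketch (not an official SDK helper) for side-by-side comparison: run the
# same prompt through several models and collect the responses. `client` is
# an OpenAI client created as in the Setup section.
def compare_models(client, models, prompt):
    """Return a {model_id: response_text} dict for the same prompt."""
    results = {}
    for model in models:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results[model] = response.choices[0].message.content
    return results

# Usage (requires a valid API key; "gpt-4o-mini" as the base model is an
# assumption -- use the base model your fine-tune was trained from):
# results = compare_models(client,
#                          ["ft:gpt-4o-mini-2024-07-18-abc123", "gpt-4o-mini"],
#                          "Explain the benefits of fine-tuning.")
```

Reading the two responses next to each other makes it easy to see whether the fine-tune actually changed the behavior you trained for.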
Troubleshooting
- If you get a "model not found" error, verify your fine-tuned model ID is correct and active.
- If responses are poor, check your training data quality and consider more training epochs.
- Ensure your API key has permissions to access fine-tuned models.
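To track down a wrong model ID, one sketch is to list recent fine-tuning jobs via the SDK's fine_tuning.jobs.list and keep the IDs of succeeded jobs; this assumes your API key can list the jobs that produced the model.

```python
# A sketch for tracking down the exact model ID behind a "model not found"
# error: list recent fine-tuning jobs and keep the IDs of succeeded ones.
# `client` is an OpenAI client created as in the Setup section.
def completed_model_ids(client, limit=10):
    """Return fine-tuned model IDs from recently succeeded jobs."""
    return [
        job.fine_tuned_model
        for job in client.fine_tuning.jobs.list(limit=limit)
        if job.status == "succeeded" and job.fine_tuned_model
    ]
```

If your ID is missing from completed_model_ids(client), the job may still be running, may have failed, or may belong to a different project or organization than the key you are using.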
Key Takeaways
- Use the OpenAI SDK v1 chat.completions.create method with your fine-tuned model ID to evaluate.
- Test multiple prompts and compare outputs to expected answers for thorough evaluation.
- Async and streaming calls allow real-time evaluation and integration in interactive apps.
- Verify your fine-tuned model ID and API key permissions if you encounter errors.