How-to · Beginner · 3 min read

How to reduce Instructor API costs

Quick answer
Instructor wraps the OpenAI API, so its costs scale with tokens and call volume. Keep prompts concise, use smaller models like gpt-4o-mini where quality allows, batch requests, and cache frequent responses to minimize API calls and token usage.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the specifier so the shell doesn't interpret >=)
  • pip install instructor

Setup

Install the openai and instructor Python packages and set your OpenAI API key as an environment variable.

  • Run pip install openai instructor
  • Set your API key: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; takes effect in new terminal sessions)
bash
pip install openai instructor

Step by step

Use concise prompts and smaller models to reduce token usage and cost. Cache results for repeated queries to avoid redundant API calls.

python
import os

import instructor
from openai import OpenAI
from pydantic import BaseModel

# Define a simple Pydantic model for structured extraction
class UserInfo(BaseModel):
    name: str
    age: int

# Initialize the OpenAI client and wrap it with Instructor
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
instructor_client = instructor.from_openai(client)

# Concise prompt to reduce tokens
prompt = "Extract: John is 30 years old"

# Call Instructor API with a smaller model to save cost
response = instructor_client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,
    messages=[{"role": "user", "content": prompt}]
)

print(f"Name: {response.name}, Age: {response.age}")
output
Name: John, Age: 30

Common variations

Use asynchronous calls to run multiple requests concurrently and cut overall wall-clock time. If your task is narrow and high-volume, consider fine-tuning a smaller model for it. Cache frequent responses locally or in a database to avoid paying for repeated, identical API calls.

python
import asyncio
import os

import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

# Use the async OpenAI client so requests can run concurrently
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
instructor_client = instructor.from_openai(client)

class UserInfo(BaseModel):
    name: str
    age: int

async def fetch_user_info(prompt: str) -> UserInfo:
    # With an AsyncOpenAI client, create() is awaitable
    return await instructor_client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=UserInfo,
        messages=[{"role": "user", "content": prompt}]
    )

async def main():
    prompts = ["Extract: Alice is 25 years old", "Extract: Bob is 40 years old"]
    tasks = [fetch_user_info(p) for p in prompts]
    results = await asyncio.gather(*tasks)
    for r in results:
        print(f"Name: {r.name}, Age: {r.age}")

if __name__ == "__main__":
    asyncio.run(main())
output
Name: Alice, Age: 25
Name: Bob, Age: 40

Troubleshooting

  • If you see unexpectedly high token usage, review your prompts for verbosity and remove unnecessary context.
  • If API calls fail, verify your OPENAI_API_KEY environment variable is set correctly.
  • For slow responses, consider using smaller models or batching requests asynchronously.
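To spot verbose prompts before you pay for them, a rough rule of thumb is about 4 characters per token for English text; for exact counts, use a real tokenizer such as tiktoken. This sketch uses only the heuristic, so it runs with no dependencies:

```python
def rough_token_estimate(text: str) -> int:
    # Rough heuristic: ~4 characters per English token.
    # For exact counts, use a tokenizer (e.g. tiktoken's o200k_base).
    return max(1, len(text) // 4)

verbose = ("Please carefully read the following sentence and extract the "
           "person's name and age: John is 30 years old. Thank you!")
concise = "Extract: John is 30 years old"

# The verbose phrasing costs roughly 4x the tokens for the same task
print(rough_token_estimate(verbose), rough_token_estimate(concise))
```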

Key Takeaways

  • Use smaller models like gpt-4o-mini to reduce token costs with Instructor API.
  • Optimize and shorten prompts to minimize token consumption per request.
  • Batch requests asynchronously and cache frequent responses to lower API call volume.
Verified 2026-04 · gpt-4o-mini