How-to · Intermediate · 3 min read

How to use Instructor with local models

Quick answer
Point an OpenAI-compatible client at your local LLM server (or load a model directly with llama-cpp-python), wrap it with instructor.from_openai() or a similar adapter, define a pydantic.BaseModel schema, and pass it as response_model to client.chat.completions.create() to extract structured data.

Prerequisites

  • Python 3.8+
  • pip install instructor openai llama-cpp-python pydantic
  • Local LLM model files or local LLM server running

Setup

Install instructor along with a local LLM client: either llama-cpp-python for in-process inference, or the openai SDK pointed at a local OpenAI-compatible endpoint. Also install pydantic for schema definition and validation.

Run:

bash
pip install instructor llama-cpp-python pydantic openai

Step by step

Load your local LLM client and wrap it with instructor. Define a pydantic.BaseModel for the structured output you want. Then call client.chat.completions.create() with response_model to parse the response.

python
import os
from instructor import from_openai
from pydantic import BaseModel

# Example using OpenAI SDK pointed to local LLM server
from openai import OpenAI

# Initialize OpenAI client with local endpoint
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "sk-local"),  # local servers typically ignore the key
    base_url="http://localhost:8080/v1"  # your local LLM server endpoint
)

# Wrap with instructor
instructor_client = from_openai(client)

# Define structured output model
class UserInfo(BaseModel):
    name: str
    age: int

# Prepare prompt
messages = [{"role": "user", "content": "Extract: John is 30 years old."}]

# Call with response_model to parse structured data
response = instructor_client.chat.completions.create(
    model="gpt-4o-mini",  # model name your local server supports
    messages=messages,
    response_model=UserInfo
)

print(response.name, response.age)
output
John 30
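
Under the hood, instructor converts your pydantic.BaseModel into a JSON schema and steers the model toward it (via tool calling or JSON mode, depending on what the server supports). You can inspect that schema yourself with pydantic alone, which is a useful sanity check before wiring up a server:

python
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# The JSON schema instructor derives from the model and sends alongside the prompt
schema = UserInfo.model_json_schema()

# Both fields are required, and age is constrained to an integer
print(schema["required"])
print(schema["properties"]["age"])

If a field should be optional, give it a default (e.g. age: int | None = None); it then drops out of the "required" list and the model is not forced to produce it.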

Common variations

  • If you are running a local Anthropic-compatible server, wrap the anthropic client with instructor.from_anthropic() instead.
  • For async usage, wrap openai.AsyncOpenAI with from_openai() and await instructor_client.chat.completions.create(...).
  • Change model to a name your local server actually serves; some local servers accept any string, others route requests by model name.
  • For streaming, instructor can yield partially-filled objects via instructor_client.chat.completions.create_partial(...) rather than raw stream=True chunks.
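
The async variation can be sketched as follows; AsyncOpenAI is the async client from the openai SDK, and http://localhost:8080/v1 is a placeholder for your local endpoint:

python
import asyncio
import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# Wrap the async client exactly like the sync one
aclient = instructor.from_openai(AsyncOpenAI(
    api_key="sk-local",                   # local servers typically ignore the key
    base_url="http://localhost:8080/v1",  # your local LLM server endpoint
))

async def extract(text: str) -> UserInfo:
    # Same call shape as the sync client, just awaited
    return await aclient.chat.completions.create(
        model="gpt-4o-mini",  # model name your local server supports
        messages=[{"role": "user", "content": f"Extract: {text}"}],
        response_model=UserInfo,
    )

# Requires a running local server:
# user = asyncio.run(extract("John is 30 years old."))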

Troubleshooting

  • If you get response_model parsing errors, verify your Pydantic model matches the expected output format.
  • Ensure your local LLM server supports chat completions and the model name used.
  • Check environment variables for API keys and local endpoint URLs.
  • For connection errors, confirm your local server is running and accessible.
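
When you hit a response_model parsing error, instructor is surfacing a pydantic.ValidationError: the JSON the model produced did not satisfy your schema. You can reproduce the failure locally with pydantic alone, which is often the fastest way to debug a schema mismatch:

python
from pydantic import BaseModel, ValidationError

class UserInfo(BaseModel):
    name: str
    age: int

# Matches the schema: validation succeeds
ok = UserInfo.model_validate({"name": "John", "age": 30})

# Simulates a bad model response: "thirty" cannot be coerced to int
try:
    UserInfo.model_validate({"name": "John", "age": "thirty"})
    failed = False
except ValidationError as e:
    failed = True
    # e.errors() pinpoints which field broke and why
    print(e.errors()[0]["loc"], e.errors()[0]["type"])

With the client wired up, instructor can retry automatically on such failures via the max_retries parameter, re-prompting the model with the validation error.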

Key Takeaways

  • Use instructor.from_openai() with a local LLM client to enable structured extraction.
  • Define your output schema with pydantic.BaseModel and pass it as response_model.
  • Local LLM servers must support chat completions API compatible with OpenAI or Anthropic SDKs.
  • Async and streaming calls are supported by instructor with local models.
  • Verify model names and endpoints match your local LLM setup to avoid connection issues.
Verified 2026-04 · gpt-4o-mini