How-to · Intermediate · 3 min read

How to use Instructor with local models

Quick answer
Point an OpenAI-compatible client at your local LLM server (or load a model directly with llama-cpp-python), wrap it with instructor.from_openai() or a similar adapter, define a pydantic.BaseModel schema, and pass it as response_model to client.chat.completions.create() to extract structured data.

Prerequisites

  • Python 3.8+
  • pip install instructor openai llama-cpp-python pydantic
  • Local LLM model files or local LLM server running

Setup

Install instructor along with a local LLM client: either llama-cpp-python for in-process inference, or the openai SDK pointed at a local OpenAI-compatible endpoint. Also install pydantic for schema definition and validation.

Run:

bash
pip install instructor llama-cpp-python pydantic openai

Step by step

Load your local LLM client and wrap it with instructor. Define a pydantic.BaseModel for the structured output you want. Then call client.chat.completions.create() with response_model to parse the response.

python
import os
from instructor import from_openai
from pydantic import BaseModel

# Example using OpenAI SDK pointed to local LLM server
from openai import OpenAI

# Initialize OpenAI client with local endpoint
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY", "sk-local"),  # local servers typically ignore the key
    base_url="http://localhost:8080/v1"  # your local LLM server endpoint
)

# Wrap with instructor
instructor_client = from_openai(client)

# Define structured output model
class UserInfo(BaseModel):
    name: str
    age: int

# Prepare prompt
messages = [{"role": "user", "content": "Extract: John is 30 years old."}]

# Call with response_model to parse structured data
response = instructor_client.chat.completions.create(
    model="gpt-4o-mini",  # model name your local server supports
    messages=messages,
    response_model=UserInfo
)

print(response.name, response.age)
output
John 30
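
Under the hood, instructor converts your pydantic.BaseModel into a JSON schema and steers the model toward it (via tool calling or JSON mode, depending on what the server supports). You can inspect that schema yourself with pydantic alone, which is a useful sanity check before wiring up a server:

python
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# The JSON schema instructor derives from the model and sends alongside the prompt
schema = UserInfo.model_json_schema()

# Both fields are required, and age is constrained to an integer
print(schema["required"])
print(schema["properties"]["age"])

If a field should be optional, give it a default (e.g. age: int | None = None); it then drops out of the "required" list and the model is not forced to produce it.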

Common variations

  • If you are running a local Anthropic-compatible server, wrap the anthropic client with instructor.from_anthropic() instead.
  • For async usage, wrap openai.AsyncOpenAI with from_openai() and await instructor_client.chat.completions.create(...).
  • Change model to a name your local server actually serves; some local servers accept any string, others route requests by model name.
  • For streaming, instructor can yield partially-filled objects via instructor_client.chat.completions.create_partial(...) rather than raw stream=True chunks.
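
The async variation can be sketched as follows; AsyncOpenAI is the async client from the openai SDK, and http://localhost:8080/v1 is a placeholder for your local endpoint:

python
import asyncio
import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# Wrap the async client exactly like the sync one
aclient = instructor.from_openai(AsyncOpenAI(
    api_key="sk-local",                   # local servers typically ignore the key
    base_url="http://localhost:8080/v1",  # your local LLM server endpoint
))

async def extract(text: str) -> UserInfo:
    # Same call shape as the sync client, just awaited
    return await aclient.chat.completions.create(
        model="gpt-4o-mini",  # model name your local server supports
        messages=[{"role": "user", "content": f"Extract: {text}"}],
        response_model=UserInfo,
    )

# Requires a running local server:
# user = asyncio.run(extract("John is 30 years old."))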

Troubleshooting

  • If you get response_model parsing errors, verify your Pydantic model matches the expected output format.
  • Ensure your local LLM server supports chat completions and the model name used.
  • Check environment variables for API keys and local endpoint URLs.
  • For connection errors, confirm your local server is running and accessible.
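
When you hit a response_model parsing error, instructor is surfacing a pydantic.ValidationError: the JSON the model produced did not satisfy your schema. You can reproduce the failure locally with pydantic alone, which is often the fastest way to debug a schema mismatch:

python
from pydantic import BaseModel, ValidationError

class UserInfo(BaseModel):
    name: str
    age: int

# Matches the schema: validation succeeds
ok = UserInfo.model_validate({"name": "John", "age": 30})

# Simulates a bad model response: "thirty" cannot be coerced to int
try:
    UserInfo.model_validate({"name": "John", "age": "thirty"})
    failed = False
except ValidationError as e:
    failed = True
    # e.errors() pinpoints which field broke and why
    print(e.errors()[0]["loc"], e.errors()[0]["type"])

With the client wired up, instructor can retry automatically on such failures via the max_retries parameter, re-prompting the model with the validation error.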

Key Takeaways

  • Use instructor.from_openai() with a local LLM client to enable structured extraction.
  • Define your output schema with pydantic.BaseModel and pass it as response_model.
  • Local LLM servers must support chat completions API compatible with OpenAI or Anthropic SDKs.
  • Async and streaming calls are supported by instructor with local models.
  • Verify model names and endpoints match your local LLM setup to avoid connection issues.
Verified 2026-04 · gpt-4o-mini