How to use Instructor with local models
Quick answer
Use instructor with local models by loading a local LLM client (e.g., llama_cpp or the OpenAI SDK pointed at a local endpoint) and passing it to instructor.from_openai() or a similar adapter. Define your pydantic.BaseModel schema and call client.chat.completions.create() with response_model to extract structured data.
Prerequisites
- Python 3.8+
- pip install instructor openai llama-cpp-python pydantic
- Local LLM model files or a local LLM server running
Setup
Install instructor and a local LLM client such as llama-cpp-python or use openai SDK pointed to a local endpoint. Also install pydantic for schema validation.
Run:
pip install instructor llama-cpp-python pydantic openai
Step by step
Load your local LLM client and wrap it with instructor. Define a pydantic.BaseModel for the structured output you want. Then call client.chat.completions.create() with response_model to parse the response.
import os
from instructor import from_openai
from pydantic import BaseModel
# Example using OpenAI SDK pointed to local LLM server
from openai import OpenAI
# Initialize OpenAI client with local endpoint
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="http://localhost:8080/v1",  # your local LLM server endpoint
)
# Wrap with instructor
instructor_client = from_openai(client)
# Define structured output model
class UserInfo(BaseModel):
    name: str
    age: int
# Prepare prompt
messages = [{"role": "user", "content": "Extract: John is 30 years old."}]
# Call with response_model to parse structured data
response = instructor_client.chat.completions.create(
    model="gpt-4o-mini",  # model name your local server supports
    messages=messages,
    response_model=UserInfo,
)
print(response.name, response.age)
Output
John 30
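Under the hood, instructor validates the model's JSON output against your Pydantic schema, so you can sanity-check that parsing step in isolation, without a running server. A minimal sketch (the raw JSON string below stands in for what the local model is expected to return):

```python
from pydantic import BaseModel, ValidationError

class UserInfo(BaseModel):
    name: str
    age: int

# JSON the local model is expected to produce for the example prompt
raw = '{"name": "John", "age": 30}'
user = UserInfo.model_validate_json(raw)
print(user.name, user.age)  # John 30

# A malformed response (age is not an integer) fails validation;
# instructor surfaces this as a parsing error and can retry the request.
try:
    UserInfo.model_validate_json('{"name": "John", "age": "thirty"}')
except ValidationError:
    print("validation failed as expected")
```

Testing your schema against sample outputs like this is a quick way to rule out schema mismatches before debugging the server side.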
Common variations
- Use instructor.from_anthropic() if you have a local Anthropic-compatible server.
- For async usage, wrap openai.AsyncOpenAI with from_openai and await client.chat.completions.create(...).
- Change model to your local model name or another model your endpoint supports.
- For streaming partial objects, use instructor's create_partial (or Partial[YourModel]) and iterate over the chunks.
Troubleshooting
- If you get response_model parsing errors, verify your Pydantic model matches the expected output format.
- Ensure your local LLM server supports the chat completions API and the model name used.
- Check environment variables for API keys and local endpoint URLs.
- For connection errors, confirm your local server is running and accessible.
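To separate connection problems from instructor problems, a quick stdlib-only reachability check can help. This is a hypothetical helper, not part of instructor; it probes the /v1/models listing endpoint that most OpenAI-compatible local servers expose:

```python
import urllib.error
import urllib.request

def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers an HTTP request at base_url."""
    try:
        urllib.request.urlopen(f"{base_url}/models", timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server responded (even with 4xx/5xx), so it is running
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, etc.
        return False

print(server_is_up("http://localhost:8080/v1"))
```

If this returns False, fix the server or the endpoint URL before touching your instructor code.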
Key Takeaways
- Use instructor.from_openai() with a local LLM client to enable structured extraction.
- Define your output schema with pydantic.BaseModel and pass it as response_model.
- Local LLM servers must expose a chat completions API compatible with the OpenAI or Anthropic SDKs.
- Async and streaming calls are supported by instructor with local models.
- Verify model names and endpoints match your local LLM setup to avoid connection issues.