
Fix LLM extracting wrong fields

Quick answer
To fix a large language model (LLM) extracting wrong fields, use structured prompts with explicit instructions and enforce a response schema, for example structured outputs with a Pydantic model in the OpenAI Python SDK, or strict JSON validation of the reply. Then parse and validate the output to confirm every field was extracted correctly.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.40" (quote the spec so the shell does not treat > as a redirect; structured outputs need a recent SDK)
  • pydantic (optional for structured extraction)

Setup

Install the openai Python SDK and set your API key as an environment variable.

  • Install SDK: pip install openai
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
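Before making any calls, it can help to confirm the key is actually visible to Python; a missing environment variable is a common cause of confusing authentication errors. A minimal check:

```python
import os

# Confirm the key was exported into this shell session before calling the API
if os.environ.get("OPENAI_API_KEY"):
    print("API key found")
else:
    print("OPENAI_API_KEY is not set; export it before running the examples")
```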

Step by step

Use an explicit prompt together with the SDK's structured-output parse helper, which accepts a Pydantic model as response_format and returns a validated object.

python
import os
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class UserData(BaseModel):
    name: str
    age: int

prompt = "Extract the user's name and age from the text: 'John is 30 years old.'"

response = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_format=UserData,  # the SDK validates the reply against this Pydantic model
)

user_data = response.choices[0].message.parsed
print(f"Name: {user_data.name}, Age: {user_data.age}")
output
Name: John, Age: 30

Common variations

You can also use raw JSON parsing if not using pydantic, or switch to other models like claude-3-5-sonnet-20241022 with the Anthropic SDK for extraction.

python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = (
    "Extract the user's name and age as JSON from the text: 'John is 30 years old.'\n"
    'Respond only with JSON like {"name": "John", "age": 30}'
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # JSON mode: the reply is guaranteed to be valid JSON
)

try:
    data = json.loads(response.choices[0].message.content)
    print(f"Name: {data['name']}, Age: {data['age']}")
except (json.JSONDecodeError, KeyError) as e:
    print(f"Failed to parse JSON or missing fields: {e}")
output
Name: John, Age: 30
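Without JSON mode, models sometimes wrap the JSON in markdown code fences, which makes json.loads fail on the raw reply. A small helper can strip fences before parsing (extract_json is a hypothetical name for this sketch, not part of the SDK):

```python
import json
import re

def extract_json(text: str) -> dict:
    """Parse JSON from a model reply, stripping markdown code fences if present."""
    text = text.strip()
    # Match ```json ... ``` or bare ``` ... ``` wrappers around the payload
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)

# Works on both fenced and plain replies
print(extract_json('```json\n{"name": "John", "age": 30}\n```'))
print(extract_json('{"name": "John", "age": 30}'))
```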

Troubleshooting

  • If fields are missing or incorrect, refine your prompt to be more explicit and include examples.
  • Use a Pydantic response_format (structured outputs) or strict JSON schema validation to catch errors early.
  • Check for trailing text or formatting issues in the LLM output.
  • Test with different models like gpt-4o-mini or claude-3-5-sonnet-20241022 for better extraction accuracy.
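The checks above can be combined into a simple retry loop: call the model, validate the fields, and re-ask on failure. In this sketch, call_model stands for any function you supply that returns the model's raw text reply; it is not part of the OpenAI SDK:

```python
import json

def extract_with_retries(call_model, max_attempts=3):
    """Retry extraction until the reply is valid JSON with the expected fields."""
    last_error = None
    for _ in range(max_attempts):
        reply = call_model()
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as err:
            last_error = err
            continue
        # Validate field names and types before accepting the result
        if isinstance(data.get("name"), str) and isinstance(data.get("age"), int):
            return data
        last_error = ValueError(f"missing or mistyped fields in {data!r}")
    raise RuntimeError(f"extraction failed after {max_attempts} attempts: {last_error}")

# Usage with a stub in place of a real API call:
result = extract_with_retries(lambda: '{"name": "John", "age": 30}')
print(result)
```

In practice, call_model would wrap a chat.completions.create call, optionally appending the previous error to the prompt so the model can correct itself.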

Key Takeaways

  • Use explicit prompts with clear instructions to improve field extraction accuracy.
  • Leverage structured outputs with a Pydantic response_format for validated, typed results.
  • Parse and validate JSON output strictly to catch extraction errors early.
  • Test multiple models to find the best extractor for your use case.
  • Refine prompts iteratively based on extraction errors and missing fields.
Verified 2026-04 · gpt-4o-mini, claude-3-5-sonnet-20241022