Fix LLM extracting wrong fields
Quick answer
To stop a large language model (LLM) from extracting wrong fields, use structured prompts with explicit instructions and enforce a response schema, for example a pydantic model passed as response_format, or strict JSON schema validation. Then parse and validate the output strictly with the OpenAI Python SDK to ensure correct field extraction.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.40" (structured-output parsing requires a recent SDK)
- pydantic (optional, for structured extraction)
Setup
Install the openai Python SDK and set your API key as an environment variable.
- Install SDK:
pip install openai
- Set environment variable:
export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
pip install openai output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
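Before making any API calls, you can confirm the key is actually visible to Python with a quick stdlib-only check (no network request, no SDK import needed):

```python
import os

# Verify the API key is exported before any SDK calls are attempted.
key = os.environ.get("OPENAI_API_KEY")
if key:
    print(f"Key found (ends in ...{key[-4:]})")
else:
    print("OPENAI_API_KEY is not set; export it as shown above")
```

This catches the common case where the key was set in a different shell session than the one running your script.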
Step by step
Use an explicit prompt and a pydantic model passed as response_format, so the SDK's parse helper validates the output against the schema and returns typed fields. (response_model is the equivalent parameter in the third-party instructor library; the OpenAI SDK itself uses response_format.)
import os
from openai import OpenAI
from pydantic import BaseModel
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
class UserData(BaseModel):
    name: str
    age: int
prompt = "Extract the user's name and age from the text: 'John is 30 years old.'"
response = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_format=UserData,
)
user_data = response.choices[0].message.parsed
print(f"Name: {user_data.name}, Age: {user_data.age}")
output
Name: John, Age: 30
Common variations
You can also fall back to raw JSON parsing if you are not using pydantic, or switch to another model, such as claude-3-5-sonnet-20241022 with the Anthropic SDK, for extraction.
import json
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
prompt = (
    "Extract the user's name and age as JSON from the text: 'John is 30 years old.'\n"
    'Respond only with JSON like {"name": "", "age": 0}'
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # JSON mode: guarantees syntactically valid JSON
)
try:
    data = json.loads(response.choices[0].message.content)
    print(f"Name: {data['name']}, Age: {data['age']}")
except (json.JSONDecodeError, KeyError) as e:
    print(f"Failed to parse JSON or missing fields: {e}")
output
Name: John, Age: 30
Troubleshooting
- If fields are missing or incorrect, refine your prompt to be more explicit and include examples.
- Use response_format (or strict JSON schema validation) to catch errors early.
- Check for trailing text or formatting issues in the LLM output.
- Test with different models, such as gpt-4o-mini or claude-3-5-sonnet-20241022, for better extraction accuracy.
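The "trailing text or formatting issues" point can be handled in code rather than by hand. A minimal sketch of a defensive parser (the helper name parse_llm_json and the required field set are illustrative, not part of any SDK):

```python
import json

def parse_llm_json(raw: str) -> dict:
    """Strip common formatting issues (markdown fences) before
    parsing, then check that the required fields are present."""
    text = raw.strip()
    # Remove a ```json ... ``` wrapper if the model added one.
    if text.startswith("```"):
        text = text.split("\n", 1)[1]    # drop the opening fence line
        text = text.rsplit("```", 1)[0]  # drop the closing fence
    data = json.loads(text)
    missing = {"name", "age"} - data.keys()
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return data

# A response wrapped in a markdown fence still parses cleanly:
raw = '```json\n{"name": "John", "age": 30}\n```'
print(parse_llm_json(raw))  # → {'name': 'John', 'age': 30}
```

Raising on missing fields (rather than silently returning partial data) makes extraction failures visible immediately, which is what "catch errors early" means in practice.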
Key Takeaways
- Use explicit prompts with clear instructions to improve field extraction accuracy.
- Leverage response_format with pydantic for structured output validation.
- Parse and validate JSON output strictly to catch extraction errors early.
- Test multiple models to find the best extractor for your use case.
- Refine prompts iteratively based on extraction errors and missing fields.
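The iterative-refinement takeaway can be sketched as a retry loop that feeds each validation error back to the model. Here call_model is a hypothetical stand-in for whatever function wraps your real API call; the loop itself is the pattern:

```python
import json

def extract_with_retry(call_model, prompt, required=("name", "age"), max_tries=3):
    """Retry extraction, appending each validation error to the
    prompt so the model can correct itself on the next attempt."""
    for _ in range(max_tries):
        raw = call_model(prompt)  # hypothetical: wraps your real API call
        try:
            data = json.loads(raw)
            missing = [f for f in required if f not in data]
            if not missing:
                return data
            error = f"missing fields {missing}"
        except json.JSONDecodeError as exc:
            error = f"invalid JSON ({exc})"
        # Refine the prompt with the specific error for the next attempt.
        prompt += f"\nYour previous answer had {error}. Respond with valid JSON only."
    raise ValueError(f"extraction failed after {max_tries} attempts")

# Demo with a fake model that fails once, then returns valid JSON.
answers = iter(["not json", '{"name": "John", "age": 30}'])
print(extract_with_retry(lambda p: next(answers), "Extract name and age."))
# → {'name': 'John', 'age': 30}
```

Because the error message is specific ("missing fields" vs. "invalid JSON"), the model gets actionable feedback instead of a generic retry, which is what makes iterative prompt refinement converge.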