How to extract data with OpenAI structured outputs

Call the OpenAI API's chat.completions.create method with a prompt that instructs the model to respond with a JSON object, and pass response_format={"type": "json_object"} so the reply is constrained to valid JSON. Then parse response.choices[0].message.content with json.loads to read the extracted fields programmatically.

PREREQUISITES

Python 3.8+
OpenAI API key with available API credit
pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)

Setup

Install the official openai Python package and set your API key as an environment variable.

Install package: pip install openai
Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; setx only affects terminal windows opened afterwards)

bash

pip install openai

output

Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
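
A missing or empty OPENAI_API_KEY is a common first failure, and the error it produces can be hard to read. The check below is a minimal sketch of a fail-fast guard; the helper name require_api_key is hypothetical and not part of the openai package.

```python
import os


def require_api_key(env=None):
    """Return the API key, or raise a clear error if it is missing or blank."""
    # Accept a mapping for testing; default to the real process environment.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the examples."
        )
    return key
```

Calling require_api_key() once at the top of a script, before creating the client, surfaces a configuration problem before any request is sent.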

Step by step

This example shows how to prompt gpt-4o to return a JSON object with extracted fields, then parse it in Python.

python

import os
import json
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

prompt = (
    "Extract the user's name and age from the following text and return as JSON:\n"
    "Text: 'John is 30 years old.'\n"
    "Respond ONLY with a JSON object with keys 'name' and 'age'."
)

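response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    # JSON mode: constrains the reply to a syntactically valid JSON object.
    # The prompt itself must still mention JSON, as it does above.
    response_format={"type": "json_object"},
)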

content = response.choices[0].message.content

try:
    data = json.loads(content)
    print(f"Name: {data['name']}")
    print(f"Age: {data['age']}")
except json.JSONDecodeError:
    print("Failed to parse JSON:", content)

output

Name: John
Age: 30
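
Beyond asking for JSON in the prompt, the Chat Completions API also accepts a response_format of type json_schema, which is what OpenAI calls Structured Outputs: the reply is constrained to a JSON Schema you supply. The sketch below assumes a model that supports this feature (recent gpt-4o and gpt-4o-mini versions do, as far as I know) and a recent openai package. The names PERSON_SCHEMA, build_request and extract_person are illustrative, not part of the SDK.

```python
import json

# Schema for the fields to extract. In strict mode every property must be
# listed in "required" and additionalProperties must be False.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}


def build_request(text, model="gpt-4o-mini"):
    """Build keyword arguments for client.chat.completions.create."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": "Extract the person's name and age from the user's text.",
            },
            {"role": "user", "content": text},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "person",
                "strict": True,
                "schema": PERSON_SCHEMA,
            },
        },
    }


def extract_person(client, text):
    """Send the request with an OpenAI client and return the parsed dict."""
    response = client.chat.completions.create(**build_request(text))
    content = response.choices[0].message.content
    if content is None:
        # The model may decline to answer, in which case there is no JSON.
        raise ValueError("Model returned no content")
    return json.loads(content)
```

With client = OpenAI(), extract_person(client, "John is 30 years old.") should return a dict with the keys name and age. Keeping the request builder separate from the network call makes the schema easy to inspect and test offline.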

Common variations

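Swap in gpt-4o-mini for faster, cheaper extraction, or use another provider's model such as claude-3-5-sonnet-20241022 through the Anthropic SDK (a separate client with its own API). Async calls require the AsyncOpenAI client rather than OpenAI. Streaming is also supported but is less useful for structured extraction, because the JSON can only be parsed once the reply is complete.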

python

import os
import json
import asyncio
from openai import AsyncOpenAI

async def async_extract():
    # The async client is required: awaiting a call on the sync OpenAI client fails.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompt = (
        "Extract the user's name and age from the following text and return as JSON:\n"
        "Text: 'Alice is 25 years old.'\n"
        "Respond ONLY with a JSON object with keys 'name' and 'age'."
    )
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    content = response.choices[0].message.content
    data = json.loads(content)
    print(f"Name: {data['name']}")
    print(f"Age: {data['age']}")

asyncio.run(async_extract())

output

Name: Alice
Age: 25
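
Async pays off mainly when many texts are extracted at once. The following is a sketch under stated assumptions: extract_one expects an AsyncOpenAI client passed in by the caller, and extract_many is a hypothetical helper that caps the number of requests in flight with a semaphore. The default limit of 5 is arbitrary and not derived from any documented rate limit.

```python
import asyncio
import json


async def extract_one(client, text, model="gpt-4o-mini"):
    """Extract name and age from one text; client is an AsyncOpenAI instance."""
    response = await client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": (
                    "Extract the user's name and age from the following text and "
                    "respond ONLY with a JSON object with keys 'name' and 'age'.\n"
                    f"Text: {text!r}"
                ),
            }
        ],
        # JSON mode keeps the reply parseable by json.loads.
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


async def extract_many(texts, worker, limit=5):
    """Run worker(text) for every text, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(text):
        async with semaphore:
            return await worker(text)

    # gather returns results in the same order as the input texts.
    return await asyncio.gather(*(bounded(text) for text in texts))
```

A typical call, given client = AsyncOpenAI(), would be asyncio.run(extract_many(texts, lambda t: extract_one(client, t))). Passing the worker as a parameter keeps the concurrency logic independent of the API call, so it can be exercised with a stub.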

Troubleshooting

If JSON parsing fails, check the model's output for extra text or formatting and refine your prompt to instruct the model to respond with ONLY JSON.
Use try-except blocks to handle malformed JSON gracefully.
For complex extraction, consider using response_model with the instructor library for schema validation.
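
The first two points can be combined into one defensive parser. This is a minimal sketch using only the standard library: it strips a Markdown code fence if the model wrapped its reply in one, parses the JSON, and checks that the expected fields have usable types. The function name parse_person and the rule that a numeric string is accepted as an age are illustrative choices, not behavior of the OpenAI SDK.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this listing readable
FENCED_REPLY = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_person(content):
    """Parse a model reply into {'name': str, 'age': int} or raise ValueError."""
    text = content.strip()
    # Unwrap the reply if the whole thing is a single fenced block.
    match = FENCED_REPLY.fullmatch(text)
    if match:
        text = match.group(1)

    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")

    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str) or not name:
        raise ValueError("Missing or invalid 'name'")
    # Models sometimes return numbers as strings; accept plain digits.
    if isinstance(age, str) and age.strip().isdigit():
        age = int(age)
    if isinstance(age, bool) or not isinstance(age, int):
        raise ValueError("Missing or invalid 'age'")
    return {"name": name, "age": age}
```

Because every failure surfaces as ValueError, the caller needs a single except clause to decide whether to retry the request or log the raw reply.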

Key Takeaways

Use explicit prompts instructing the model to respond with JSON only for reliable structured output.
Parse the response.choices[0].message.content as JSON to extract data fields programmatically.
Handle JSON parsing errors gracefully with try-except to avoid runtime crashes.
Async calls improve throughput when extracting from many texts, and smaller models like gpt-4o-mini reduce latency and cost.
For strict schema enforcement, combine OpenAI with libraries like instructor for typed extraction.

Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022
