How-to · Beginner · 3 min read

How to parse JSON documents with an LLM

Quick answer
Use a modern LLM such as gpt-4o with the OpenAI Python SDK: send the JSON text in a prompt and instruct the model to extract the fields you need and return them as structured JSON in the response.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
pip install openai>=1.0
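After installing, export your key so the SDK can pick it up from the environment (macOS/Linux syntax shown; the key value below is a placeholder):

```shell
export OPENAI_API_KEY="sk-your-key-here"
```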

Step by step

This example shows how to send a JSON document as a string to the gpt-4o model and ask it to parse and extract specific fields. The model returns the parsed JSON as a string in the response.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

json_document = '''{
  "name": "Alice",
  "age": 30,
  "email": "alice@example.com",
  "skills": ["Python", "AI", "JSON"]
}'''

prompt = f"Parse the following JSON document and return only the 'name' and 'email' fields as JSON:\n{json_document}"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

parsed_json = response.choices[0].message.content
print("Parsed JSON output:", parsed_json)
output
Parsed JSON output: {
  "name": "Alice",
  "email": "alice@example.com"
}
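Note that `parsed_json` above is just a string. Before passing it downstream, you can validate it with the standard-library `json` module; a minimal sketch, using a hard-coded stand-in for the model's reply:

```python
import json

# Stand-in for response.choices[0].message.content
parsed_json = '{"name": "Alice", "email": "alice@example.com"}'

try:
    data = json.loads(parsed_json)
except json.JSONDecodeError:
    data = None  # fall back: re-prompt the model or repair the string

print(data["name"] if data else "invalid JSON")
```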

Common variations

  • Use async calls with asyncio for non-blocking parsing.
  • Stream partial JSON parsing results with stream=True in chat.completions.create.
  • Try different models like gpt-4o-mini for faster, cheaper parsing, or claude-3-5-sonnet-20241022 (via the Anthropic SDK) as an alternative LLM.
  • Use structured response parsing libraries like instructor or pydantic-ai to enforce JSON schema validation.

python
import asyncio
import os

from openai import AsyncOpenAI

async def async_parse_json():
    # Use the async client; the sync OpenAI client cannot be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    json_doc = '{"city": "New York", "population": 8000000}'
    prompt = f"Extract the city name from this JSON: {json_doc}"
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    print("Async parsed output:", response.choices[0].message.content)

asyncio.run(async_parse_json())
output
Async parsed output: New York
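With `stream=True`, the SDK yields chunks whose `choices[0].delta.content` holds the next text fragment, which you accumulate into the full reply. A sketch of that accumulation loop, with a stub generator standing in for the API call:

```python
# Stub generator simulating the text fragments a streamed response yields.
def fake_stream():
    for fragment in ['{"name": ', '"Alice"', "}"]:
        yield fragment

# Accumulate fragments exactly as you would with chunk.choices[0].delta.content
buffer = ""
for fragment in fake_stream():
    buffer += fragment

print("Streamed JSON:", buffer)
```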

Troubleshooting

  • If the model returns malformed JSON, explicitly instruct it to respond with valid JSON only.
  • Use max_tokens to limit response length and avoid truncation.
  • For large JSON documents, chunk the input and parse in parts.
  • If parsing fails, verify your prompt clarity and JSON formatting.
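For the chunking tip above, one approach is to split a large JSON array into fixed-size batches of records and send each batch in its own prompt. A minimal sketch with a small hypothetical document (the chunk size would be tuned to the model's context window):

```python
import json

# Hypothetical large document: a JSON array of records
records = json.loads('[{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]')

CHUNK_SIZE = 2  # records per request

# Slice the array into batches; each batch would be re-serialized
# with json.dumps and sent to the model in a separate prompt.
chunks = [records[i:i + CHUNK_SIZE] for i in range(0, len(records), CHUNK_SIZE)]

for n, chunk in enumerate(chunks, 1):
    print(f"chunk {n}: {json.dumps(chunk)}")
```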

Key Takeaways

  • Use clear prompts instructing the LLM to parse and return JSON fields explicitly.
  • Leverage the OpenAI Python SDK with gpt-4o or lighter models for JSON parsing tasks.
  • Async and streaming calls enable efficient handling of large or multiple JSON documents.
  • Validate and sanitize LLM outputs to ensure well-formed JSON before downstream use.
Verified 2026-04 · gpt-4o, gpt-4o-mini, claude-3-5-sonnet-20241022