How to extract key-value pairs with an LLM
Quick answer
Use a chat.completions.create call with a prompt instructing the LLM to parse text and output key-value pairs in JSON format. Then parse the JSON response in Python to extract the pairs cleanly.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)
Setup
Install the official openai Python SDK and set your API key as an environment variable.
- Install the SDK: pip install openai
- Set the environment variable in your shell: export OPENAI_API_KEY='your_api_key'
pip install openai
Output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
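Before making any API call, it can help to confirm that the environment variable is actually visible to Python. The following is a minimal sketch, not part of the SDK: require_api_key is a hypothetical helper name, and it only checks that the variable is non-empty, not that the key is valid.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in var_name, or raise a clear error if it is unset."""
    # Hypothetical helper for illustration; the OpenAI SDK does not ship this function.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running the script."
        )
    return key
```

Calling require_api_key() at the top of a script fails fast with a readable message instead of a KeyError or an authentication error later on.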
Step by step
This example sends a prompt to gpt-4o-mini instructing it to extract key-value pairs from a given text and return them as JSON. The Python code then parses the JSON response.
import json
import os

from openai import OpenAI

# The client uses the key exported during setup.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

text_to_parse = "Name: Alice, Age: 30, City: New York"
prompt = f"Extract the key-value pairs from the following text and return a JSON object:\n\n{text_to_parse}\n\nJSON:"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    # JSON mode: ask the API for a single valid JSON object (the prompt must mention JSON).
    response_format={"type": "json_object"},
)

# content can be None, so fall back to an empty string before stripping.
json_text = (response.choices[0].message.content or "").strip()

try:
    key_values = json.loads(json_text)
except json.JSONDecodeError:
    # Fall back to an empty dict if the reply is not valid JSON.
    key_values = {}

print("Extracted key-value pairs:", key_values)
Output
Extracted key-value pairs: {'Name': 'Alice', 'Age': 30, 'City': 'New York'}
Common variations
- Use gpt-4o-mini or claude-3-5-sonnet-20241022 for cost-effective extraction. Claude models are called through Anthropic's SDK rather than the OpenAI client.
- For asynchronous calls, use asyncio with the OpenAI SDK's async client (AsyncOpenAI).
- To handle streaming, set stream=True and process chunks incrementally.
- Adjust prompt instructions to extract nested or complex key-value structures.
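The asynchronous variation can be sketched roughly as follows. This is an illustration under assumptions rather than a definitive implementation: extract_pairs and extract_many are hypothetical helper names, and the client is passed in as a parameter so the functions work with any object that exposes an awaitable chat.completions.create, such as the SDK's AsyncOpenAI.

```python
import asyncio
import json
from typing import Dict, List


async def extract_pairs(client, text: str, model: str = "gpt-4o-mini") -> Dict:
    """Ask the model for key-value pairs; return {} if the reply is not a JSON object."""
    prompt = (
        "Extract the key-value pairs from the following text and "
        f"return a JSON object:\n\n{text}\n\nJSON:"
    )
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    content = response.choices[0].message.content or ""
    try:
        parsed = json.loads(content.strip())
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


async def extract_many(client, texts: List[str]) -> List[Dict]:
    """Run one extraction per input text concurrently and keep the input order."""
    return list(await asyncio.gather(*(extract_pairs(client, t) for t in texts)))


# Usage (assumes the openai package is installed and OPENAI_API_KEY is set):
#   from openai import AsyncOpenAI
#   results = asyncio.run(extract_many(AsyncOpenAI(), ["Name: Alice, Age: 30"]))
```

Passing the client in keeps the helpers easy to exercise with a stub object, and asyncio.gather lets several extractions wait on the network at the same time.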
Troubleshooting
- If JSON parsing fails, verify the model's output format and consider adding explicit instructions to output valid JSON only.
- Use print(json_text) to debug the raw response.
- If keys or values are missing, refine the prompt to clarify expected output.
- Check your API key and environment variable if you get authentication errors.
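A common cause of parsing failures is a reply that wraps the JSON in a markdown code fence or adds a sentence before or after it. The sketch below shows one tolerant way to parse such replies; parse_model_json is a hypothetical helper name, and the fallback simply takes the span between the first "{" and the last "}", which is a heuristic and not a full JSON extractor.

```python
import json


def parse_model_json(raw: str) -> dict:
    """Parse a JSON object from a model reply, tolerating fences and surrounding prose."""
    text = raw.strip()
    # Try the whole reply first, then the span from the first "{" to the last "}".
    candidates = [text]
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        candidates.append(text[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    # Nothing parsed as a JSON object.
    return {}
```

Using parse_model_json(json_text) in place of a bare json.loads call keeps the empty-dict fallback of the main example while recovering replies that are valid JSON wrapped in extra text.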
Key Takeaways
- Use explicit prompt instructions to get LLMs to output key-value pairs as JSON.
- Parse the JSON response in Python to extract structured data reliably.
- Choose models like gpt-4o-mini or claude-3-5-sonnet-20241022 for a good balance of accuracy and cost.
- Test and refine prompts to handle complex or nested key-value extraction.
- Use SDK v1+ patterns and environment variables for secure, production-ready code.