How to extract dates and numbers from text
Quick answer
Use the OpenAI Python SDK to send a prompt that instructs the model to extract dates and numbers from text, then parse the model's structured (JSON) or plain-text response to retrieve the extracted values.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0" (the quotes stop the shell from treating > as a redirect)
Setup
Install the openai Python package and set your API key as an environment variable.
- Run pip install openai to install the SDK.
- Set your API key in your shell: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows).
pip install openai
Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
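Before making any API call, it can help to confirm that the key from the setup steps above is actually visible to Python. The following is a minimal sketch; `get_api_key` is a hypothetical helper name, not part of the OpenAI SDK, and it is demonstrated with a plain dict so it runs without a real key.

```python
import os


def get_api_key(env=os.environ) -> str:
    """Return OPENAI_API_KEY from env, or raise a readable error if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell before running."
        )
    return key


# Demonstrated with a plain dict (placeholder value) instead of the real environment.
print(get_api_key({"OPENAI_API_KEY": "your_api_key"}))  # prints your_api_key
```

On Windows, a value set with setx is only picked up by terminals opened afterwards, so a check like this may fail in the session where you ran setx.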
Step by step
This example uses the gpt-4o model to extract dates and numbers from a given text. The prompt instructs the model to return JSON with two fields: dates and numbers.
import json
import os

from openai import OpenAI

# The client reads the key set during setup.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

text_to_extract = (
    "The event is scheduled for July 20, 2026, and the budget is 15000 dollars. "
    "Last year, 2025, we spent 12000."
)

prompt = f"Extract all dates and numbers from the following text as JSON with keys 'dates' and 'numbers':\n\n{text_to_extract}\n\nJSON:"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

extracted_text = response.choices[0].message.content

# Parse the JSON output; keep the raw reply if it is not valid JSON
try:
    extracted_data = json.loads(extracted_text)
except json.JSONDecodeError:
    extracted_data = {"error": "Failed to parse JSON", "raw_output": extracted_text}

print("Extracted dates:", extracted_data.get("dates"))
print("Extracted numbers:", extracted_data.get("numbers"))

Output (model replies can vary between runs):
Extracted dates: ['July 20, 2026', '2025']
Extracted numbers: [15000, 12000]
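In practice the model may add a sentence before the JSON or wrap it in a Markdown code fence, in which case a plain json.loads call fails. Here is a minimal sketch of a more tolerant parser, assuming the reply contains exactly one JSON object; `parse_model_json` is a hypothetical helper name, not an SDK function, and the sample reply is made up for illustration.

```python
import json
import re


def parse_model_json(raw_output: str) -> dict:
    """Best-effort parse of a model reply that should contain one JSON object."""
    # First try the reply exactly as returned.
    try:
        parsed = json.loads(raw_output)
        if isinstance(parsed, dict):
            return parsed
    except json.JSONDecodeError:
        pass
    # Fall back to the text between the first "{" and the last "}".
    match = re.search(r"\{.*\}", raw_output, flags=re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass
    # Same error shape as the example above, so callers can still use .get().
    return {"error": "Failed to parse JSON", "raw_output": raw_output}


# Hypothetical reply with extra prose around the JSON object.
sample_reply = 'Here is the JSON:\n{"dates": ["July 20, 2026"], "numbers": [15000]}'
print(parse_model_json(sample_reply)["dates"])  # prints ['July 20, 2026']
```

The fallback is deliberately simple: it will not recover a reply that contains several separate JSON objects or stray braces in the surrounding prose.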
Common variations
For asynchronous calls, use the SDK's AsyncOpenAI client with asyncio and await; in openai>=1.0 the awaited method is still chat.completions.create (there is no acreate). Alternatively, switch to a smaller model such as gpt-4o-mini for faster, cheaper extraction. If JSON does not suit your pipeline, you can instruct the model to output CSV or XML instead.

import asyncio
import json
import os

from openai import AsyncOpenAI


async def extract_async(text):
    # AsyncOpenAI mirrors the synchronous client, but its methods are awaitable.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompt = f"Extract all dates and numbers from the following text as JSON with keys 'dates' and 'numbers':\n\n{text}\n\nJSON:"
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    extracted_text = response.choices[0].message.content
    try:
        return json.loads(extracted_text)
    except json.JSONDecodeError:
        return {"error": "Failed to parse JSON", "raw_output": extracted_text}


async def main():
    text = "The deadline is December 1, 2026, and the cost is 20000 dollars."
    result = await extract_async(text)
    print("Extracted dates:", result.get("dates"))
    print("Extracted numbers:", result.get("numbers"))


asyncio.run(main())

Output (model replies can vary between runs):
Extracted dates: ['December 1, 2026']
Extracted numbers: [20000]
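If you ask the model for CSV instead of JSON, the reply can be read with Python's csv module. The sketch below assumes the prompt requested a two-column layout with the header type,value and quoted any value containing a comma; that layout, the helper name `parse_csv_reply`, and the sample reply are illustrative assumptions rather than anything the API guarantees.

```python
import csv
import io


def parse_csv_reply(raw_output: str) -> dict:
    """Group the rows of a 'type,value' CSV reply into dates and numbers."""
    result = {"dates": [], "numbers": []}
    reader = csv.DictReader(io.StringIO(raw_output.strip()))
    for row in reader:
        kind = (row.get("type") or "").strip().lower()
        value = (row.get("value") or "").strip()
        if kind == "date":
            result["dates"].append(value)
        elif kind == "number":
            # Keep the raw string when the value is not a clean number.
            try:
                result["numbers"].append(float(value))
            except ValueError:
                result["numbers"].append(value)
    return result


# Hypothetical model reply; the date is quoted because it contains a comma.
sample_reply = 'type,value\ndate,"December 1, 2026"\nnumber,20000\n'
print(parse_csv_reply(sample_reply))
# prints {'dates': ['December 1, 2026'], 'numbers': [20000.0]}
```

Dates written like "December 1, 2026" contain a comma, so the prompt should ask the model to quote such values or the columns will not line up.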
Troubleshooting
- If the model output is not valid JSON, add more explicit instructions to the prompt to return only JSON, or pass response_format={"type": "json_object"} to chat.completions.create to enable JSON mode.
- Check your API key environment variable if you get authentication errors.
- Use print(response) to debug raw model output.
- For large texts, consider chunking input to avoid token limits.
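The chunking suggestion above can be sketched as follows. This is a rough illustration, not a tokenizer-aware splitter: it assumes a character budget is an acceptable stand-in for the model's token limit, and `chunk_text` and its `max_chars` default are hypothetical choices.

```python
import re


def chunk_text(text: str, max_chars: int = 2000) -> list:
    """Split text into chunks of at most max_chars, breaking after sentences."""
    # Split after ., ! or ? followed by whitespace; "July 20, 2026" stays intact.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = (
    "The event is July 20, 2026. The budget is 15000 dollars. "
    "Last year we spent 12000."
)
for chunk in chunk_text(sample, max_chars=60):
    print(chunk)
```

Each chunk would then be sent through the same extraction prompt and the resulting lists merged. Note that the hard split for overlong sentences can cut a date or number in half, so results near those boundaries deserve a second look.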
Key Takeaways
- Use the OpenAI Python SDK with chat.completions.create to extract structured data from text.
- Instruct the model explicitly to return JSON for easy parsing of dates and numbers.
- Async calls help when processing many texts concurrently, and smaller models like gpt-4o-mini reduce cost.
- Validate and handle JSON parsing errors gracefully to ensure robustness.
- Set your API key securely via environment variables to avoid authentication issues.