
How to extract dates and numbers from text

Quick answer
Use the OpenAI Python SDK to send a prompt that instructs the model to return the dates and numbers found in a text as JSON, then parse that JSON response in Python to retrieve the extracted values.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)

Setup

Install the openai Python package and set your API key as an environment variable.

  • Run pip install openai to install the SDK.
  • Set your API key in your shell: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows).
bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

This example uses the gpt-4o model to extract dates and numbers from a given text. The prompt instructs the model to return JSON with two fields: dates and numbers.

python
import os
from openai import OpenAI
import json

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

text_to_extract = (
    "The event is scheduled for July 20, 2026, and the budget is 15000 dollars. "
    "Last year, 2025, we spent 12000."
)

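prompt = f"Extract all dates and numbers from the following text as JSON with keys 'dates' and 'numbers':\n\n{text_to_extract}\n\nJSON:"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    # JSON mode: ask the API for a single valid JSON object (the prompt must mention JSON)
    response_format={"type": "json_object"},
)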

extracted_text = response.choices[0].message.content

# Parse the JSON output
try:
    extracted_data = json.loads(extracted_text)
except json.JSONDecodeError:
    extracted_data = {"error": "Failed to parse JSON", "raw_output": extracted_text}

print("Extracted dates:", extracted_data.get("dates"))
print("Extracted numbers:", extracted_data.get("numbers"))
output
Extracted dates: ['July 20, 2026', '2025']
Extracted numbers: [15000, 12000]
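
The model's reply is free-form, so the parsed object may not have exactly the shape the prompt asked for (a number returned as a string, a missing key, and so on). A minimal sketch of a clean-up step follows, assuming the reply uses the 'dates' and 'numbers' keys from the prompt above. The helper name normalize_extraction is hypothetical and not part of the OpenAI SDK.

```python
def normalize_extraction(data):
    """Coerce a parsed model reply into {'dates': [str], 'numbers': [int | float]}.

    Hypothetical helper: entries that do not match the expected types are dropped.
    """
    if not isinstance(data, dict):
        return {"dates": [], "numbers": []}

    raw_dates = data.get("dates")
    raw_numbers = data.get("numbers")
    if not isinstance(raw_dates, list):
        raw_dates = []
    if not isinstance(raw_numbers, list):
        raw_numbers = []

    # Keep only non-empty strings as dates
    dates = [d.strip() for d in raw_dates if isinstance(d, str) and d.strip()]

    numbers = []
    for n in raw_numbers:
        if isinstance(n, bool):
            continue  # bool is a subclass of int, so exclude it explicitly
        if isinstance(n, (int, float)):
            numbers.append(n)
        elif isinstance(n, str):
            # Accept numeric strings such as "12,000" or "3.5"
            cleaned = n.replace(",", "").strip()
            try:
                numbers.append(float(cleaned) if "." in cleaned else int(cleaned))
            except ValueError:
                continue  # not a number, drop it
    return {"dates": dates, "numbers": numbers}
```

Calling normalize_extraction(extracted_data) before printing means the error dictionary from a failed parse simply yields two empty lists.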

Common variations

To make asynchronous calls, use the SDK's AsyncOpenAI client together with asyncio and await. You can also switch to a smaller model such as gpt-4o-mini for faster, cheaper extraction, or instruct the model to return CSV or XML instead of JSON if a downstream tool expects one of those formats.

python
import os
import asyncio
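from openai import AsyncOpenAI
import json

async def extract_async(text):
    # openai>=1.0 has no acreate(); use the AsyncOpenAI client and await create()
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompt = f"Extract all dates and numbers from the following text as JSON with keys 'dates' and 'numbers':\n\n{text}\n\nJSON:"
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        # JSON mode: ask the API for a single valid JSON object
        response_format={"type": "json_object"},
    )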
    extracted_text = response.choices[0].message.content
    try:
        return json.loads(extracted_text)
    except json.JSONDecodeError:
        return {"error": "Failed to parse JSON", "raw_output": extracted_text}

async def main():
    text = "The deadline is December 1, 2026, and the cost is 20000 dollars."
    result = await extract_async(text)
    print("Extracted dates:", result.get("dates"))
    print("Extracted numbers:", result.get("numbers"))

asyncio.run(main())
output
Extracted dates: ['December 1, 2026']
Extracted numbers: [20000]
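
Async calls pay off mainly when several texts are processed at once. The sketch below shows one way to run an async extractor over a list of texts with a cap on concurrent requests. The helper name extract_many and the default limit of 5 are assumptions for illustration; the extractor is passed in as a parameter, so the block itself makes no API calls.

```python
import asyncio

async def extract_many(texts, extract, max_concurrent=5):
    """Run `extract` (an async function taking one text) over many texts.

    Hypothetical helper: at most `max_concurrent` calls are in flight at once.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(text):
        # Wait for a free slot before starting the request
        async with semaphore:
            return await extract(text)

    # gather returns results in the same order as the input texts
    return await asyncio.gather(*(guarded(t) for t in texts))
```

With the extract_async function above, a call would look like asyncio.run(extract_many(texts, extract_async)).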

Troubleshooting

  • If the model output is not valid JSON, try adding more explicit instructions in the prompt to return only JSON.
  • Check your API key environment variable if you get authentication errors.
  • Print response.choices[0].message.content (or the whole response object) to inspect the raw model output.
  • For large texts, consider chunking input to avoid token limits.
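
Two of the points above can be sketched in code. Chat models sometimes wrap JSON in a Markdown code fence, which makes json.loads fail, and long inputs may need splitting before they are sent. The helpers below are hypothetical (parse_model_json and chunk_text are not SDK functions), and the 4000-character default is an assumed, rough stand-in for a token budget rather than a real token count.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker
# Matches a reply that is entirely wrapped in a code fence, with an optional language tag
_FENCE_RE = re.compile(
    r"^" + FENCE + r"[a-zA-Z]*\s*\n(.*?)\n?" + FENCE + r"\s*$", re.DOTALL
)

def parse_model_json(raw):
    """Parse JSON from a model reply, tolerating a surrounding code fence."""
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1).strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {"error": "Failed to parse JSON", "raw_output": raw}

def chunk_text(text, max_chars=4000):
    """Split text into whitespace-delimited chunks of about max_chars characters.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting 'dates' and 'numbers' lists concatenated. Note that a date split across a chunk boundary may be missed.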

Key Takeaways

  • Use the OpenAI Python SDK with chat.completions.create to extract structured data from text.
  • Instruct the model explicitly to return JSON for easy parsing of dates and numbers.
  • Async calls help when you run many extractions concurrently, and smaller models like gpt-4o-mini reduce cost and latency.
  • Validate and handle JSON parsing errors gracefully to ensure robustness.
  • Set your API key securely via environment variables to avoid authentication issues.
Verified 2026-04 · gpt-4o, gpt-4o-mini