How to Intermediate · 3 min read

How to extract financial data with an LLM

Prompt a large language model such as gpt-4o with explicit instructions and an example of the output format you want. Send the financial text through the chat.completions.create API, then parse the JSON the model returns and validate it before relying on the extracted values.

PREREQUISITES

Python 3.8+
OpenAI API key (model access and rate limits depend on your account tier)
pip install "openai>=1.0"

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash

# Quote the requirement so the shell does not treat ">" as a redirect
pip install "openai>=1.0"

output

Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x

Step by step

This example shows how to extract key financial data such as revenue, net income, and EPS from a financial report snippet using gpt-4o. The prompt instructs the model to return JSON for easy parsing.

python

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

financial_text = """
Company XYZ reported a revenue of $5 billion in 2025, with a net income of $1.2 billion and earnings per share (EPS) of $3.45.
"""

# Name the exact keys so the reply can be parsed as JSON
prompt = f"Extract the financial data as JSON with keys: revenue, net_income, eps.\nText:\n{financial_text}\nJSON:"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

extracted_json = response.choices[0].message.content
print("Extracted financial data:", extracted_json)

output

Extracted financial data: {
  "revenue": "$5 billion",
  "net_income": "$1.2 billion",
  "eps": "$3.45"
}
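The reply arrives as a plain string, so it still has to be parsed before the values can be used. The following is a minimal sketch, assuming the model may wrap the JSON in a code fence or surround it with stray text, and that amounts look like the strings in the sample output above. The helper names `parse_model_json` and `to_number` and the unit table are illustrative and not part of the OpenAI SDK.

```python
import json
import re

# Unit words this sketch recognizes (an assumption, not an exhaustive list).
_UNITS = {
    "thousand": 1e3, "k": 1e3,
    "million": 1e6, "m": 1e6,
    "billion": 1e9, "b": 1e9,
}

def parse_model_json(raw: str) -> dict:
    """Parse the first JSON object in the reply, ignoring any text around it."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model output")
    return json.loads(raw[start:end + 1])

def to_number(value) -> float:
    """Convert values such as "$5 billion", "$800M", or "$3.45" to a float."""
    if isinstance(value, (int, float)):
        return float(value)
    match = re.fullmatch(r"\s*\$?\s*([\d,]*\.?\d+)\s*([A-Za-z]*)\s*", str(value))
    if match is None:
        raise ValueError(f"Unrecognized amount: {value!r}")
    number = float(match.group(1).replace(",", ""))
    unit = match.group(2).lower()
    if unit and unit not in _UNITS:
        raise ValueError(f"Unrecognized unit in amount: {value!r}")
    return number * _UNITS.get(unit, 1.0)

reply = '{"revenue": "$5 billion", "net_income": "$1.2 billion", "eps": "$3.45"}'
data = parse_model_json(reply)
print(to_number(data["revenue"]))  # 5000000000.0
```

This sketch does not handle negative amounts, parenthesized losses, or currencies other than the dollar; extend the pattern and unit table to match the reports you process.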

Common variations

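Use gpt-4o-mini for faster, cheaper extraction, at some cost in accuracy.
Make async calls with the AsyncOpenAI client and asyncio for scalable extraction pipelines.
Use streaming mode (stream=True) to receive the model's output incrementally. Streaming does not change how much input fits in one request, so split large financial documents into chunks before sending them.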
Customize prompts to extract additional fields like EBITDA, cash flow, or ratios.
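The last variation, customizing the prompt for extra fields, can be sketched as a small prompt builder. This is one possible shape, not a prescribed format: the function name `build_extraction_prompt`, its parameters, and the sample text and example values below are all made up for illustration.

```python
import json
from typing import Dict, Optional, Sequence

def build_extraction_prompt(
    text: str,
    fields: Sequence[str],
    example: Optional[Dict[str, str]] = None,
) -> str:
    """Build a prompt that asks for exactly the given keys, with an optional example."""
    lines = [
        "Extract the financial data as JSON with keys: " + ", ".join(fields) + ".",
        "Use null for any value the text does not state.",
        "Return only JSON, no extra text.",
    ]
    if example is not None:
        # A sample output shows the model the expected shape of the reply
        lines.append("Example output:")
        lines.append(json.dumps(example))
    lines.append("Text:")
    lines.append(text.strip())
    lines.append("JSON:")
    return "\n".join(lines)

# Hypothetical input text and example values, for illustration only
prompt = build_extraction_prompt(
    "Company XYZ reported EBITDA of $2 billion and operating cash flow of $1.5 billion.",
    fields=["ebitda", "operating_cash_flow"],
    example={"ebitda": "$1 billion", "operating_cash_flow": "$700 million"},
)
print(prompt)
```

The resulting string can be passed as the user message content in the same chat.completions.create call used in the step-by-step example.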

python

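import asyncio
import os
from openai import AsyncOpenAI

async def extract_financial_data_async(text: str):
    # AsyncOpenAI is required here: calls on the synchronous OpenAI client cannot be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])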
    prompt = f"Extract revenue, net_income, eps as JSON.\nText:\n{text}\nJSON:"
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    text = "Company ABC had revenue $3B, net income $800M, EPS $2.10 in 2025."
    result = await extract_financial_data_async(text)
    print("Async extracted data:", result)

asyncio.run(main())

output

Async extracted data: {
  "revenue": "$3B",
  "net_income": "$800M",
  "eps": "$2.10"
}
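An async function pays off mainly when several documents are processed at once. A minimal sketch, assuming you want to cap the number of requests in flight to stay under rate limits: `extract_many` and its `max_concurrency` parameter are hypothetical names, and `extract` stands for any coroutine function that takes a text and returns the model's reply.

```python
import asyncio
from typing import Awaitable, Callable, List

async def extract_many(
    texts: List[str],
    extract: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> List[str]:
    """Run `extract` over all texts with at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> str:
        # The semaphore blocks here once max_concurrency calls are running
        async with semaphore:
            return await extract(text)

    # gather returns results in the same order as the input texts
    return await asyncio.gather(*(bounded(t) for t in texts))
```

With the function defined above, a batch could be run as `asyncio.run(extract_many(reports, extract_financial_data_async))`, where `reports` is your own list of report texts.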

Troubleshooting

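If the model returns unstructured text instead of JSON, clarify the prompt with explicit instructions like "Return only JSON, no extra text." You can also pass response_format={"type": "json_object"} to chat.completions.create to enable JSON mode; the prompt must still mention JSON.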
If extraction misses fields, provide example JSON outputs in the prompt to guide the model.
For inconsistent currency formats, normalize input text or add instructions to standardize units.
If you hit rate limits, implement exponential backoff or switch to a smaller model.
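The exponential backoff mentioned in the last item can be sketched as a small retry wrapper. This is a generic helper under stated assumptions, not the SDK's own retry mechanism: `with_backoff` and its parameters are hypothetical names, and the exception types to retry on are passed in by the caller.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

With openai>=1.0, rate limit errors are raised as `openai.RateLimitError`, so a call could be wrapped as `with_backoff(lambda: client.chat.completions.create(...), retry_on=(openai.RateLimitError,))`, with the elided arguments filled in as in the earlier examples.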

Key Takeaways

Use explicit JSON output prompts to reliably extract structured financial data from LLMs.
The gpt-4o model balances accuracy and cost for financial extraction tasks.
Async API calls enable scalable processing of many financial documents; streaming returns the model's output incrementally but does not raise input size limits.
Prompt engineering with examples improves extraction quality and consistency.
Handle API rate limits and format inconsistencies proactively for robust pipelines.

Verified 2026-04 · gpt-4o, gpt-4o-mini
