
How to extract contract clauses with AI

Quick answer
Use a large language model like gpt-4o via the OpenAI Python SDK to parse contract text and extract clauses by prompting the model with clear instructions. Send the contract text as input and request structured clause extraction in JSON or plain text format.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with access to gpt-4o (free-tier availability varies by account)
  • pip install "openai>=1.0" (quote the requirement so the shell does not treat > as a redirect)

Setup

Install the openai Python package and set your OPENAI_API_KEY environment variable for authentication.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
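
Before making any API calls, it can help to confirm that the key is actually visible to Python. The sketch below is one way to do that; the helper name require_api_key is hypothetical and not part of the OpenAI SDK.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI key from the environment or fail with a clear message.

    require_api_key is an illustrative helper, not an SDK function.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it in your shell before running the examples."
        )
    return key
```

Calling require_api_key() at the top of a script surfaces a missing key right away instead of at the first request.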

Step by step

Use the OpenAI SDK to send the contract text with a prompt instructing the model to extract clauses. Parse the response for structured output.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

contract_text = '''\
This Agreement is made between the Company and the Contractor.
Clause 1: Payment terms are net 30 days.
Clause 2: Confidentiality must be maintained.
Clause 3: Termination requires 60 days notice.
'''

prompt = f"Extract the contract clauses from the following text as a JSON list with clause number and text:\n\n{contract_text}"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

print("Extracted clauses:")
print(response.choices[0].message.content)
output
Extracted clauses:
[
  {"clause_number": "1", "text": "Payment terms are net 30 days."},
  {"clause_number": "2", "text": "Confidentiality must be maintained."},
  {"clause_number": "3", "text": "Termination requires 60 days notice."}
]
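
The script above prints the raw reply, which is still a string. To work with the clauses as Python objects, the reply has to be parsed. The following is a minimal sketch, assuming the model returns a JSON list, possibly wrapped in a Markdown code fence; parse_clauses and FENCE are hypothetical names introduced for this example.

```python
import json
import re

# Three backticks, built programmatically so this example can itself sit in a code block.
FENCE = "`" * 3


def parse_clauses(raw: str) -> list:
    """Turn a model reply into a list of clause dicts (illustrative helper)."""
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown code fence; unwrap it if present.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    data = json.loads(text)  # raises json.JSONDecodeError if the reply is not JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of clauses")
    return data
```

With the earlier script, this would be called as parse_clauses(response.choices[0].message.content). Catching json.JSONDecodeError around the call lets you retry or log replies that are not valid JSON.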

Common variations

  • Use gpt-4o-mini for faster, cheaper extraction with slightly less accuracy.
  • Use the AsyncOpenAI client with asyncio to run many extractions concurrently for higher throughput.
  • Stream partial results using stream=True in chat.completions.create for real-time extraction feedback.
python
import os
import asyncio
from openai import AsyncOpenAI

async def extract_clauses_async(text: str):
    # AsyncOpenAI returns awaitable calls; the synchronous OpenAI client does not.
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    prompt = f"Extract contract clauses as JSON:\n\n{text}"
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

async def main():
    contract = "Clause 1: Delivery within 10 days. Clause 2: Warranty for 1 year."
    clauses = await extract_clauses_async(contract)
    print("Async extracted clauses:")
    print(clauses)

asyncio.run(main())
output
Async extracted clauses:
[
  {"clause_number": "1", "text": "Delivery within 10 days."},
  {"clause_number": "2", "text": "Warranty for 1 year."}
]
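
The stream=True variation listed above can be sketched as follows. With streaming enabled, chat.completions.create returns an iterator of chunks, each carrying a text fragment in choices[0].delta.content. The helper names join_stream and stream_clauses are hypothetical, and the SDK import sits inside the function only so that the joining logic can be tried without the package installed.

```python
import os
from typing import Iterable


def join_stream(chunks: Iterable) -> str:
    """Print each streamed fragment as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or no text; skip those.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)


def stream_clauses(text: str) -> str:
    """Request clause extraction with stream=True and collect the reply."""
    # Imported here so join_stream can be exercised without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Extract contract clauses as JSON:\n\n{text}"}],
        stream=True,
    )
    return join_stream(stream)
```

The JSON is only complete once the stream ends, so parse the returned string rather than the individual fragments.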

Troubleshooting

  • If the model returns unstructured text, clarify the prompt to explicitly request JSON output.
  • For very long contracts, split the text into smaller chunks before extraction to avoid token limits.
  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
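
Splitting a long contract before extraction can be sketched like this. The example groups whole lines into chunks under a character budget, which is only a rough stand-in for a token limit; chunk_contract and the 8000-character default are assumptions chosen for illustration, not values from the OpenAI documentation.

```python
from typing import List


def chunk_contract(text: str, max_chars: int = 8000) -> List[str]:
    """Group lines into chunks of at most max_chars characters each.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the budget.
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through the same extraction prompt and the resulting lists concatenated. Splitting on line boundaries keeps a clause intact only when it fits on one line, so contracts with multi-line clauses may need a split on clause headings instead.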

Key Takeaways

  • Use clear prompts instructing the model to output structured JSON for reliable clause extraction.
  • The gpt-4o model balances accuracy and cost for contract clause extraction tasks.
  • Async calls raise throughput for large batches; streaming shows partial output sooner but does not shorten total generation time.
Verified 2026-04 · gpt-4o, gpt-4o-mini