How to extract contract clauses with AI
Quick answer
Use a large language model such as gpt-4o through the OpenAI Python SDK to extract clauses from contract text. Send the contract text in a prompt with clear instructions, and ask for the clauses in a structured format such as JSON.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your OPENAI_API_KEY environment variable for authentication.
    pip install openai
Output:
    Collecting openai
      Downloading openai-1.x.x-py3-none-any.whl
    Installing collected packages: openai
    Successfully installed openai-1.x.x
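Before making a request, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, assuming a hypothetical helper named `require_api_key` (it is not part of the OpenAI SDK) that reads the variable and fails with a readable message instead of a bare KeyError:

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration only. `env` defaults to
    os.environ and is a parameter purely to make the function testable.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in your shell before running the script"
        )
    return key
```

With this in place, the client could be created as `OpenAI(api_key=require_api_key())`.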
Step by step
Use the OpenAI SDK to send the contract text with a prompt instructing the model to extract clauses. Parse the response for structured output.
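The script that follows prints the model's raw reply; the parsing step can be sketched separately. This is a minimal sketch, assuming the reply is a JSON list that may be wrapped in a Markdown code fence, which models often add. `parse_clauses` is a hypothetical helper name, not part of the OpenAI SDK.

```python
import json
import re

# Matches a reply wrapped in a Markdown code fence (three backticks,
# an optional language tag, the payload, three backticks).
_FENCE = re.compile(r"^`{3}[a-zA-Z]*\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_clauses(raw: str) -> list:
    """Parse a model reply into a list of clause dicts.

    Hypothetical helper: assumes the payload is a JSON list.
    Raises json.JSONDecodeError on malformed JSON and ValueError
    if the JSON is valid but not a list.
    """
    text = raw.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)  # keep only the payload inside the fence
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of clauses")
    return data
```

It would be called on the reply text, for example `parse_clauses(response.choices[0].message.content)`. Failing loudly on non-JSON replies makes it easier to spot when the prompt needs tightening.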
    import os

    from openai import OpenAI

    # Read the key from the environment rather than hard-coding it.
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # Sample contract; replace with your own text.
    contract_text = '''\
    This Agreement is made between the Company and the Contractor.
    Clause 1: Payment terms are net 30 days.
    Clause 2: Confidentiality must be maintained.
    Clause 3: Termination requires 60 days notice.
    '''

    # Ask explicitly for JSON so the reply is easier to parse.
    prompt = f"Extract the contract clauses from the following text as a JSON list with clause number and text:\n\n{contract_text}"

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )

    # The reply text is on the first choice's message.
    print("Extracted clauses:")
    print(response.choices[0].message.content)
Output:
    Extracted clauses:
    [
      {"clause_number": "1", "text": "Payment terms are net 30 days."},
      {"clause_number": "2", "text": "Confidentiality must be maintained."},
      {"clause_number": "3", "text": "Termination requires 60 days notice."}
    ]
Common variations
- Use gpt-4o-mini for faster, cheaper extraction with slightly less accuracy.
- Implement async calls with AsyncOpenAI, asyncio, and await for high throughput.
- Stream partial results using stream=True in chat.completions.create for real-time extraction feedback.
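The streaming variation from the list above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: `collect_stream` and `stream_clauses` are hypothetical helper names, and the prompt wording is an assumption. The SDK import is deferred into `stream_clauses` so that the accumulation logic can be exercised without a network call.

```python
import os


def _show(piece: str) -> None:
    # Print each fragment as it arrives, without adding newlines.
    print(piece, end="", flush=True)


def collect_stream(chunks, on_text=_show) -> str:
    """Accumulate streamed text deltas and return the full reply.

    Assumes each chunk has the shape of a Chat Completions stream chunk:
    chunk.choices[0].delta.content, where content may be None.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            on_text(delta)
            parts.append(delta)
    return "".join(parts)


def stream_clauses(text: str, model: str = "gpt-4o-mini") -> str:
    """Request clause extraction with stream=True (needs a key and network)."""
    from openai import OpenAI  # deferred so collect_stream works without the SDK

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    stream = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": f"Extract contract clauses as JSON:\n\n{text}"}
        ],
        stream=True,
    )
    return collect_stream(stream)
```

Because the JSON arrives in fragments, it is only safe to parse the value returned at the end, not the individual pieces. The async variation is shown in the script that follows.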
    import asyncio
    import os

    from openai import AsyncOpenAI


    async def extract_clauses_async(text: str) -> str:
        # AsyncOpenAI returns awaitables; the synchronous OpenAI client does not.
        client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
        prompt = f"Extract contract clauses as JSON:\n\n{text}"
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content


    async def main():
        contract = "Clause 1: Delivery within 10 days. Clause 2: Warranty for 1 year."
        clauses = await extract_clauses_async(contract)
        print("Async extracted clauses:")
        print(clauses)


    asyncio.run(main())
Output:
    Async extracted clauses:
    [
      {"clause_number": "1", "text": "Delivery within 10 days."},
      {"clause_number": "2", "text": "Warranty for 1 year."}
    ]
Troubleshooting
- If the model returns unstructured text, clarify the prompt to explicitly request JSON output.
- For very long contracts, split the text into smaller chunks before extraction to avoid token limits.
- If you get authentication errors, verify that your OPENAI_API_KEY environment variable is set correctly.
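The chunking advice above can be sketched as follows. This is one possible approach, assuming clauses are separated by line breaks and using a character budget as a rough stand-in for a token limit. `chunk_contract` and `max_chars` are hypothetical names, and a production pipeline might count tokens with a tokenizer instead.

```python
def chunk_contract(text: str, max_chars: int = 8000) -> list:
    """Split contract text into chunks of at most max_chars characters.

    Lines are kept whole where possible so a clause is not cut in the
    middle; a single line longer than max_chars is split hard.
    Blank lines are dropped and lines are stripped of outer whitespace.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks, current = [], ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Hard-split an oversized line into max_chars-sized pieces.
        while len(line) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) > max_chars:
            # Adding this line would exceed the budget: close the chunk.
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent through the same extraction prompt and the resulting lists concatenated. A clause that spans a chunk boundary may still be split, so the budget should leave generous headroom.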
Key Takeaways
- Use clear prompts instructing the model to output structured JSON for reliable clause extraction.
- The gpt-4o model balances accuracy and cost for contract clause extraction tasks.
- Async calls improve throughput for large-scale extraction, and streaming gives earlier feedback in real-time workflows.