How to extract contract clauses with AI
Quick answer
Use a large language model such as gpt-4o to parse contract text. Send the text as input and prompt the model to identify, label, and return the clauses you need as structured JSON (or plain text).
Prerequisites
- Python 3.8+
- An OpenAI API key
- pip install "openai>=1.0" (the quotes stop the shell from treating > as a redirect)
Setup
Install the openai Python package and store your API key in the OPENAI_API_KEY environment variable so the key stays out of your source code.
pip install openai
Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl
Installing collected packages: openai
Successfully installed openai-1.x.x
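Reading the key with os.environ["OPENAI_API_KEY"] raises a bare KeyError when the variable is missing. You may prefer to fail fast with a readable message instead. The snippet below is a minimal sketch, and the helper name load_api_key is hypothetical, not part of the openai package.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises a descriptive RuntimeError instead of a
    bare KeyError when the variable is unset or contains only whitespace.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running the examples."
        )
    return key
```

You would then create the client with OpenAI(api_key=load_api_key()). The OpenAI client also reads OPENAI_API_KEY on its own when api_key is omitted. The helper only makes a missing key easier to diagnose.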
Step by step
This example sends a short contract to gpt-4o and asks it to extract the Termination, Confidentiality, and Payment clauses as JSON.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
contract_text = '''\
This Agreement shall commence on the Effective Date and continue for one year.
Either party may terminate this Agreement with 30 days written notice.
All confidential information must be kept secret for 5 years.
Payment shall be made within 30 days of invoice receipt.
'''
prompt = f"Extract the following clauses from the contract text: Termination, Confidentiality, Payment. Return the result as JSON with clause names as keys and clause text as values.\nContract text:\n{contract_text}"
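response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    # JSON mode: asks the API to return a valid JSON object (the prompt must mention JSON)
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)
Output: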
{
"Termination": "Either party may terminate this Agreement with 30 days written notice.",
"Confidentiality": "All confidential information must be kept secret for 5 years.",
"Payment": "Payment shall be made within 30 days of invoice receipt."
}
Common variations
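For higher throughput, use the async client AsyncOpenAI with asyncio so that several contracts can be processed concurrently. You can also switch to other models, such as Anthropic's claude-3-5-sonnet-20241022, for a different style or cost. Claude models are served by Anthropic's API, so they are normally called through the anthropic SDK rather than the OpenAI client. Streaming output is less common for extraction but possible.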
import os
import asyncio
from openai import AsyncOpenAI

async def extract_clauses_async():
    # AsyncOpenAI mirrors the OpenAI client, but its methods must be awaited
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    contract_text = '''\
This Agreement shall commence on the Effective Date and continue for one year.
Either party may terminate this Agreement with 30 days written notice.
All confidential information must be kept secret for 5 years.
Payment shall be made within 30 days of invoice receipt.
'''
    prompt = f"Extract Termination, Confidentiality, Payment clauses as JSON.\nContract text:\n{contract_text}"
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)

asyncio.run(extract_clauses_async())
Output:
{
"Termination": "Either party may terminate this Agreement with 30 days written notice.",
"Confidentiality": "All confidential information must be kept secret for 5 years.",
"Payment": "Payment shall be made within 30 days of invoice receipt."
}
Troubleshooting
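- If the model returns incomplete clauses, increase max_tokens or clarify the prompt. A reply cut off by the token limit has finish_reason set to "length".
- If JSON parsing fails, instruct the model to return only JSON, or enable JSON mode with response_format={"type": "json_object"} (the prompt must mention JSON).
- For very long contracts, split the text into sections and extract clauses from each section separately.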
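The chunking advice above can be sketched as follows. This is a minimal sketch with several assumptions:

- Chunk size is measured in characters as a rough stand-in for tokens.
- The text is split only on line boundaries.
- extract is a hypothetical coroutine you would write around AsyncOpenAI's chat.completions.create. It returns {clause_name: text or None} for one chunk.

split_into_chunks and extract_from_chunks are likewise hypothetical names. Chunks run concurrently through asyncio.gather, capped by an asyncio.Semaphore. Clause text found in several chunks is joined in document order.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

ClauseMap = Dict[str, Optional[str]]


def split_into_chunks(text: str, max_chars: int = 8000) -> List[str]:
    """Greedily pack non-blank lines into chunks of at most max_chars characters.

    Splitting on line boundaries keeps each sentence intact; a single line
    longer than max_chars becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        extra = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + extra > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        chunks.append("\n".join(current))
    return chunks


async def extract_from_chunks(
    chunks: List[str],
    extract: Callable[[str], Awaitable[ClauseMap]],
    max_concurrency: int = 4,
) -> Dict[str, str]:
    """Run the caller-supplied extract coroutine on every chunk and merge results."""
    semaphore = asyncio.Semaphore(max_concurrency)  # cap in-flight API calls

    async def run(chunk: str) -> ClauseMap:
        async with semaphore:
            return await extract(chunk)

    # gather returns results in input order, so merged text follows the document
    results = await asyncio.gather(*(run(c) for c in chunks))

    merged: Dict[str, str] = {}
    for result in results:
        for name, clause in result.items():
            if not clause:
                continue  # clause not found in this chunk
            merged[name] = f"{merged[name]}\n{clause}" if name in merged else clause
    return merged


if __name__ == "__main__":
    contract = "Either party may terminate with 30 days notice.\n\nPayment is due within 30 days."

    async def keyword_extract(chunk: str) -> ClauseMap:
        # Offline stand-in for a model call: "finds" clauses by keyword only
        return {
            "Termination": chunk if "terminate" in chunk else None,
            "Payment": chunk if "Payment" in chunk else None,
        }

    chunks = split_into_chunks(contract, max_chars=60)
    print(asyncio.run(extract_from_chunks(chunks, keyword_extract)))
```

Joining matches in order is only one merge policy. Keeping the first match, or asking the model to reconcile duplicates in a final call, are reasonable alternatives depending on how clauses are spread across the contract.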
Key Takeaways
- Use clear prompts instructing the model to extract specific clauses in structured JSON format.
- For long contracts, process text in chunks to avoid token limits and improve accuracy.
- Switch between models such as gpt-4o and claude-3-5-sonnet-20241022 based on cost and style preferences (Claude models are called through Anthropic's SDK).
- Async API calls (AsyncOpenAI with asyncio) let you process many contracts concurrently in batch workflows.
- Always validate and parse the model output carefully to handle formatting inconsistencies.
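Validating the output could look roughly like the sketch below. It assumes the reply should be a JSON object keyed by clause name and that models sometimes wrap JSON in a Markdown code fence. parse_clauses is a hypothetical helper, not part of the openai package.

```python
import json
import re
from typing import Dict, Iterable, Optional

# Matches a reply wrapped in a Markdown code fence (three backticks, optional "json" tag)
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", re.DOTALL)


def parse_clauses(raw: str, expected: Iterable[str]) -> Dict[str, Optional[str]]:
    """Parse a model reply into {clause_name: clause_text or None}.

    - strips a surrounding code fence if present
    - raises ValueError if the reply is not a JSON object
    - keeps only the expected clause names; missing or blank ones map to None
    """
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object mapping clause names to text")

    result: Dict[str, Optional[str]] = {}
    for name in expected:
        value = data.get(name)
        # Treat non-strings and blank strings as "clause not found"
        result[name] = value.strip() if isinstance(value, str) and value.strip() else None
    return result
```

For the examples above, you would call parse_clauses(response.choices[0].message.content, ["Termination", "Confidentiality", "Payment"]). A None value then signals a clause the model did not find, which you can log or retry.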