LLM for legal document analysis
Quick answer
Use a large language model (LLM) such as gpt-4o to analyze legal documents by passing the document text as input and prompting for summaries, clause extraction, or compliance checks. The OpenAI Python SDK supports tasks such as contract review, legal Q&A, and entity extraction.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install "openai>=1.0"
Setup
Install the openai Python package and set your API key as an environment variable for secure access.
pip install openai
Output:
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
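The setup step relies on the API key being present in the environment. A minimal sketch of a fail-fast check follows; `get_api_key` is a hypothetical helper introduced here for illustration, not part of the OpenAI SDK, and the optional `env` argument exists only so the function can be exercised without touching the real environment.

```python
import os


def get_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a readable error.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict when testing.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running the examples."
        )
    return key
```

Calling `get_api_key()` before constructing the client turns a missing key into a clear message instead of a `KeyError` or an authentication failure later on.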
Step by step
This example defines a short contract excerpt inline, sends it to gpt-4o for clause extraction, and prints the model's response.
import os
from openai import OpenAI

# Read the API key from the environment rather than hard-coding it
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample contract excerpt; the backslash suppresses the leading newline
legal_text = '''\
This Agreement is made between Acme Corp and Beta LLC. The parties agree to confidentiality and data protection clauses as follows...'''

messages = [
    {"role": "user", "content": f"Extract key clauses and obligations from this legal document:\n{legal_text}"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    max_tokens=512  # upper bound on the length of the reply
)

print("Extracted clauses:\n", response.choices[0].message.content)
Output:
Extracted clauses:
1. Confidentiality clause: Both parties agree to keep information confidential.
2. Data protection clause: Parties must comply with applicable data privacy laws.
3. Agreement between Acme Corp and Beta LLC defines obligations and terms.
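The reply above is free text. If downstream code needs to consume the clauses, one option is to ask for JSON and parse it. The sketch below assumes the Chat Completions JSON mode (`response_format={"type": "json_object"}`); the `clauses`/`name`/`summary` schema and both helper names are hypothetical choices made for this illustration, and the sample reply is hand-written rather than real model output.

```python
import json


def build_clause_request(legal_text, model="gpt-4o"):
    """Build keyword arguments for client.chat.completions.create()."""
    prompt = (
        "Extract the key clauses and obligations from the legal document below. "
        'Respond with a JSON object of the form {"clauses": [{"name": ..., "summary": ...}]}.\n\n'
        f"{legal_text}"
    )
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # JSON mode: the prompt itself must also mention JSON, as it does above
        "response_format": {"type": "json_object"},
        "max_tokens": 512,
    }


def parse_clauses(reply_text):
    """Parse the model's JSON reply into a list of (name, summary) tuples."""
    data = json.loads(reply_text)
    return [(c.get("name", ""), c.get("summary", "")) for c in data.get("clauses", [])]


# Hand-written stand-in for response.choices[0].message.content
sample_reply = '{"clauses": [{"name": "Confidentiality", "summary": "Both parties keep information confidential."}]}'
print(parse_clauses(sample_reply))
```

With a real client, the request would be sent as `client.chat.completions.create(**build_clause_request(legal_text))` and the reply text passed to `parse_clauses`.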
Common variations
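You can use the async client to run several analyses concurrently, stream partial results as they are generated for long responses, or switch to another model such as claude-3-5-sonnet-20241022 for nuanced legal reasoning. Note that claude-3-5-sonnet-20241022 is an Anthropic model, so it is called through Anthropic's SDK and API key rather than the OpenAI client used here.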
import asyncio
import os
from openai import AsyncOpenAI  # async client; the sync OpenAI client cannot be awaited

async def analyze_legal_async(text: str):
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [{"role": "user", "content": f"Summarize legal risks in this document:\n{text}"}]
    # stream=True returns an async iterator of chunks instead of one full response
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=512,
        stream=True
    )
    async for chunk in response:
        # Some chunks carry no choices or no text, so guard before printing
        if chunk.choices:
            print(chunk.choices[0].delta.content or '', end='', flush=True)

legal_doc = """This contract includes indemnification and limitation of liability clauses..."""
asyncio.run(analyze_legal_async(legal_doc))
Output:
A summary of the legal risks, including indemnification obligations and liability limits, streamed token by token.
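Switching to claude-3-5-sonnet-20241022 means switching SDKs as well. The following is a minimal sketch, assuming the `anthropic` Python package and its Messages API; `summarize_risks_with_claude` is a hypothetical helper, and the client is passed in as an argument so the function does not depend on a particular key or network setup.

```python
def summarize_risks_with_claude(client, text, model="claude-3-5-sonnet-20241022"):
    """Ask a Claude model to summarize legal risks in `text`.

    `client` is assumed to be an anthropic.Anthropic() instance.
    """
    response = client.messages.create(
        model=model,
        max_tokens=512,  # required by the Messages API
        messages=[
            {"role": "user", "content": f"Summarize legal risks in this document:\n{text}"}
        ],
    )
    # The reply is a list of content blocks; join the text blocks into one string
    return "".join(
        block.text for block in response.content if getattr(block, "type", "") == "text"
    )
```

With the `anthropic` package installed and `ANTHROPIC_API_KEY` set, this would be called as `summarize_risks_with_claude(anthropic.Anthropic(), legal_doc)`.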
Troubleshooting
- If you receive incomplete responses, increase max_tokens. Streaming shows partial output sooner but does not lift the token limit.
- For API authentication errors, verify that your OPENAI_API_KEY environment variable is set correctly.
- If legal jargon is misunderstood, add more context to the prompt or try a different model such as claude-3-5-sonnet-20241022.
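Incomplete responses can be detected programmatically: a Chat Completions choice whose `finish_reason` is `"length"` was cut off by the token limit. The sketch below retries with a larger limit under that assumption; `complete_with_retry` and its `ceiling` parameter are hypothetical names introduced for this illustration.

```python
def complete_with_retry(client, messages, model="gpt-4o", max_tokens=512, ceiling=4096):
    """Call chat.completions.create, doubling max_tokens while the reply is cut off.

    Stops as soon as the reply finishes normally or max_tokens reaches `ceiling`.
    """
    while True:
        response = client.chat.completions.create(
            model=model, messages=messages, max_tokens=max_tokens
        )
        # finish_reason == "length" means the reply hit the token limit
        truncated = response.choices[0].finish_reason == "length"
        if not truncated or max_tokens >= ceiling:
            return response
        max_tokens = min(max_tokens * 2, ceiling)
```

Each retry is a full new request and is billed as such, so a modest `ceiling` keeps costs predictable.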
Key Takeaways
- Use gpt-4o with the OpenAI Python SDK for legal document analysis.
- Prompt engineering is key: clearly instruct the model to extract clauses or summarize risks.
- Async and streaming APIs improve handling of large legal texts.
- Other models such as claude-3-5-sonnet-20241022 may improve legal reasoning.
- Always secure your API key via environment variables to avoid leaks.
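As an illustration of the prompt-engineering point, the instruction for each task can be kept in one place so it stays explicit and consistent. This is a sketch only: `TASK_INSTRUCTIONS`, `build_messages`, and the system message wording are hypothetical, while the two task instructions reuse the prompts from the examples in this article.

```python
TASK_INSTRUCTIONS = {
    "clauses": "Extract key clauses and obligations from this legal document:",
    "risks": "Summarize legal risks in this document:",
}


def build_messages(task, legal_text):
    """Return a chat message list for the given analysis task."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(TASK_INSTRUCTIONS)}")
    return [
        {
            # Assumed system message: sets the role and asks the model to flag gaps
            "role": "system",
            "content": (
                "You are assisting with legal document review. Quote clause text "
                "where possible and say so when the document does not address a point."
            ),
        },
        {"role": "user", "content": f"{TASK_INSTRUCTIONS[task]}\n{legal_text}"},
    ]
```

The returned list can be passed directly as the `messages` argument of `client.chat.completions.create`.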