How-to · Intermediate · 4 min read

How AI is used for legal research

Quick answer
AI supports legal research by using large language models (LLMs) such as gpt-4o to analyze, summarize, and extract relevant information from large collections of legal documents. Developers pair embedding models for semantic search with retrieval-augmented generation (RAG), which combines document retrieval with AI-generated answers.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quote the spec so the shell doesn't treat >= as a redirect)

Setup

Install the openai Python package and set your API key as an environment variable for secure access.

bash
pip install openai
output
Collecting openai
  Downloading openai-1.x.x-py3-none-any.whl (xx kB)
Installing collected packages: openai
Successfully installed openai-1.x.x
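With the package installed, export your API key so the client can read it from the environment. The value below is a placeholder, not a real key:

```shell
export OPENAI_API_KEY="sk-your-key-here"

# Quick sanity check that the variable is visible to Python
python3 -c 'import os; print("key set:", bool(os.environ.get("OPENAI_API_KEY")))'
```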

Step by step

This example uses text-embedding-3-small to rank sample legal texts against a query by semantic similarity, then asks gpt-4o to summarize the best match.

python
import os

import numpy as np
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example legal query
query = "What are the key elements of a valid contract?"

# Step 1: Generate embeddings for legal documents (simulated with sample text)
legal_docs = [
    "A contract requires offer, acceptance, and consideration.",
    "Contracts must have mutual assent and lawful purpose.",
    "Some contracts require written form to be enforceable."
]

embeddings = []
for doc in legal_docs:
    response = client.embeddings.create(model="text-embedding-3-small", input=doc)
    embeddings.append(response.data[0].embedding)

# Step 2: Generate an embedding for the query
query_embedding_resp = client.embeddings.create(model="text-embedding-3-small", input=query)
query_embedding = query_embedding_resp.data[0].embedding

# Step 3: Simple similarity search (cosine similarity)
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine_similarity(query_embedding, e) for e in embeddings]
best_doc_index = int(np.argmax(scores))

# Step 4: Use GPT-4o to summarize the best matching document
prompt = f"Summarize the key points of this legal text: {legal_docs[best_doc_index]}"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

print("Summary:", response.choices[0].message.content)
output
Summary: The key elements of a valid contract include an offer, acceptance, and consideration, along with mutual assent and a lawful purpose.
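To build intuition for the Step 3 ranking, here is a self-contained sketch using hand-made three-dimensional vectors in place of real embeddings (no API call involved):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional "embeddings" standing in for real ones
query_vec = np.array([1.0, 0.0, 0.0])
doc_vecs = [
    np.array([0.9, 0.1, 0.0]),  # points mostly in the query's direction
    np.array([0.0, 1.0, 0.0]),  # orthogonal to the query
    np.array([0.5, 0.5, 0.0]),  # in between
]

scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
best = int(np.argmax(scores))
print("scores:", [round(s, 3) for s in scores], "best index:", best)
```

The document whose vector points most nearly in the same direction as the query wins, regardless of vector length.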

Common variations

You can use asynchronous calls for better throughput, or switch to another provider's model, such as claude-3-5-sonnet-20241022 via the Anthropic SDK, for nuanced legal reasoning. Streaming responses help with long outputs.

python
import asyncio
import os

from openai import AsyncOpenAI

# Awaiting the API requires the async client (AsyncOpenAI), not OpenAI
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_legal_query():
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain the doctrine of promissory estoppel."}],
        stream=True
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(async_legal_query())
output
The doctrine of promissory estoppel prevents a party from going back on a promise that the other party relied upon, even if a formal contract does not exist.
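Switching providers mostly means changing the client and the payload shape. A hedged sketch of the Claude variant, assuming the anthropic package is installed and an ANTHROPIC_API_KEY is set; the build_request helper is illustrative, not part of either SDK:

```python
import os

def build_request(question: str) -> dict:
    """Assemble the request payload for the Anthropic Messages API."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 512,  # Anthropic requires an explicit max_tokens
        "messages": [{"role": "user", "content": question}],
    }

request = build_request("Explain the doctrine of promissory estoppel.")

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    response = client.messages.create(**request)
    print(response.content[0].text)
else:
    print("ANTHROPIC_API_KEY not set; skipping live call.")
```

Note the structural differences from the OpenAI client: max_tokens is mandatory, and the reply text lives at response.content[0].text rather than choices[0].message.content.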

Troubleshooting

  • If you get authentication errors, verify your OPENAI_API_KEY environment variable is set correctly.
  • For slow responses, try smaller max_tokens or use streaming.
  • If embeddings seem irrelevant, ensure your input text is clean and contextually rich.
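For transient failures such as rate limits, a small retry wrapper with exponential backoff often helps. This generic helper is an illustrative sketch (the delay schedule and retry count are arbitrary choices), and the wrapped call in the usage comment is hypothetical:

```python
import time

def with_retries(call, retries: int = 3, base_delay: float = 1.0):
    """Run call(), retrying with exponential backoff on any exception."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts; surface the original error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage (hypothetical): wrap a chat completion call from the steps above
# summary = with_retries(lambda: client.chat.completions.create(...))
```

In production you would typically catch only transient exception types (e.g. rate-limit errors) rather than a bare Exception, so that authentication failures surface immediately.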

Key Takeaways

  • Use embedding models for semantic search to find relevant legal documents efficiently.
  • Combine retrieval with LLMs like gpt-4o for summarization and explanation.
  • Async and streaming APIs improve responsiveness for large legal queries.
  • Proper environment setup and API key management are essential for smooth integration.
Verified 2026-04 · gpt-4o, text-embedding-3-small, claude-3-5-sonnet-20241022