How-to · Intermediate · 4 min read

How to split long prompts

Quick answer
To split long prompts, divide the input into smaller, logically coherent chunks (for example by paragraph or section) and send them sequentially or in parallel via client.chat.completions.create. Maintain context across calls by summarizing or referencing previous parts in each subsequent prompt.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0"

Setup

Install the OpenAI Python SDK and set your API key as an environment variable. The examples below use the gpt-4o model.

bash
pip install "openai>=1.0"
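The SDK reads the key from the OPENAI_API_KEY environment variable; a typical export (with a placeholder key, not a real one) looks like:

```shell
# Replace the placeholder with your actual API key.
export OPENAI_API_KEY="sk-your-key-here"
```

On Windows, use `set OPENAI_API_KEY=...` (cmd) or `$env:OPENAI_API_KEY = "..."` (PowerShell) instead.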

Step by step

This example splits a long text into chunks by paragraphs and sends each chunk sequentially to the gpt-4o model, appending a summary of previous chunks to maintain context.

python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

long_text = """Paragraph 1: Introduction to AI.

Paragraph 2: History of AI development.

Paragraph 3: Current AI applications.

Paragraph 4: Future trends in AI."""

# Split text into paragraphs
chunks = [p.strip() for p in long_text.split('\n\n') if p.strip()]

summary = ""
responses = []

for i, chunk in enumerate(chunks):
    prompt = f"{summary}\nProcess this paragraph:\n{chunk}\nProvide a concise summary."
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    summary_chunk = response.choices[0].message.content.strip()
    responses.append(summary_chunk)
    # Update summary to include latest chunk summary
    summary += f"\nSummary of paragraph {i+1}: {summary_chunk}"

print("Final summaries of each chunk:")
for idx, res in enumerate(responses, 1):
    print(f"Paragraph {idx} summary: {res}")
output
Final summaries of each chunk:
Paragraph 1 summary: Introduction to AI and its basics.
Paragraph 2 summary: Overview of AI's historical development.
Paragraph 3 summary: Examples of current AI applications.
Paragraph 4 summary: Predictions for AI's future trends.

Common variations

You can also process chunks asynchronously or stream responses for lower latency. Other models such as claude-3-5-sonnet-20241022 or gemini-1.5-pro work with the same chunking strategy, but they require the corresponding provider's SDK or an OpenAI-compatible endpoint, not just a change to the model parameter. For very long documents, consider chunking by token count using libraries like tiktoken.

python
import os
import asyncio
from openai import AsyncOpenAI

# Async calls require the AsyncOpenAI client; chat.completions.acreate
# does not exist in the 1.x SDK.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def process_chunk(chunk, summary):
    prompt = f"{summary}\nProcess this paragraph:\n{chunk}\nProvide a concise summary."
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content.strip()

async def main():
    long_text = "Paragraph 1.\n\nParagraph 2.\n\nParagraph 3."
    chunks = [p.strip() for p in long_text.split('\n\n') if p.strip()]
    summary = ""
    for i, chunk in enumerate(chunks):
        # Chunks are awaited in order because each prompt depends on the
        # rolling summary; independent chunks could run concurrently
        # with asyncio.gather instead.
        summary_chunk = await process_chunk(chunk, summary)
        print(f"Summary {i+1}: {summary_chunk}")
        summary += f"\nSummary of paragraph {i+1}: {summary_chunk}"

asyncio.run(main())
output
Summary 1: Summary of paragraph 1.
Summary 2: Summary of paragraph 2.
Summary 3: Summary of paragraph 3.
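For the token-count chunking mentioned above, here is a minimal sketch. chunk_by_tokens is an illustrative helper (not part of any SDK), and its default whitespace-based count is only an approximation; for exact gpt-4o counts, pass a counter built on tiktoken, e.g. lambda s: len(enc.encode(s)) with enc = tiktoken.encoding_for_model("gpt-4o").

```python
def chunk_by_tokens(text, max_tokens, count_tokens=None):
    """Group paragraphs into chunks of at most max_tokens tokens each."""
    if count_tokens is None:
        # Whitespace approximation; swap in a tiktoken-based counter
        # for exact model token counts.
        count_tokens = lambda s: len(s.split())
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, current_count = [], [], 0
    for para in paragraphs:
        n = count_tokens(para)
        # Flush the current chunk if adding this paragraph would exceed
        # the budget. A single oversized paragraph still gets its own chunk.
        if current and current_count + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_count = [], 0
        current.append(para)
        current_count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

text = "one two three\n\nfour five\n\nsix seven eight nine"
print(chunk_by_tokens(text, max_tokens=5))
# → ['one two three\n\nfour five', 'six seven eight nine']
```

Each resulting chunk can then be fed through the sequential or async loop shown earlier.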

Troubleshooting

If you encounter token limit errors, reduce chunk size or summarize earlier chunks more aggressively. For context loss, maintain a rolling summary or key points instead of full text. If API rate limits occur, add retry logic with exponential backoff.

python
import os
import time
from openai import OpenAI, RateLimitError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

max_retries = 3

for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Your prompt here"}]
        )
        print(response.choices[0].message.content)
        break
    except RateLimitError as e:
        # Retry only on rate-limit errors; other failures should surface.
        print(f"Attempt {attempt+1} failed: {e}")
        time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
else:
    print("Failed after multiple retries.")
output
Attempt 1 failed: Rate limit exceeded
Attempt 2 failed: Rate limit exceeded
Attempt 3 failed: Rate limit exceeded
Failed after multiple retries.
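The rolling-summary idea for avoiding context loss can be sketched with a small helper. rolling_context is a hypothetical function (not part of the SDK) that keeps only the last k chunk summaries, bounding how much context gets prepended to each prompt:

```python
def rolling_context(summaries, k=3):
    """Build a context string from only the most recent k summaries."""
    recent = summaries[-k:]
    start = len(summaries) - len(recent)
    return "\n".join(
        f"Summary of paragraph {start + i + 1}: {s}"
        for i, s in enumerate(recent)
    )

print(rolling_context(["intro", "history", "apps", "trends"], k=2))
# → Summary of paragraph 3: apps
#   Summary of paragraph 4: trends
```

In the loops above, you would call rolling_context(responses) instead of appending every summary to one ever-growing string.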

Key Takeaways

  • Split long prompts into smaller, logical chunks to avoid token limits and improve response quality.
  • Maintain context across chunks by summarizing or referencing previous parts in subsequent prompts.
  • Use asynchronous calls or streaming to handle multiple chunks efficiently.
  • Adjust chunk size and summarization to balance detail and token usage.
  • Implement retry logic to handle API rate limits gracefully.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022, gemini-1.5-pro