API Advanced medium · 7 min

Threads: conversation containers

What you will learn

Threads are stateful conversation containers in the Assistants API that automatically manage message history and context across turns.

Why this matters

Building multi-turn conversations with the Assistants API requires understanding the thread lifecycle. Threads decouple conversation state from the Assistant definition, so you can reuse the same Assistant across many independent conversations without managing message history yourself, which is critical for production systems handling concurrent user sessions.

Skip if: Use `client.chat.completions.create()` instead if you only need simple request/response without persistent server-side state, don't need tool results carried across turns automatically, or are building a one-shot interaction. Threads add overhead (extra API calls, run polling, and stored data to clean up) if you're not using their stateful benefits. For simple chatbots, the stateless Chat Completions API is lighter.

Explanation

What Threads Do: A Thread is an object that holds the conversation history between a user and an Assistant. Instead of passing the entire message history with each API call, you create a thread once, append messages to it, and run the Assistant on that thread. The API keeps track of earlier messages, responses, and tool calls for you.

How They Work: When you create a thread, OpenAI assigns it a unique ID and stores it server-side. Each message you add gets its own ID and a created_at timestamp. To run the Assistant, you send only the thread ID and the Assistant ID; the service rebuilds the model's context from the stored messages (dropping some older messages if the history no longer fits the model's context window) and appends the reply to the thread. You are still billed for all of those input tokens on every run. Tool calls and file references are recorded on the thread's runs and run steps, which helps with auditing and recovery.

When to Use: Threads suit production conversation systems that need server-side session persistence: chatbots, support-ticket conversations, multi-turn reasoning workflows, or anywhere conversation memory must survive process restarts or be shared across backend instances.
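A minimal sketch of the multi-turn pattern described above, assuming the openai 1.x Python SDK with the v2 Assistants endpoints, where `runs.create_and_poll` and the `run_id` filter on `messages.list` are available. The helper name `send_turn` is hypothetical. It takes the client as a parameter so it can be exercised with a stub; each call sends only the new message plus the thread and Assistant IDs, and the service supplies the earlier turns.

```python
from typing import Any


def send_turn(client: Any, thread_id: str, assistant_id: str, text: str) -> str:
    """Append one user message to an existing thread, run the Assistant,
    and return the text of the reply produced by that run."""
    # Only the new message is sent; earlier turns already live on the thread.
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=text
    )
    # SDK helper: creates the run and polls until it stops being pending.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id, assistant_id=assistant_id
    )
    if run.status != "completed":
        raise RuntimeError(f"run {run.id} ended as {run.status}: {run.last_error}")
    # Fetch just the messages this run produced, oldest first.
    page = client.beta.threads.messages.list(
        thread_id=thread_id, run_id=run.id, order="asc"
    )
    parts = [
        block.text.value
        for msg in page.data
        if msg.role == "assistant"
        for block in msg.content
        if block.type == "text"
    ]
    return "\n".join(parts)
```

Calling it twice on the same thread, first with "What is the capital of France?" and then with "What is its population?", should let the second run resolve "its" from the stored history without you resending the first exchange.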

Request code

python

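import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create an empty thread; persist thread.id (e.g. in your database)
thread = client.beta.threads.create()
print(f"Thread created: {thread.id}")

# Append the user's message to the thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the capital of France?"
)

# Define the Assistant once; it can be reused across many threads
assistant = client.beta.assistants.create(
    name="Geography Expert",
    model="gpt-4-1106-preview",
    instructions="You are a geography expert. Answer geography questions concisely."
)

# Run the Assistant on the thread
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Poll while the run is still pending; any other status means it has stopped
while run.status in ("queued", "in_progress", "cancelling"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

if run.status != "completed":
    # failed, cancelled, expired, incomplete, or requires_action (tool calls pending)
    print(f"Run ended with status {run.status}: {run.last_error}")
else:
    # messages.list returns newest first by default; ask for chronological order
    messages = client.beta.threads.messages.list(thread_id=thread.id, order="asc")
    for msg in messages.data:
        speaker = "Assistant" if msg.role == "assistant" else "User"
        for block in msg.content:
            if block.type == "text":  # skip image or other non-text content
                print(f"{speaker}: {block.text.value}")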

Authentication

Make sure the OPENAI_API_KEY environment variable is set before you create the client: `OpenAI()` reads it at instantiation time (not at import time) and raises an error if no key is found. If the environment variable isn't available, as in some containers or Lambda functions, pass the key explicitly, e.g. `client = OpenAI(api_key=key)` with `key` loaded from your secrets manager, rather than hardcoding `'sk-...'` in source. Threads need no extra scope beyond Assistants API access; if you use a restricted project key, confirm that its permissions allow the Assistants API endpoints.

Response shape

Field	Description
`id`	string: unique thread identifier, use this for all subsequent thread operations
`object`	string: always 'thread'
`created_at`	integer: unix timestamp of thread creation
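`metadata`	object: custom key-value pairs you can attach (optional, empty dict by default)
`tool_resources`	object or null: files and vector stores made available to the code_interpreter and file_search tools on this thread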

Field guide

id

Store this immediately: it's your handle to the entire conversation. If lost, the thread is orphaned and unrecoverable.

metadata

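Attach user_id, session_id, or conversation_type here; values must be strings. Metadata comes back whenever you retrieve the thread and helps with auditing, but the API offers no endpoint to list threads or filter them by metadata, so answering "which threads belong to user X" requires keeping that mapping in your own database.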
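A sketch of tagging a thread at creation time. The 16-pair, 64-character key, and 512-character value limits reflect OpenAI's documented metadata constraints at the time of writing (verify them against the current API reference). `build_thread_metadata`, `create_user_thread`, and the in-memory `index` dict standing in for your database table are hypothetical.

```python
from typing import Any, Dict, List

# Documented limits for metadata on Assistants API objects (verify current values).
MAX_PAIRS = 16
MAX_KEY_LEN = 64
MAX_VALUE_LEN = 512


def build_thread_metadata(user_id: str, session_id: str, **extra: str) -> Dict[str, str]:
    """Assemble string-valued metadata and reject entries the API would refuse."""
    meta = {"user_id": user_id, "session_id": session_id, **extra}
    if len(meta) > MAX_PAIRS:
        raise ValueError(f"{len(meta)} metadata pairs; the limit is {MAX_PAIRS}")
    for key, value in meta.items():
        if not isinstance(value, str):
            raise TypeError(f"metadata value for {key!r} must be a string")
        if len(key) > MAX_KEY_LEN or len(value) > MAX_VALUE_LEN:
            raise ValueError(f"metadata entry {key!r} exceeds the length limits")
    return meta


def create_user_thread(client: Any, index: Dict[str, List[str]], user_id: str,
                       session_id: str, **extra: str) -> str:
    """Create a tagged thread and record it in your own user -> threads index,
    because the API cannot list or filter threads by metadata."""
    thread = client.beta.threads.create(
        metadata=build_thread_metadata(user_id, session_id, **extra)
    )
    index.setdefault(user_id, []).append(thread.id)
    return thread.id
```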

Setup trap

If you add a message and create a run without waiting for the `messages.create()` call to return (for example, by firing both requests concurrently), the run can start before the message is stored and will answer without it. Always wait for message creation to finish (the response includes the message ID) before calling `runs.create()`. Python's synchronous SDK blocks until each request completes, so sequential calls are safe there; in async code, `await` each call in order instead of launching them together, or the bug becomes a silent killer.
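A sketch of the safe ordering in async code, assuming `client` is an `AsyncOpenAI` instance (any object with the same awaitable methods works, which is how it can be tested with a stub). `start_turn` is a hypothetical helper name.

```python
from typing import Any, Tuple


async def start_turn(client: Any, thread_id: str, assistant_id: str,
                     text: str) -> Tuple[str, str]:
    """Add a user message, then start a run, strictly in that order."""
    # Awaiting here means the request has finished and the message is stored
    # before the run is created.
    message = await client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=text
    )
    if not getattr(message, "id", None):
        raise RuntimeError("message creation returned no id; not starting a run")
    # Anti-pattern: asyncio.gather(messages.create(...), runs.create(...))
    # puts both requests in flight at once, so the run may start without the
    # new message.
    run = await client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=assistant_id
    )
    return message.id, run.id
```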

Cost

Each run is billed for the tokens it processes (gpt-4-1106-preview at ~$0.01/$0.03 per 1K input/output tokens as of April 2026; check OpenAI's pricing page for current rates). A 10-message thread whose history totals 5K tokens sends ~5K input tokens on the next run, on top of the output it generates. Long threads accumulate context: after 10+ turns, each new run may process 10K+ input tokens for history alone. Consider archiving threads after ~50 messages, or cap context with a rolling window, for example via the run's `truncation_strategy` parameter (`{"type": "last_messages", "last_messages": N}`).
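A back-of-the-envelope model of how history drives input tokens, assuming every turn adds one user message and one assistant reply of roughly equal size, and ignoring the instructions and tool definitions that are also sent on each run. The function names and the uniform-message-size assumption are illustrative; the price constant is the input rate quoted above.

```python
from typing import List, Optional

INPUT_PRICE_PER_1K = 0.01  # USD per 1K input tokens, the gpt-4-1106-preview rate quoted above


def input_tokens_per_run(turns: int, tokens_per_message: int,
                         window: Optional[int] = None) -> List[int]:
    """Approximate input tokens sent on each run of a thread.

    Run k sees the 2*(k-1) messages from earlier turns plus the new user
    message; `window` optionally caps how many messages are kept."""
    per_run = []
    for k in range(1, turns + 1):
        messages_in_context = 2 * (k - 1) + 1
        if window is not None:
            messages_in_context = min(messages_in_context, window)
        per_run.append(messages_in_context * tokens_per_message)
    return per_run


def input_cost_usd(total_input_tokens: int) -> float:
    """Convert a token count to dollars at the input rate above."""
    return total_input_tokens / 1000 * INPUT_PRICE_PER_1K
```

With 250-token messages, `input_tokens_per_run(10, 250)` sums to 25,000 input tokens (about $0.25 at that rate), and the 10th run alone sends 4,750 tokens, in line with the ~5K figure above. Capping the window at 6 messages, as a `last_messages` truncation strategy would, brings the total down to 12,750.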

Rate limits

Creating threads and adding messages consume no model tokens, but runs count against your model's rate limits (requests and tokens per minute), and those limits apply to your organization or project as a whole, not to each end user. Ten users messaging at once draw on the same budget, so bursts of runs can trigger 429 errors. Submit runs through a queue that caps concurrency, and allow only one active run per thread.
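One way to do queue-based run submission, sketched with asyncio: a global semaphore caps how many runs are in flight across all users, and a per-thread lock keeps a conversation from starting a second run while one is active. `RunScheduler` is a hypothetical name, and the concurrency cap is an assumption you would tune to your own rate limits.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, DefaultDict


class RunScheduler:
    """Caps concurrent runs globally and allows one active run per thread.

    Construct it inside a running event loop (e.g. in your startup coroutine).
    Per-thread locks are kept for the process lifetime; prune them in a
    long-running service."""

    def __init__(self, max_concurrent_runs: int) -> None:
        self._slots = asyncio.Semaphore(max_concurrent_runs)
        self._thread_locks: DefaultDict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def submit(self, thread_id: str,
                     do_run: Callable[[], Awaitable[Any]]) -> Any:
        # Queue behind any run already active on this thread...
        async with self._thread_locks[thread_id]:
            # ...then wait for a global slot so total in-flight runs stay capped.
            async with self._slots:
                return await do_run()
```

With `AsyncOpenAI`, you would pass a zero-argument callable such as `lambda: client.beta.threads.runs.create_and_poll(thread_id=tid, assistant_id=aid)`; runs for different threads then proceed in parallel up to the cap.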

Common gotcha

Developers often forget that runs execute asynchronously: a run can stay `queued` or `in_progress` for seconds to minutes, so a single status check or a hardcoded 100 ms timeout will fail. Poll in a loop with exponential backoff and a maximum total wait, and stop polling on any status other than `queued`, `in_progress`, or `cancelling` (including `requires_action`, `expired`, and `incomplete`, not just `completed` and `failed`). Additionally, threads are NOT deleted automatically: they persist indefinitely along with the conversation data they hold, so implement cleanup for archived conversations.
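A sketch of the polling loop recommended here, with exponential backoff (1 s, 2 s, 4 s, and so on, capped at 16 s) and a hard deadline. The pending-status set mirrors the non-terminal run states; `wait_for_run` and its parameters are hypothetical, and `sleep`/`clock` are injectable so the loop can be tested without real waiting. The SDK's `create_and_poll` helper is an alternative when you don't need explicit control over backoff and the overall deadline.

```python
import time
from typing import Any, Callable

# Statuses in which a run is still pending; anything else means stop polling.
PENDING_STATUSES = {"queued", "in_progress", "cancelling"}


def wait_for_run(client: Any, thread_id: str, run_id: str, *,
                 initial_delay: float = 1.0, max_delay: float = 16.0,
                 max_wait: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> Any:
    """Poll a run with doubling delays until it leaves the pending states,
    raising TimeoutError once max_wait seconds have elapsed."""
    deadline = clock() + max_wait
    delay = initial_delay
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status not in PENDING_STATUSES:
            return run
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"run {run_id} still {run.status} after {max_wait}s")
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_delay)
```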

Error recovery

RateLimitError

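You've exceeded your requests-per-minute or tokens-per-minute limit for the model (HTTP 429). The SDK already retries these twice by default (configurable with `OpenAI(max_retries=...)`); beyond that, implement exponential backoff with jitter, e.g. `time.sleep(2 ** retry_count + random.random())`. When the response carries a `retry-after` header, wait at least that long.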
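A sketch of backoff with jitter that honors `retry-after`, assuming the error object exposes the HTTP response as `err.response` (as openai 1.x `APIStatusError` subclasses such as `RateLimitError` do). The helper names are hypothetical. Because the SDK already retries on its own, you might construct the client with `max_retries=0` when wrapping calls this way, so two retry policies don't stack.

```python
import random
import time
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")


def retry_after_seconds(err: BaseException) -> Optional[float]:
    """Read a numeric retry-after header from err.response, if there is one."""
    response = getattr(err, "response", None)
    headers = getattr(response, "headers", None) or {}
    value = headers.get("retry-after")
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None  # e.g. an HTTP-date; fall back to the computed delay


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential delay (base * 2**attempt, capped) plus up to 1 s of jitter,
    never shorter than the server's retry-after hint."""
    delay = min(cap, base * 2 ** attempt) + rng()
    return delay if retry_after is None else max(delay, retry_after)


def call_with_retries(fn: Callable[[], T], retryable: Type[BaseException],
                      max_retries: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Callable[[], float] = random.random) -> T:
    """Call fn, retrying on `retryable` errors with backoff_delay between tries."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retryable as err:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            sleep(backoff_delay(attempt, retry_after_seconds(err), rng=rng))
    raise ValueError("max_retries must be >= 0")
```

For example: `call_with_retries(lambda: client.beta.threads.runs.create(thread_id=tid, assistant_id=aid), openai.RateLimitError)`.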

NotFoundError with thread_id

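The thread ID doesn't exist: either the thread was deleted (manually or by your cleanup job) or the ID was never stored correctly. Check that thread.id was saved without truncation or mix-ups. Threads don't auto-expire, so a NotFoundError means the history is gone: create a new thread and update your stored mapping.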
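A sketch of the recover-by-recreating pattern, assuming your app keeps a user to thread ID mapping (here a plain dict standing in for a database table). The exception class is passed in, so with the real SDK you would supply `openai.NotFoundError`; `get_or_create_thread` is a hypothetical helper.

```python
from typing import Any, Dict, Type


def get_or_create_thread(client: Any, store: Dict[str, str], user_id: str,
                         not_found: Type[BaseException]) -> str:
    """Return the user's stored thread if it still exists, else start a new one."""
    thread_id = store.get(user_id)
    if thread_id is not None:
        try:
            client.beta.threads.retrieve(thread_id)
            return thread_id
        except not_found:
            pass  # deleted or never existed: treat the old history as lost
    thread = client.beta.threads.create(metadata={"user_id": user_id})
    store[user_id] = thread.id  # overwrite the stale mapping
    return thread.id
```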

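NotFoundError with run_id

You're using the wrong run_id/thread_id pair. Runs belong to the thread they were created on, so looking a run up under a different thread fails as though it doesn't exist. Store each run ID together with its thread ID and always retrieve runs within that thread. (The 0.x SDK's `InvalidRequestError` no longer exists in openai 1.x; errors are raised as status-specific classes such as `BadRequestError` and `NotFoundError`.)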

AuthenticationError

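`AuthenticationError` (HTTP 401) means the API key is invalid or revoked. A key without permission for an endpoint raises `PermissionDeniedError` (HTTP 403) instead, and a missing key makes `OpenAI()` raise `OpenAIError` when the client is created. Verify OPENAI_API_KEY is set and allowed to use the Assistants API. Test basic validity with `client.models.list()` and Assistants access with `client.beta.assistants.list(limit=1)`.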

Experienced dev note

Threads are stateful by design, which also means they can be orphaned: without a cleanup policy they accumulate on OpenAI's servers indefinitely. First, store every thread ID in your application database, linked to user_id and the thread's own created_at timestamp; the API has no endpoint for listing threads, so this table is your only index for cleanup and for recovering conversations if your backend crashes. Second, run a background job that archives or deletes threads older than N days. Third, don't poll run status in a tight loop: use exponential backoff starting at 1 second. Finally, test thread behavior under concurrent load: while a run is active on a thread, the API rejects new messages and new runs on that thread, so requests for the same conversation must be serialized.
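A sketch of the cleanup job, assuming your database rows hold each thread's ID and its `created_at` Unix timestamp (the hypothetical `records` shape below). `client.beta.threads.delete` is the SDK's delete call; the function name, record keys, and the choice to count an already-missing thread as deleted are illustrative assumptions.

```python
import time
from typing import Any, Iterable, List, Mapping, Optional, Type

SECONDS_PER_DAY = 86400


def purge_old_threads(client: Any, records: Iterable[Mapping[str, Any]],
                      max_age_days: float, not_found: Type[BaseException],
                      now: Optional[float] = None) -> List[str]:
    """Delete threads created more than max_age_days ago and return their IDs
    so the caller can remove the matching rows from its own database."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    deleted = []
    for rec in records:
        if rec["created_at"] >= cutoff:
            continue  # still within the retention window
        try:
            client.beta.threads.delete(rec["thread_id"])
        except not_found:
            pass  # already gone server-side; still drop it from your DB
        deleted.append(rec["thread_id"])
    return deleted
```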

Check your understanding

You have a thread with 5 existing messages. You add a 6th message, then immediately call runs.create() on that thread without waiting. The run completes and returns a response. Does the Assistant's response include context from the 6th message, and why or why not?

Answer hint

With the synchronous SDK, `messages.create()` returns only after the message is stored, so the 6th message is already in the thread and the run sees it. The race exists only when both requests are in flight at the same time, for example async calls started together without awaiting the first. The safest pattern is to confirm the returned message has an id before calling `runs.create()`.

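VERSION Threads are part of the Assistants API. The Python SDK exposes them under the `beta` namespace (`client.beta.threads.*`, available in openai 1.0+); the HTTP endpoint itself is `/v1/threads` and expects an `OpenAI-Beta: assistants=v2` header, which recent SDK releases send for you. OpenAI has announced that the Assistants API is being deprecated in favor of the Responses API, so check OpenAI's deprecations page for the current shutdown timeline and migration guide, and monitor release notes for any schema changes to metadata or the run object.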
