Code Advanced hard · 8 min

Thread isolation at scale

What you will learn

Use thread IDs and checkpointing to isolate concurrent graph executions and prevent state collision across users.

Why this matters

When you run langgraph agents at production scale with multiple concurrent users or requests, state from one user can leak into another's execution unless you explicitly isolate threads. Such leaks cause auth failures, data corruption, and security breaches.

Skip if: You don't need thread isolation if you're running single-user demos or synchronous single-threaded scripts, or if each execution is completely independent with no shared memory. However, any production web server (FastAPI, Django) with concurrent requests must implement this.

Explanation

Thread isolation in langgraph means ensuring that each concurrent execution of a graph operates on its own isolated state, without reading or overwriting another thread's messages or context. In langgraph 0.2.x, this is achieved through thread_id: a unique identifier passed to invoke() or stream() via the config parameter, combined with a Checkpointer (like MemorySaver or PostgresSaver) that maintains separate state buckets per thread.

Mechanically, when you call graph.invoke(input, config={'configurable': {'thread_id': 'user-123'}}), the graph saves its state (checkpoints) under the key 'user-123'. A second concurrent invocation with thread_id 'user-456' reads from a completely separate checkpoint bucket. LangGraph does not fall back to a shared default thread: a graph compiled with a checkpointer raises an error when it is invoked without a thread_id. Collisions happen when your own code supplies the same thread_id (for example a hard-coded 'default') to requests that belong to different users, so their states overwrite each other.
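To make the bucket idea concrete, here is a toy model in plain Python. It is not LangGraph's implementation: `ToyCheckpointer`, `put`, and `get` are hypothetical names invented for this sketch. The only point it illustrates is that state keyed by thread_id cannot mix as long as the keys differ, even when the writers run concurrently.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class ToyCheckpointer:
    """Toy stand-in for a checkpointer: one list of messages per thread_id."""

    def __init__(self) -> None:
        self._buckets: dict[str, list[str]] = defaultdict(list)
        self._lock = threading.Lock()  # guards the shared dict across OS threads

    def put(self, thread_id: str, message: str) -> None:
        with self._lock:
            self._buckets[thread_id].append(message)

    def get(self, thread_id: str) -> list[str]:
        with self._lock:
            # .get() avoids creating an empty bucket for unknown thread IDs
            return list(self._buckets.get(thread_id, []))


def run_conversation(store: ToyCheckpointer, thread_id: str, user: str) -> None:
    # Each simulated request writes only to the bucket named by its thread_id.
    for turn in range(3):
        store.put(thread_id, f"{user} turn {turn}")


store = ToyCheckpointer()
with ThreadPoolExecutor(max_workers=2) as pool:
    pool.submit(run_conversation, store, "user-123", "alice")
    pool.submit(run_conversation, store, "user-456", "bob")

alice_history = store.get("user-123")
bob_history = store.get("user-456")
print(alice_history)
print(bob_history)
```

Each history contains only its own user's turns, in order. If both workers were given the same thread_id, the two users' turns would interleave in a single bucket, which is the collision this lesson is about.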

At scale, you must pair thread IDs with a durable checkpointer (not just memory) and ensure each external request (HTTP, queue task, webhook) generates a unique, stable thread ID (typically derived from user ID, conversation ID, or request ID). This pattern is essential for multi-tenant systems, chat applications with user sessions, and API servers handling parallel requests.
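One way to get a stable thread ID is to build it from identifiers the request already carries. The sketch below is an illustration only: `thread_config` and its `user_id` and `conversation_id` parameters are hypothetical names, and the `user:conversation` layout is an assumption of this example, not a LangGraph convention. The returned dict has the {'configurable': {'thread_id': ...}} shape described above.

```python
def thread_config(user_id: str, conversation_id: str) -> dict:
    """Build a config dict whose thread_id is stable for a given conversation.

    Assumes neither identifier contains ':' (the separator used here).
    """
    if not user_id or not conversation_id:
        raise ValueError("user_id and conversation_id are both required")
    if ":" in user_id or ":" in conversation_id:
        raise ValueError("identifiers must not contain ':'")
    return {"configurable": {"thread_id": f"{user_id}:{conversation_id}"}}


# The same conversation always maps to the same thread, so its history is found again.
first = thread_config("alice", "conv-1")
second = thread_config("alice", "conv-1")
other = thread_config("bob", "conv-1")

print(first)
print(first == second)
print(first == other)
```

Because the ID is derived rather than random, a second HTTP request, a retried queue task, or a webhook for the same conversation all resolve to the same checkpoint.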

Analogy

Think of thread IDs like mailboxes in an apartment building. Each resident (thread/user) has their own mailbox (thread_id). When the mailman (checkpointer) delivers a letter (state), he puts it in the correct mailbox using the address. Without clear addresses, all mail gets stuffed into one mailbox and residents get each other's bills.

Code

python

from typing import Annotated

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.checkpoint.memory import MemorySaver
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage, HumanMessage

class State(TypedDict):
    # add_messages appends to the checkpointed list instead of replacing it
    messages: Annotated[list[BaseMessage], add_messages]
    user_id: str

def process_node(state: State) -> dict:
    # Return only the new message; the reducer merges it into the history
    return {
        "messages": [HumanMessage(content=f"Processing for user {state['user_id']}")]
    }

def build_graph():
    graph = StateGraph(State)
    graph.add_node("process", process_node)
    graph.add_edge(START, "process")
    graph.add_edge("process", END)
    # One checkpointer instance holds the state of every thread
    return graph.compile(checkpointer=MemorySaver())

compiled_graph = build_graph()

thread_id_user_1 = "user-alice-001"
thread_id_user_2 = "user-bob-002"

# First turn for alice, stored under her thread_id
result_1 = compiled_graph.invoke(
    {"messages": [HumanMessage(content="Hello from Alice")], "user_id": "alice"},
    config={"configurable": {"thread_id": thread_id_user_1}}
)
print(f"User 1 messages: {len(result_1['messages'])}")
print(f"User 1 last message: {result_1['messages'][-1].content}")

# Bob uses a different thread_id, so he starts from an empty state
result_2 = compiled_graph.invoke(
    {"messages": [HumanMessage(content="Hello from Bob")], "user_id": "bob"},
    config={"configurable": {"thread_id": thread_id_user_2}}
)
print(f"User 2 messages: {len(result_2['messages'])}")
print(f"User 2 last message: {result_2['messages'][-1].content}")

# Second turn for alice: same thread_id, so her earlier messages are loaded
result_1_again = compiled_graph.invoke(
    {"messages": [HumanMessage(content="Another message")], "user_id": "alice"},
    config={"configurable": {"thread_id": thread_id_user_1}}
)
print(f"User 1 after second invoke: {len(result_1_again['messages'])} messages")
print(f"User 1 full history intact: {[msg.content for msg in result_1_again['messages']]}")

Output

User 1 messages: 2
User 1 last message: Processing for user alice
User 2 messages: 2
User 2 last message: Processing for user bob
User 1 after second invoke: 4 messages
User 1 full history intact: ['Hello from Alice', 'Processing for user alice', 'Another message', 'Processing for user alice']

What just happened?

Two different users (alice and bob) ran the same graph with different thread IDs. The calls here run one after another, but the same isolation holds when they run concurrently. The MemorySaver checkpointer stored their messages in separate buckets keyed by thread_id. When alice invoked again with the same thread_id, the graph loaded her prior state (2 messages), and the add_messages reducer appended her new input and the node's reply, giving 4 messages. Bob's state never mixed with alice's. The reducer matters: with a plain list[BaseMessage] field, the second input would replace the stored list instead of extending it.

Common gotcha

The most common mistake is forgetting that configurable is a nested dict: you must pass config={'configurable': {'thread_id': 'xyz'}}, not config={'thread_id': 'xyz'}. The checkpointer looks for thread_id under the configurable key, so a top-level thread_id is not picked up. Second gotcha: MemorySaver only works in-process. If you have multiple Python processes or servers, they do not share its state, so you must use a durable, shared checkpointer like PostgresSaver or RedisSaver.
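A small guard can catch the flat-dict mistake before the graph is called. This is a hypothetical helper (`require_thread_id` is not part of langgraph), and it assumes thread IDs are non-empty strings; it only checks the config shape described above.

```python
def require_thread_id(config: dict) -> str:
    """Return the thread_id from a config dict, or raise with a pointed message."""
    configurable = config.get("configurable")
    if not isinstance(configurable, dict):
        message = "config is missing the 'configurable' dict"
        if "thread_id" in config:
            # The classic mistake: thread_id passed at the top level.
            message += "; found a top-level 'thread_id', nest it under 'configurable'"
        raise ValueError(message)

    thread_id = configurable.get("thread_id")
    if not isinstance(thread_id, str) or not thread_id.strip():
        raise ValueError("configurable['thread_id'] must be a non-empty string")
    return thread_id


good = require_thread_id({"configurable": {"thread_id": "xyz"}})
print(good)

try:
    require_thread_id({"thread_id": "xyz"})
except ValueError as exc:
    print(f"rejected: {exc}")
```

Calling a check like this at the edge of your request handler turns a confusing downstream failure into an immediate, descriptive one.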

Error recovery

No history found on checkpoint read

If you pass a thread_id that was never saved, the checkpointer returns None instead of raising a KeyError, and the graph starts fresh. This is intentional for new threads, but if you expect history and don't see it, verify that the thread_id string matches exactly (thread IDs are case-sensitive and whitespace-sensitive).

MemorySaver losing state across restarts

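MemorySaver stores checkpoints only in RAM, so restarting the Python process clears all state. For production, use PostgresSaver (from the langgraph-checkpoint-postgres package, which depends on psycopg 3 rather than psycopg2, plus a Postgres database) or implement a custom Checkpointer. PostgresSaver does not accept a connection_string keyword. Create it with PostgresSaver.from_conn_string('postgresql://user:pass@localhost/langgraph'), which is used as a context manager, and call checkpointer.setup() once to create the tables before passing it to compile(checkpointer=checkpointer).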

thread_id collision in multi-tenant app

If you use a non-unique thread_id (e.g., all users get 'default' or 'session'), their states collide. Always derive thread_id from something globally unique: user ID, session ID, conversation ID, or a UUID. Example: thread_id = f'{user_id}_{conversation_id}', or thread_id = str(uuid.uuid4()) generated once when the conversation is created and stored so later requests can reuse it.
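Plain string concatenation can itself collide: with an underscore separator, user `a_b` in conversation `c` and user `a` in conversation `b_c` both produce `a_b_c`. One hedged way around this, using only the standard library, is to encode the parts unambiguously and hash them with uuid.uuid5. `THREAD_NAMESPACE` and `stable_thread_id` are hypothetical names; the namespace value is arbitrary, but it must stay fixed once chosen or existing threads will no longer be found.

```python
import json
import uuid

# Arbitrary, fixed namespace for this application (an assumption of this sketch).
THREAD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "threads.example.invalid")


def stable_thread_id(user_id: str, conversation_id: str) -> str:
    """Deterministic thread_id: same inputs give the same UUID string."""
    # JSON-encoding the list keeps ("a_b", "c") distinct from ("a", "b_c").
    key = json.dumps([user_id, conversation_id])
    return str(uuid.uuid5(THREAD_NAMESPACE, key))


# Naive concatenation: two different (user, conversation) pairs, one ID.
naive_1 = "_".join(["a_b", "c"])
naive_2 = "_".join(["a", "b_c"])
print(naive_1 == naive_2)

# Hashed IDs stay distinct for different pairs and stable for the same pair.
hashed_1 = stable_thread_id("a_b", "c")
hashed_2 = stable_thread_id("a", "b_c")
print(hashed_1 == hashed_2)
print(stable_thread_id("a_b", "c") == hashed_1)
```

The trade-off is readability: a hashed ID is harder to eyeball in logs than `alice_conv-1`, so some teams keep the readable form and instead forbid the separator character in identifiers.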

Experienced dev note

In production, thread_id is more than a convenience: it is your data isolation boundary, so resolve it on the server from the authenticated user and handle it with the same care as a JWT. A common mistake is generating a random thread_id for each request, which loses history across requests for the same user. Instead, derive thread_id deterministically from user context (e.g., user_id or conversation_id) so the same user always gets their checkpoint back. Also, always use a durable checkpointer in production, never MemorySaver, because a single pod restart or process crash loses all state for all users. PostgresSaver is battle-tested; if you use a different database, ensure your Checkpointer's get_tuple() and put() methods are atomic and that the backing store is indexed by thread_id for performance at scale.
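Treating thread_id as an isolation boundary implies never trusting a thread_id sent by the client. The sketch below shows one hedged way to do that: the server checks ownership and only then derives the thread ID. `CONVERSATION_OWNERS`, `resolve_thread_id`, and the in-memory dict are hypothetical stand-ins for a real session store or database lookup.

```python
# Hypothetical ownership table: conversation_id -> user_id that owns it.
CONVERSATION_OWNERS = {"conv-1": "alice", "conv-2": "bob"}


def resolve_thread_id(authenticated_user: str, conversation_id: str) -> str:
    """Return the thread_id for a conversation only if the caller owns it."""
    owner = CONVERSATION_OWNERS.get(conversation_id)
    if owner is None or owner != authenticated_user:
        # Same error for "missing" and "not yours" so callers cannot probe for IDs.
        raise PermissionError("conversation not found for this user")
    return f"{authenticated_user}:{conversation_id}"


allowed = resolve_thread_id("alice", "conv-1")
print(allowed)

try:
    resolve_thread_id("alice", "conv-2")  # conv-2 belongs to bob
except PermissionError as exc:
    print(f"denied: {exc}")
```

With this shape, the value placed in config['configurable']['thread_id'] always comes from server-side state, so a user cannot read another user's checkpoint by guessing or editing an ID.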

Check your understanding

You have two concurrent requests from the same user hitting your FastAPI endpoint. Why would passing the same thread_id for both requests be correct, and what would happen if you generated a new UUID for each request instead?

Answer hint

A correct answer explains that the same user should share a single thread_id so their messages accumulate in one checkpoint (stateful conversation), and that generating a new UUID per request would create isolated executions with no shared history between requests.

Version note: langgraph 0.2.x passes thread_id (and other runtime parameters you define) through the configurable dict. In earlier 0.1.x versions, the pattern was less standardized. Ensure you're using `from langgraph.checkpoint.memory import MemorySaver` (not a direct import from langgraph), which is the 0.2.x standard location.

Explore persistence and durability with PostgresSaver to understand how to scale thread isolation across multiple servers and survive process restarts without losing conversation history.
