High severity · Intermediate · Fix: 5-10 min

ValueError or KeyError: checkpoint not found

ValueError / KeyError (built-in Python exceptions, raised on checkpoint retrieval failure)

What this error means
LangGraph's MemorySaver cannot find a checkpoint for the given thread_id, meaning the state from a previous run was not persisted or the thread_id lookup is invalid.

Stack trace

traceback
Traceback (most recent call last):
  File "main.py", line 42, in <module>
    result = graph.invoke(state, config=config)
  File "langgraph/graph/graph.py", line 215, in invoke
    initial_state = self.checkpoint_storage.get(config["configurable"]["thread_id"])
  File "langgraph/checkpoint/memory.py", line 58, in get
    raise ValueError(f"Checkpoint not found for thread_id: {thread_id}")
ValueError: Checkpoint not found for thread_id: user_123
QUICK FIX
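Check whether the thread already has a checkpoint before resuming, or wrap your graph.invoke() call in a try/except ValueError block. On first run, invoke with a complete initial state and keep the same thread_id in the config so the new checkpoint is saved under it; a graph compiled with a checkpointer needs a thread_id to know where to save. Reuse that thread_id for subsequent resumption.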

Why it happens

LangGraph's MemorySaver stores agent state in process memory, keyed by thread_id. When you invoke a graph with a thread_id that has no prior checkpoint (first run, thread never saved, or memory cleared), there is nothing to resume from, and a resume path that assumes saved state fails instead of handling the gap gracefully. MemorySaver is designed for session persistence within a single process, so the caller is responsible for knowing whether a thread exists before resuming it. For a multi-turn conversation or stateful agent, either create the checkpoint on first run or check for its existence before resuming.

Detection

Before invoking a graph with a thread_id, check whether a checkpoint exists in your MemorySaver instance using checkpoint_storage.get_tuple(config), which returns None when nothing has been saved for that thread. Alternatively, wrap the invoke call in try/except ValueError to catch missing checkpoints. Log both successful and failed checkpoint retrievals to identify threads that never initialized properly.

Causes & fixes

1

First run with a new thread_id: the checkpoint hasn't been created yet because this thread never ran before

✓ Fix

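On first invocation, pass a complete initial state and keep the thread_id in the config so the checkpoint is created under it; a graph compiled with a checkpointer expects a thread_id on every call. Either check for an existing checkpoint first, or use a try/except block to catch ValueError and initialize the thread with a fresh state on first run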

2

Restarting your application or using a new MemorySaver instance resets all in-memory checkpoints because MemorySaver doesn't persist across process restarts

✓ Fix

Switch from MemorySaver (in-memory only) to SqliteSaver or a persistent checkpoint storage backend if you need state to survive application restarts

3

Using an incorrect or mistyped thread_id that doesn't match the one from the previous run

✓ Fix

Verify the thread_id string matches exactly (case-sensitive) between the original save and the resume. Log all thread_ids when storing checkpoints to enable debugging

4

graph.invoke() fails to save the checkpoint after the run completes, leaving no state for future resumes of the thread

✓ Fix

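Ensure your graph is compiled with a checkpoint storage backend (builder.compile(checkpointer=...)) and, for file- or database-backed backends, that write permissions are correct. Verify that checkpoint saving succeeded by listing the thread's checkpoints, for example list(checkpoint_storage.list(config)) with the same config used for the run; an empty result means nothing was written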

Code: broken vs fixed

Broken - triggers the error
python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph
from typing_extensions import TypedDict

class AgentState(TypedDict):
    messages: list[str]
    user_id: str

checkpoint_storage = MemorySaver()

def process_message(state):
    return {"messages": state["messages"] + ["bot response"]}

builder = StateGraph(AgentState)
builder.add_node("process", process_message)
builder.set_entry_point("process")
builder.set_finish_point("process")  # route "process" to END so the graph can terminate
graph = builder.compile(checkpointer=checkpoint_storage)

# Assume this is a resumption request with thread_id from previous conversation
thread_id = "user_123"
config = {"configurable": {"thread_id": thread_id}}

state = {"messages": ["hello"], "user_id": "user_123"}
# This line fails if thread_id was never saved before
result = graph.invoke(state, config=config)  # ValueError: Checkpoint not found
print(f"Result: {result}")
Fixed - works correctly
python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph
from typing_extensions import TypedDict

class AgentState(TypedDict):
    messages: list[str]
    user_id: str

checkpoint_storage = MemorySaver()

def process_message(state):
    return {"messages": state["messages"] + ["bot response"]}

builder = StateGraph(AgentState)
builder.add_node("process", process_message)
builder.set_entry_point("process")
builder.set_finish_point("process")  # route "process" to END so the graph can terminate
graph = builder.compile(checkpointer=checkpoint_storage)

thread_id = "user_123"
config = {"configurable": {"thread_id": thread_id}}
state = {"messages": ["hello"], "user_id": "user_123"}

# FIX: Check whether a checkpoint exists before resuming.
# get_tuple() returns None when nothing has been saved for this thread_id.
if checkpoint_storage.get_tuple(config) is None:
    # First run: nothing to resume, so the full initial state is required
    print(f"New thread {thread_id}, initializing...")
else:
    print(f"Resuming thread {thread_id} from its saved checkpoint")

# Keep thread_id in the config either way; this call creates or updates the checkpoint
result = graph.invoke(state, config=config)

print(f"Result: {result}")
print(f"Checkpoint saved for thread {thread_id}")

The get_tuple() check distinguishes a first run from a resumption before the graph is invoked. On first run, the invoke creates and saves the checkpoint under the given thread_id; subsequent calls with the same thread_id find the saved state.

Workaround

If you cannot modify the checkpoint logic, wrap all graph.invoke() calls with a try/except handler that catches ValueError. On a missing checkpoint, build a fresh state object and invoke once with that state under the same thread_id so the run is checkpointed; later invocations can then resume from it. Alternatively, seed known thread_ids before users reach the graph by running it once per thread with an empty initial state.

Prevention

For multi-turn stateful agents, use a persistent checkpoint backend (SqliteSaver, PostgresSaver) instead of MemorySaver so state survives application restarts. Implement a user session management layer that checks checkpoint existence before attempting resumption. Use a default checkpoint initialization function that creates an empty state for new thread_ids instead of raising errors. Log all thread_id creations and resumptions to enable monitoring of checkpoint lifecycle.

Python 3.9+ · langgraph >=0.1.0 · tested on 0.2.x
Verified 2026-04