RunnableParallel: running multiple chains at once
Why this matters
In real applications, you often need to fetch data from multiple sources or run several analyses on the same input. RunnableParallel removes the sequential bottleneck by running those independent steps concurrently, so the total latency is closer to that of the slowest step than to the sum of all steps.
Explanation
RunnableParallel is an LCEL (LangChain Expression Language) primitive that runs multiple Runnable chains concurrently and merges their results into a single dictionary. Under the hood, it uses threads for synchronous calls and asyncio for asynchronous calls, so all chains are invoked at the same time rather than one after another.
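To make the "under the hood" idea concrete, here is a minimal standard-library sketch of the fan-out-and-merge pattern. It is not LangChain's actual implementation; `run_parallel` and the three step functions are hypothetical names invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(steps, value):
    """Call every function in `steps` with the same value, each in its own thread."""
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        # Submit all steps up front so they can run concurrently.
        futures = {name: pool.submit(fn, value) for name, fn in steps.items()}
        # Collect results under the same keys the steps were registered with.
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for chains; real chains would call an LLM here.
steps = {
    "upper": lambda s: s.upper(),
    "length": lambda s: len(s),
    "words": lambda s: s.split(),
}

merged = run_parallel(steps, "hello parallel world")
print(merged)
```

The shape is the important part: one input goes in, every step sees it, and one dictionary comes out.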
Mechanically, RunnableParallel takes a dictionary where each key maps to a chain. When you invoke it with a single input, that input is passed to all chains simultaneously, and their outputs are collected and returned in a new dictionary with the same keys. This is perfect for scenarios like fetching summaries from multiple AI models, querying different data sources in parallel, or running independent processing steps.
Use RunnableParallel whenever you have independent tasks that do not depend on each other's output, such as extracting entities, generating multiple candidate responses, or enriching data from several sources.
Analogy
Think of RunnableParallel like ordering multiple dishes at a restaurant. Instead of waiting for the chef to cook dish 1, then dish 2, then dish 3 sequentially, you give the kitchen all three orders at once and they prepare them in parallel. You get all three plates back at roughly the same time.
Code
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_core.output_parsers import StrOutputParser

# One model instance is shared by all three chains.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Chain 1: one-sentence summary.
summary_prompt = ChatPromptTemplate.from_template(
    "Summarize this in one sentence: {text}"
)
summary_chain = summary_prompt | llm | StrOutputParser()

# Chain 2: keyword extraction.
keywords_prompt = ChatPromptTemplate.from_template(
    "Extract 3 key keywords from: {text}"
)
keywords_chain = keywords_prompt | llm | StrOutputParser()

# Chain 3: one-word sentiment.
sentiment_prompt = ChatPromptTemplate.from_template(
    "What is the sentiment of: {text}? Answer with one word."
)
sentiment_chain = sentiment_prompt | llm | StrOutputParser()

# Bundle the three chains; each keyword becomes a key in the result dict.
parallel_chain = RunnableParallel(
    summary=summary_chain,
    keywords=keywords_chain,
    sentiment=sentiment_chain,
)

text = "The new product launch exceeded all expectations. Customer feedback has been overwhelmingly positive."

# All three chains receive the same {"text": ...} input.
result = parallel_chain.invoke({"text": text})

print("Summary:", result["summary"])
print("Keywords:", result["keywords"])
print("Sentiment:", result["sentiment"])
Output
Summary: The new product launch was highly successful with overwhelmingly positive customer feedback.
Keywords: product launch, customer feedback, positive
Sentiment: positive
What just happened?
The code created three independent LLM chains (summary, keywords, and sentiment analysis). RunnableParallel bundled them together into a single runnable. When invoked with the text input, all three chains executed simultaneously (not sequentially), each receiving the same text. Their outputs were collected into a dictionary with keys matching the names we assigned ("summary", "keywords", "sentiment"). We then printed each result by accessing its dictionary key.
Common gotcha
RunnableParallel passes the same input to every chain. If a chain expects a key that the input does not contain, that chain fails, typically with a KeyError. RunnableParallel is simplest to use when all chains consume the same input structure. If the chains need different keys from a larger input dict, select or reshape the relevant part of the input for each chain before that chain runs.
Error recovery
KeyError: 'text'
TypeError: invoke() takes 1 positional argument but 2 were given
asyncio error or RuntimeError
Experienced dev note
RunnableParallel is lazy: it doesn't run any chain until you call invoke() or stream(). This means you can build complex parallel workflows and pass them around as configuration before execution. Because LLM calls are I/O-bound, running them concurrently gives real speed gains. However, parallel chains also send more requests at once, so you may hit your LLM provider's rate limit sooner. Plan your concurrency accordingly in production.
Check your understanding
If you have three chains in RunnableParallel and the second chain throws an exception, what happens to the output of chains 1 and 3? Why?
Answer hint
A correct answer explains that the exception propagates out of the RunnableParallel call and no result dictionary is returned, even if chains 1 and 3 completed successfully. This fail-fast behavior is the default, and it matters because incomplete parallel results are often useless to the code that consumes them. If you need partial results even when one chain fails, you have to add your own error handling around the chains that may fail.