Code Intermediate medium · 7 min

Building a retrieval chain with LCEL

What you will learn
Chain a retriever directly to an LLM using LCEL's pipe operator to answer questions over documents.

Why this matters

Retrieval-augmented generation (RAG) is the foundation of production AI systems that answer questions over your own data. LCEL makes building these chains explicit and debuggable: no black-box chains.

Skip if: you're not grounding answers in external documents. For simple question-answering without context, a prompt | llm chain is enough. For complex multi-step reasoning with tool calls, use LangGraph instead.

Explanation

A retrieval chain in LCEL connects three pieces: a retriever (pulls relevant documents), a prompt template (formats the question + documents), and an LLM (generates the answer). It uses the pipe operator | to thread data through each stage.

Mechanically, when you call chain.invoke({'query': 'what is X?'}), the retriever runs first, returning matching documents. Those documents and the query are formatted into a prompt using a template. That prompt is sent to the LLM, which returns an answer string. The entire flow is a single expression, not nested function calls.

When to use it: any time you need to answer questions grounded in specific documents, such as customer support bots, documentation Q&A, or research paper analysis.
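To make the mechanics above concrete, here is a toy model of pipe composition in plain Python. It is a sketch only, not LangChain's implementation: the Step class, the FACTS list, and the retrieve, build_prompt, and fake_llm functions are all hypothetical stand-ins invented for illustration, and the "LLM" is a stub that echoes a context line instead of calling a model.

```python
# Toy model of pipe composition; NOT LangChain's implementation.
class Step:
    """Wraps a function so steps can be joined with | and run with invoke()."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # self | other -> a new Step that runs self first, then other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


# Hypothetical stand-in data for a document store.
FACTS = ["X is the 24th letter of the alphabet.", "Bananas are yellow."]


def retrieve(inputs):
    # Stub "retriever": keep facts that share a word with the query,
    # and carry the query along for the next stage.
    words = set(inputs["query"].lower().replace("?", "").split())
    hits = [f for f in FACTS if words & set(f.lower().rstrip(".").split())]
    return {"context": "\n".join(hits), "query": inputs["query"]}


def build_prompt(inputs):
    # Stub "prompt template": merge context and query into one string.
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['query']}\n\nAnswer:"


def fake_llm(prompt):
    # Stub "LLM": returns the first context line instead of calling a model.
    return prompt.splitlines()[1]


chain = Step(retrieve) | Step(build_prompt) | Step(fake_llm)
answer = chain.invoke({"query": "what is X?"})
print(answer)
```

Each | builds a new step that feeds one stage's output into the next, which is why the whole chain reads as a single left-to-right expression.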

Analogy

It's like a librarian workflow: a patron asks a question → the librarian searches the catalog (retriever) → the librarian reads the relevant books and synthesizes an answer (LLM) → the patron gets the response. The pipe operator choreographs this without intermediate variables.

Code

Illustrative only - not runnable without a valid API key
python
import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document
from langchain_text_splitters import CharacterTextSplitter
from langchain_chroma import Chroma

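# Placeholder only; setdefault keeps a real key if one is already set in the environment.
os.environ.setdefault('OPENAI_API_KEY', 'sk-test-placeholder')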

docs_text = """
Python is a high-level programming language created by Guido van Rossum in 1989.
It emphasizes code readability and simplicity.
Python is widely used in web development, data science, and machine learning.
The language supports multiple programming paradigms including procedural, object-oriented, and functional styles.
"""

splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=200,
    chunk_overlap=50
)

doc_chunks = splitter.split_text(docs_text)
docs = [Document(page_content=chunk) for chunk in doc_chunks]

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="python_facts"
)

retriever = vector_store.as_retriever(search_kwargs={"k": 2})

prompt_template = ChatPromptTemplate.from_template(
    """Answer the question based on the following context:

Context:
{context}

Question: {query}

Answer:"""
)

def format_docs(docs):
    return "\n".join([doc.page_content for doc in docs])

llm = ChatOpenAI(model="gpt-4o-mini")
output_parser = StrOutputParser()

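retrieval_chain = (
    {
        # The retriever expects a plain string, so pull the query out of the
        # input dict first; piping the whole dict into the retriever would fail.
        "context": (lambda x: x["query"]) | retriever | format_docs,
        "query": lambda x: x["query"],
    }
    | prompt_template
    | llm
    | output_parser
)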

result = retrieval_chain.invoke({"query": "Who created Python and why?"})
print(result)
Output
Python was created by Guido van Rossum in 1989. The language was designed with an emphasis on code readability and simplicity. This focus on clear, understandable code has made Python widely adopted across many domains, including web development, data science, and machine learning.

What just happened?

The code created a local vector store from sample text, initialized a retriever that pulls the 2 most similar chunks, built a prompt template with placeholders for context and query, and chained them together. When invoked, the retriever found relevant documents about Python, the prompt template inserted those documents and the query into the template, the LLM read the full prompt and generated an answer based on the provided context, and the output parser extracted the string response.

Common gotcha

The most common mistake is forgetting to pipe the retriever output through format_docs. Retrievers return a list of Document objects, but your prompt expects a string. Without format_docs, the template typically stringifies the whole list, so the prompt fills up with Document(...) reprs and metadata instead of clean text, often without raising any error. Always convert documents to strings before passing them to the prompt template.
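The difference is easy to see in isolation. The sketch below assumes a minimal stand-in Document dataclass (hypothetical, so the snippet runs without LangChain installed) and uses plain str.format in place of the prompt template; a real template may render the list slightly differently, but the shape of the problem is the same.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical stand-in for langchain_core.documents.Document.
    page_content: str


# Plain string template standing in for the prompt template.
TEMPLATE = "Context:\n{context}\n\nQuestion: {query}"


def format_docs(docs):
    # Join the text of each document into one context string.
    return "\n".join(doc.page_content for doc in docs)


docs = [
    Document("Python emphasizes code readability."),
    Document("Python is widely used in data science."),
]
question = "What is Python used for?"

# Gotcha: the list is stringified, so object reprs leak into the prompt.
raw_prompt = TEMPLATE.format(context=docs, query=question)

# Fix: convert the documents to text before filling the template.
clean_prompt = TEMPLATE.format(context=format_docs(docs), query=question)

print(raw_prompt)
print("---")
print(clean_prompt)
```

Printing both prompts side by side like this is a quick way to confirm what the model will actually see.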

Error recovery

ValidationError or KeyError when invoking chain
You likely have a key mismatch between what the chain expects and what you're passing. The chain above reads the question from "query", so passing {"question": ...} fails. Use chain.input_schema.model_json_schema() to inspect the expected input structure.
TypeError: 'Document' object is not subscriptable
Some code is indexing a Document like a dict, for example doc["page_content"]. Documents expose their text as an attribute, so use doc.page_content, and convert the retriever's list to a string with a formatting function like retriever | format_docs.
Empty retrieval results
The collection may be empty, the query may be embedded with a different model than the one used for indexing, or the documents may be too small or too dissimilar to match. Check that k in search_kwargs is reasonable and that your document chunks contain actual semantic content.
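Empty retrievals are easy to miss because the chain still runs and the model answers from nothing. One possible guard, sketched below, is a formatting function that makes the empty case visible in the prompt. The name format_docs_checked, the NO_CONTEXT marker text, and the stand-in Document class are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical stand-in for langchain_core.documents.Document.
    page_content: str


# Hypothetical marker text; pick wording that suits your prompt.
NO_CONTEXT = "(no matching documents were retrieved)"


def format_docs_checked(docs):
    """Like format_docs, but makes an empty retrieval visible in the prompt."""
    # Drop chunks that are empty or whitespace-only.
    chunks = [doc.page_content.strip() for doc in docs if doc.page_content.strip()]
    if not chunks:
        return NO_CONTEXT
    return "\n".join(chunks)


print(format_docs_checked([]))
print(format_docs_checked([Document("Python is widely used."), Document("   ")]))
```

Used in place of format_docs, this lets the prompt instruct the model to say it has no supporting context rather than guess.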

Experienced dev note

New developers often think they need LangGraph or AgentExecutor for retrieval. Wrong. A simple LCEL chain is faster, more debuggable, and handles most retrieval use cases. Save LangGraph for when you genuinely need branching logic, loops, or tool selection. Also: test your retriever in isolation before building the full chain. Print what retriever.invoke(query) returns before wiring it into the prompt. It saves hours of debugging.
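A small preview helper makes that isolation test repeatable. The sketch below assumes only that the retriever exposes invoke(query) and returns objects with a page_content attribute; the KeywordRetriever class is a hypothetical toy (word overlap instead of embeddings) so the snippet runs offline, and preview is an illustrative name, not a LangChain function.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical stand-in for langchain_core.documents.Document.
    page_content: str


class KeywordRetriever:
    """Hypothetical toy retriever: ranks documents by shared words."""

    def __init__(self, docs, k=2):
        self.docs = docs
        self.k = k

    def invoke(self, query):
        words = set(query.lower().split())
        scored = [
            (len(words & set(d.page_content.lower().split())), d) for d in self.docs
        ]
        # Sort by score only, so documents themselves are never compared.
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [d for score, d in ranked[: self.k] if score > 0]


def preview(retriever, query, width=60):
    """Return one summary line per retrieved document for eyeballing."""
    docs = retriever.invoke(query)
    return [
        f"{i}: {d.page_content[:width]!r} ({len(d.page_content)} chars)"
        for i, d in enumerate(docs)
    ]


retriever = KeywordRetriever(
    [
        Document("Python is widely used in web development, data science, and machine learning."),
        Document("It emphasizes code readability and simplicity."),
        Document("Python is a high-level programming language."),
    ]
)
lines = preview(retriever, "python data science")
for line in lines:
    print(line)
```

The same preview function should work against a real retriever, since it relies only on invoke and page_content.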

Check your understanding

If your retriever returns 5 documents but you only want 3 in the prompt to save tokens, how would you modify the chain without touching the retriever configuration?

Show answer hint

Because the question rules out changing the retriever configuration, the answer is to post-process the retriever output: add a lambda or custom function between the retriever and format_docs that slices the list down to 3 documents. When you are free to change the retriever, lowering k in the search_kwargs passed to as_retriever() is simpler and avoids fetching documents you then discard. The key insight is that retriever configuration and prompt formatting are separate concerns in LCEL.
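One way to write the slicing step is sketched below. The take_first helper is a hypothetical name, and the Document class is a stand-in so the snippet runs without LangChain. In the real chain, the equivalent wiring would be retriever | take_first(3) | format_docs, since plain functions piped with a runnable are wrapped automatically.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical stand-in for langchain_core.documents.Document.
    page_content: str


def take_first(n):
    """Build a step that keeps only the first n retrieved documents."""

    def _take(docs):
        return docs[:n]

    return _take


def format_docs(docs):
    # Join the text of each document into one context string.
    return "\n".join(doc.page_content for doc in docs)


# Pretend the retriever returned 5 documents, best match first.
retrieved = [Document(f"chunk {i}") for i in range(1, 6)]

# Slice to 3 before formatting; the retriever itself is untouched.
context = format_docs(take_first(3)(retrieved))
print(context)
```

Slicing keeps the top-ranked documents only if the retriever returns results in relevance order, which is worth confirming for your vector store.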

VERSION LCEL and the pipe operator have been stable since langchain-core 0.1.0 (December 2023), and invoke() is how you run an LCEL chain in every release since then. In langchain < 1.0.0, the input/output schema handling differed slightly. Modern langchain 1.2.x (April 2026) uses invoke() consistently. chain.run() is not an LCEL alternative: it belongs to the legacy Chain classes (such as RetrievalQA) and has been deprecated since langchain 0.1.0. If older code still calls chain.run(), migrate it to an LCEL chain and invoke().
NEXT

Next, learn how to stream retrieval chains in real-time so users see answers chunk-by-chunk instead of waiting for the full response.
