Code Intermediate medium · 5 min

Creating a retriever from a vector store

What you will learn

Convert a vector store into a retriever that can fetch relevant documents using semantic search.

Why this matters

Retrievers are the bridge between your vector store and LLM chains: they handle the actual document lookup when you need context for generation. Without understanding this, you can't build working RAG systems.

Skip if: Don't create a custom retriever if the vector store's built-in `.as_retriever()` method works for your use case. Only build custom retrievers when you need multi-step filtering, re-ranking, or hybrid search that the default implementation doesn't support.

Explanation

A retriever in LangChain is an interface that takes a query string and returns a list of relevant Document objects. A vector store stores embeddings and can perform similarity search, but it's not yet a retriever: it doesn't conform to the retriever interface that chains expect.

The conversion happens through the `.as_retriever()` method, which wraps the vector store's search capability in a standardized interface. Mechanically: your query string is embedded with the embedding model attached to the vector store (the same one that indexed the documents), the vector store finds the k nearest neighbors under its configured distance metric (FAISS in LangChain defaults to Euclidean/L2 distance; other stores may use cosine similarity), and the matches come back as Document objects.
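To make the embed, rank, return sequence concrete, here is a minimal pure-Python sketch. It is not LangChain's implementation: the word-count `embed` function, the `cosine` helper, and `retrieve` are all hypothetical stand-ins, and cosine similarity is just one possible metric.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, texts: list[str], k: int = 2) -> list[str]:
    """Embed the query, rank every stored text by similarity, return the top k."""
    query_vec = embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, embed(t)), reverse=True)
    return ranked[:k]


texts = [
    "Python is a high-level programming language.",
    "Machine learning is a subset of artificial intelligence.",
    "Neural networks are inspired by biological neurons.",
]
top = retrieve("What is machine learning?", texts, k=1)
print(top)
```

A real vector store embeds documents once at index time and uses an approximate or optimized nearest-neighbor search rather than re-embedding and sorting everything per query.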

Use this when you're building a RAG chain and need your vector store to work seamlessly with LLM chains via the pipe operator (|). You can customize behavior through parameters like search_type (similarity, mmr, similarity_score_threshold) and search_kwargs (k, score_threshold).
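The difference between `similarity` and `mmr` is easiest to see on a tiny example. The sketch below is a simplified greedy maximal marginal relevance (MMR) selection in plain Python, assuming hand-picked 2-D vectors; the `mmr` function is illustrative, and only the parameter name `lambda_mult` is borrowed from LangChain's MMR search options.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy MMR: pick k doc indices, trading relevance against redundancy."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.3]
doc_vecs = [
    [1.0, 0.0],   # doc 0: relevant
    [1.0, 0.05],  # doc 1: near-duplicate of doc 0
    [0.5, 1.0],   # doc 2: less relevant, but different
]

# Plain similarity ranking keeps both near-duplicates.
plain_top2 = sorted(
    range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True
)[:2]
# MMR swaps the second near-duplicate for the more diverse doc.
mmr_top2 = mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5)
print(plain_top2, mmr_top2)
```

Raising `lambda_mult` toward 1 weights relevance more heavily and makes the result converge on the plain similarity ranking.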

Analogy

A vector store is like a library's catalog system: it can find books by similarity. A retriever is the librarian who knows how to use that system and hand you actual books formatted consistently. `.as_retriever()` makes your catalog system act like a librarian.

Code

Illustrative only - not runnable without a valid API key

python

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

docs = [
    Document(page_content="Python is a high-level programming language.", metadata={"source": "wiki"}),
    Document(page_content="Machine learning is a subset of artificial intelligence.", metadata={"source": "wiki"}),
    Document(page_content="Neural networks are inspired by biological neurons.", metadata={"source": "textbook"}),
    Document(page_content="Natural language processing uses transformers and embeddings.", metadata={"source": "paper"}),
]

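# Embed the documents and index them in an in-memory FAISS store.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = FAISS.from_documents(docs, embeddings)

# Wrap the store in the retriever interface; return the 2 closest matches.
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 2}
)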

query = "What is machine learning?"
retrieved_docs = retriever.invoke(query)

print("Retrieved documents:")
for i, doc in enumerate(retrieved_docs):
    print(f"\nDocument {i+1}:")
    print(f"Content: {doc.page_content}")
    print(f"Source: {doc.metadata['source']}")

template = """Answer the question based on this context:
{context}

Question: {question}"""

prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

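# Compose prompt -> model -> string parser with the pipe operator.
rag_chain = prompt | llm | parser

# Retrievers return Document objects, so join their text into one context string.
context_str = "\n".join([doc.page_content for doc in retrieved_docs])
result = rag_chain.invoke({"context": context_str, "question": query})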
print(f"\nRAG Answer:\n{result}")

Output

Retrieved documents:

Document 1:
Content: Machine learning is a subset of artificial intelligence.
Source: wiki

Document 2:
Content: Neural networks are inspired by biological neurons.
Source: textbook

RAG Answer:
Machine learning is a subset of artificial intelligence. It involves using neural networks and algorithms to enable systems to learn patterns from data without being explicitly programmed.

What just happened?

The code created a FAISS vector store from 4 documents, converted it to a retriever using `.as_retriever(search_type='similarity', search_kwargs={'k': 2})`, then invoked the retriever with a query string. The retriever embedded the query, found the 2 most similar documents, and returned them as Document objects. The text of those documents was then joined into a single context string and passed, together with the question, into a RAG chain that generated an answer.

Common gotcha

Developers often call `.invoke()` on the retriever expecting a string back, but retrievers return a list of Document objects, not text. If you pipe a retriever directly into an LLM without converting the documents to a string first, the chain fails on the unexpected input type. Always extract `doc.page_content` with a small formatting function, and use a `RunnableParallel` when the prompt needs both the formatted context and the original question.
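A small formatting helper is usually enough to bridge the gap between a list of documents and a prompt string. The sketch below assumes only that each document exposes a `page_content` attribute; `format_docs` is an illustrative name, and `SimpleNamespace` objects stand in for LangChain `Document` instances so the snippet runs without any API key.

```python
from types import SimpleNamespace


def format_docs(docs) -> str:
    """Join the page_content of each retrieved document into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


# Stand-ins for LangChain Document objects (assumption: only .page_content is used).
fake_docs = [
    SimpleNamespace(page_content="Machine learning is a subset of artificial intelligence."),
    SimpleNamespace(page_content="Neural networks are inspired by biological neurons."),
]

context = format_docs(fake_docs)
print(context)
```

In a real chain the same function can sit directly after the retriever, for example `{"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | parser`, where `RunnablePassthrough` comes from `langchain_core.runnables`.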

Error recovery

TypeError: 'Document' object is not iterable

You're using retriever output directly in a chain without converting it to a string (the exact error text varies by LangChain version). Use `retriever | (lambda docs: '\n'.join([doc.page_content for doc in docs])) | llm`, or use `RunnableParallel` to build the `context` and `question` inputs your prompt expects.

AttributeError: 'NoneType' object has no attribute 'invoke'

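The object you are calling `.invoke()` on is `None`. `.as_retriever()` itself returns a retriever object rather than `None`, so the cause is usually upstream: for example, a helper function that builds the vector store or retriever but never returns it, or a variable that was reassigned. Check that `FAISS.from_documents()` or `Chroma.from_documents()` received a valid embedding model and that its result is actually the object you call `.as_retriever()` on.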

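openai.APIError (openai.error.APIError on openai < 1.0) raised while embedding

The retriever creates the query embedding on the fly, so an API error at query time comes from the embedding call itself: check your API key, rate limits, and network access. A related but quieter problem is a model mismatch. `.as_retriever()` takes no embedding model of its own; it reuses the one attached to the vector store. If you persist an index built with 'text-embedding-3-small' and later reload it with an `OpenAIEmbeddings()` instance that uses a different model, you get either a dimension mismatch or silently poor results rather than an API error. Always specify the embedding model explicitly both when building and when loading the index.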

Experienced dev note

In production, avoid re-creating embeddings for every query. A retriever created with `.as_retriever()` re-embeds the query string on every `.invoke()` call, which is acceptable for latency but expensive at scale. If you're running a high-traffic RAG system, cache the embeddings of repeated queries, or pre-compute query embeddings and call the vector store's vector-based search methods (such as `similarity_search_by_vector`) directly. Also, test your retriever's `search_kwargs` before shipping: `k=10` might look reasonable in testing but return 10 marginally relevant documents in production, diluting your RAG quality. Use MMR search (`search_type='mmr'`) when you need diverse results.
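One way to avoid paying for the same query embedding twice is a small in-process cache. This is a minimal sketch using `functools.lru_cache`; `embed_query` here is a hypothetical stand-in for a real embedding API call, and the call counter exists only to show the effect.

```python
from functools import lru_cache

embed_calls = 0  # counts how often the (simulated) embedding API is hit


def embed_query(text: str) -> tuple[float, ...]:
    """Hypothetical stand-in for an embedding API call."""
    global embed_calls
    embed_calls += 1
    return (float(len(text)), float(text.count(" ")))


@lru_cache(maxsize=1024)
def cached_embed_query(text: str) -> tuple[float, ...]:
    """Return the cached vector for a repeated query; tuples keep it immutable."""
    return embed_query(text)


for q in ["what is ml?", "what is ml?", "what is python?", "what is ml?"]:
    vector = cached_embed_query(q)

print(embed_calls)  # only the 2 distinct queries reached embed_query
```

The cached vector can then be handed to a vector-based search method on the store instead of going through the retriever. An in-process cache like this only helps when identical query strings recur; a shared cache would be needed across multiple workers.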

Check your understanding

You have a vector store indexed with documents about Python, JavaScript, and Rust. You create a retriever with `search_type='similarity'` and `k=3`. You query 'What is Go?'. What will the retriever return, and why is this a problem for your RAG chain?

Answer hint

A correct answer recognizes that the retriever will return the 3 most similar documents about Python, JavaScript, or Rust, even though none are about Go. Similarity search always returns the top k matches, however weak they are. The problem is that your LLM will be forced to answer a question about Go using irrelevant context, leading to hallucination. This shows why you need either a larger document set that includes Go, or a threshold-based search (`search_type='similarity_score_threshold'`) that returns fewer or no results when similarity is below a cutoff.
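The effect of a score cutoff can be sketched in a few lines. The scores below are made-up relevance values for the 'What is Go?' query, and `filter_by_threshold` is an illustrative helper, not a LangChain function; it only mimics the idea behind `similarity_score_threshold`.

```python
def filter_by_threshold(scored_docs, score_threshold, k):
    """Keep at most k docs whose relevance score is at or above the threshold."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:k]]


# Hypothetical relevance scores in [0, 1] for the query "What is Go?".
scored = [("Python doc", 0.31), ("JavaScript doc", 0.28), ("Rust doc", 0.35)]

# Plain top-k returns three weak matches regardless of how low the scores are.
plain_top3 = [doc for doc, _ in sorted(scored, key=lambda p: p[1], reverse=True)[:3]]
# With a cutoff, nothing qualifies and the chain can say "I don't know" instead.
thresholded = filter_by_threshold(scored, score_threshold=0.5, k=3)

print(plain_top3)
print(thresholded)
```

With a real retriever the cutoff is passed through `search_kwargs`, for example `search_kwargs={"score_threshold": 0.5, "k": 3}`; a sensible threshold depends on the embedding model and the store's score scale, so it is worth tuning on real queries.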

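VERSION In langchain < 0.1.0, vector stores used `.as_retriever()` inconsistently. From 0.1.0 onwards (including current 1.2.x), all vector stores implement the consistent retriever interface. If you're on an older version, you may need to wrap the vector store manually or call the retriever with `get_relevant_documents()`, which has since been deprecated in favor of `.invoke()`. Note that `VectorStoreRetriever` itself is not deprecated: it is the class that `.as_retriever()` returns.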

Next, you'll want to learn how to handle the Document list output from retrievers by converting it to context strings or building a proper RAG chain architecture using LangGraph for multi-step retrieval and re-ranking.
