How to filter documents by metadata in LlamaIndex
Quick answer
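In LlamaIndex, filter documents by metadata by passing a MetadataFilters object through the filters parameter of index.as_retriever() or index.as_query_engine(). Each ExactMatchFilter inside it names a metadata key and the value it must equal, so only documents whose metadata matches are retrieved.
PREREQUISITES
- Python 3.8+
- pip install "llama-index>=0.10.0"
- Basic knowledge of Python and LlamaIndex
Setup
Install llama-index via pip and prepare your environment. Quote the requirement so the shell does not treat >= as an output redirect. The default embedding model and LLM call OpenAI, so set the OPENAI_API_KEY environment variable before running the examples.
pip install "llama-index>=0.10.0"
Step by step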
This example shows how to create documents with metadata, build an index, and filter documents by metadata during querying.
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters
# Create documents with metadata
documents = [
    Document(text="Document about cats.", metadata={"category": "animals", "type": "pet"}),
    Document(text="Document about dogs.", metadata={"category": "animals", "type": "pet"}),
    Document(text="Document about cars.", metadata={"category": "vehicles", "type": "transport"}),
]
# Build the index (embeds each document with the configured embedding model)
index = VectorStoreIndex.from_documents(documents)
# Define a metadata filter that keeps only documents in category 'animals'
filters = MetadataFilters(filters=[ExactMatchFilter(key="category", value="animals")])
# Query the index with the metadata filter applied at retrieval time
query_engine = index.as_query_engine(filters=filters)
response = query_engine.query("Tell me about pets.")
print(response.response)
output (generated by the LLM, so the exact wording varies)
An answer drawn only from the cat and dog documents; the car document is filtered out before retrieval.
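To make the filter semantics concrete, here is a minimal pure-Python sketch of exact-match metadata filtering. It does not call LlamaIndex: the helpers matches_filter and filter_docs and the plain-dict records are hypothetical names used only for illustration, and the sketch assumes that every key in the filter must be present with an equal value (AND semantics).

```python
from typing import Any, Dict, List


def matches_filter(metadata: Dict[str, Any], wanted: Dict[str, Any]) -> bool:
    """Return True when every key in `wanted` exists in `metadata` with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in wanted.items())


def filter_docs(docs: List[Dict[str, Any]], wanted: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only the records whose metadata satisfies the filter."""
    return [doc for doc in docs if matches_filter(doc["metadata"], wanted)]


# Plain-dict stand-ins for the three documents in the example (not LlamaIndex objects)
docs = [
    {"text": "Document about cats.", "metadata": {"category": "animals", "type": "pet"}},
    {"text": "Document about dogs.", "metadata": {"category": "animals", "type": "pet"}},
    {"text": "Document about cars.", "metadata": {"category": "vehicles", "type": "transport"}},
]

animal_texts = [doc["text"] for doc in filter_docs(docs, {"category": "animals"})]
print(animal_texts)

# A key with different capitalization matches nothing under exact matching
wrong_key = filter_docs(docs, {"Category": "animals"})
print(wrong_key)
```

The second call shows why a misspelled or differently capitalized key silently yields an empty result instead of an error.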
Common variations
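You can also apply metadata filters when using a retriever directly, which returns the matching nodes without generating an answer. For example, with as_retriever():
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters
filters = MetadataFilters(filters=[ExactMatchFilter(key="type", value="pet")])
retriever = index.as_retriever(filters=filters)
results = retriever.retrieve("Tell me about pets.")
for result in results:
    # Each result is a NodeWithScore; the text lives on the wrapped node
    print(result.node.get_content())
output (order depends on similarity scores)
Document about cats.
Document about dogs.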
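Conceptually, a filtered retriever narrows the candidates by metadata and then ranks what is left. The toy sketch below illustrates that ordering with a word-overlap score in place of embeddings. It is not LlamaIndex's implementation: toy_retrieve, overlap_score, and the plain-dict records are hypothetical, and applying the filter before ranking is an assumption of this sketch.

```python
import re
from typing import Any, Dict, List


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of distinct words shared by query and text."""
    tokenize = lambda s: set(re.findall(r"\w+", s.lower()))
    return len(tokenize(query) & tokenize(text))


def toy_retrieve(docs: List[Dict[str, Any]], query: str,
                 wanted: Dict[str, Any], top_k: int = 2) -> List[str]:
    """Filter by exact metadata match first, then return the top_k texts by score."""
    candidates = [
        doc for doc in docs
        if all(doc["metadata"].get(key) == value for key, value in wanted.items())
    ]
    ranked = sorted(candidates, key=lambda doc: overlap_score(query, doc["text"]), reverse=True)
    return [doc["text"] for doc in ranked[:top_k]]


docs = [
    {"text": "Document about cats.", "metadata": {"category": "animals", "type": "pet"}},
    {"text": "Document about dogs.", "metadata": {"category": "animals", "type": "pet"}},
    {"text": "Document about cars.", "metadata": {"category": "vehicles", "type": "transport"}},
]

# The cars document scores highest for this query but is excluded by the filter
pets_only = toy_retrieve(docs, "about cars", {"type": "pet"})
print(pets_only)
```

Because the filter runs before ranking in this sketch, a non-matching document never appears in the results, however similar it is to the query.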
Troubleshooting
- If no documents are returned, verify your metadata keys and values exactly match those in your documents, including letter case.
- Ensure metadata is provided as a dictionary when creating Document instances.
- Check that your installed llama-index version provides the llama_index.core package (version 0.10.0+); older releases use different import paths.
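The checks above can be partly automated. The sketch below uses only the standard library: summarize_metadata lists the keys and values actually present so you can compare them with your filter, and installed_version reports which release of a package is installed. Both helper names are hypothetical, and the summary assumes metadata values are hashable scalars such as strings or numbers.

```python
from collections import defaultdict
from importlib import metadata as importlib_metadata
from typing import Any, Dict, Iterable, Optional, Set


def summarize_metadata(metadatas: Iterable[Dict[str, Any]]) -> Dict[str, Set[Any]]:
    """Map each metadata key to the set of values seen for it (values must be hashable)."""
    summary: Dict[str, Set[Any]] = defaultdict(set)
    for md in metadatas:
        for key, value in md.items():
            summary[key].add(value)
    return dict(summary)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return importlib_metadata.version(package)
    except importlib_metadata.PackageNotFoundError:
        return None


# Plain dicts standing in for the metadata of the three example documents
example_metadata = [
    {"category": "animals", "type": "pet"},
    {"category": "animals", "type": "pet"},
    {"category": "vehicles", "type": "transport"},
]

summary = summarize_metadata(example_metadata)
print(summary)
print(installed_version("llama-index"))
```

With real Document objects, passing `(doc.metadata for doc in documents)` to summarize_metadata should give the same kind of overview.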
Key Takeaways
- Use the filters parameter with a MetadataFilters object to restrict queries to documents matching specific metadata.
- Metadata must be set as a dictionary when creating Document objects for filtering to work.
- Filtering by metadata works in both query engines and retrievers.
- Verify metadata keys and values carefully to avoid empty query results.