How-to · Intermediate · 4 min read

How to use metadata filtering in RAG

Quick answer
Use metadata filtering in RAG by applying filters on document metadata fields during vector similarity search to restrict retrieval to relevant subsets. This improves precision by limiting results to documents matching criteria like date, category, or source before passing them to the LLM for generation.
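The filter-then-rank idea can be sketched in plain Python. The vectors and metadata below are illustrative stand-ins for real embeddings, and `filtered_search` is a hypothetical helper, not a library API:

```python
import math

# Toy corpus: each entry has a (fake) pre-computed embedding and metadata.
corpus = [
    {"text": "AI chips in 2023", "vec": [0.9, 0.1], "meta": {"year": 2023}},
    {"text": "AI chips in 2021", "vec": [0.8, 0.2], "meta": {"year": 2021}},
    {"text": "Baking sourdough", "vec": [0.1, 0.9], "meta": {"year": 2023}},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def filtered_search(query_vec, metadata_filter, k=1):
    # 1) Restrict to documents whose metadata matches every filter key/value.
    candidates = [d for d in corpus
                  if all(d["meta"].get(key) == val
                         for key, val in metadata_filter.items())]
    # 2) Rank only the surviving candidates by vector similarity.
    candidates.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in candidates[:k]]

print(filtered_search([1.0, 0.0], {"year": 2023}))  # ['AI chips in 2023']
```

The key point is the order of operations: the metadata filter shrinks the candidate set first, and similarity ranking runs only on what survives, so an off-topic 2023 document can never displace an on-topic one from another year at the filtering stage.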

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0
  • pip install chromadb (optional, for a vector store with native metadata filtering)

Setup

Install the OpenAI SDK and set the OPENAI_API_KEY environment variable. The walkthrough below simulates the vector store in plain Python, so only the OpenAI package is strictly required; install chromadb if you want a store with built-in metadata filtering.

bash
pip install openai

Step by step

This example demonstrates metadata filtering in a RAG pipeline. Production vector stores such as Chroma accept filter dictionaries directly in their query methods; FAISS, by contrast, is a pure vector index with no built-in metadata filtering, so filtering there must happen in application code. To keep the example self-contained, we use a simplified in-memory store that filters documents by metadata fields before retrieval.

python
import os
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Example documents with metadata
documents = [
    {"text": "Document about AI", "metadata": {"category": "tech", "year": 2023}},
    {"text": "Document about cooking", "metadata": {"category": "food", "year": 2022}},
    {"text": "Document about space", "metadata": {"category": "science", "year": 2023}}
]

# Simulated vector store with metadata filtering capability.
# NOTE: query_text is unused here; a real vector store would also rank the
# filtered documents by embedding similarity to the query.
class SimpleVectorStore:
    def __init__(self, docs):
        self.docs = docs

    def query(self, query_text, metadata_filter):
        # Keep only docs whose metadata matches every filter key/value pair
        filtered_docs = [d for d in self.docs if all(
            d["metadata"].get(k) == v for k, v in metadata_filter.items()
        )]
        # Return the texts of the filtered docs (simulated retrieval)
        return [d["text"] for d in filtered_docs]

# Create vector store
vector_store = SimpleVectorStore(documents)

# Define metadata filter to only get tech documents from 2023
metadata_filter = {"category": "tech", "year": 2023}

# Query vector store with metadata filtering
retrieved_docs = vector_store.query("AI advancements", metadata_filter)

# Use retrieved docs as context for the LLM
context = "\n".join(retrieved_docs)
prompt = f"Answer based on these documents:\n{context}\nQuestion: What are the latest AI advancements?"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

print("LLM response:", response.choices[0].message.content)

output
LLM response: ... (generated answer based on filtered documents)

Common variations

  • Use async calls with asyncio by instantiating AsyncOpenAI and awaiting client.chat.completions.create() for non-blocking retrieval and generation (openai>=1.0 has no acreate() method).
  • Apply metadata filters natively in vector stores such as Chroma (via a where dict), Qdrant, or Weaviate by passing filter conditions to their query methods; FAISS requires filtering outside the index.
  • Switch models to claude-3-5-sonnet-20241022 or gemini-1.5-pro depending on your preference for coding or general tasks.
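Real stores usually go beyond exact-match filters and support operator clauses such as $gte or $in (Chroma and MongoDB use this style of syntax). The `matches` helper below is a standalone sketch of that idea, not any store's actual implementation:

```python
# Hypothetical helper extending exact-match filters with operator clauses
# ($gte, $lte, $in), mirroring Chroma/Mongo-style filter syntax.
def matches(metadata, condition):
    for key, expected in condition.items():
        actual = metadata.get(key)
        if isinstance(expected, dict):  # operator clause, e.g. {"$gte": 2022}
            for op, operand in expected.items():
                if op == "$gte" and not (actual is not None and actual >= operand):
                    return False
                elif op == "$lte" and not (actual is not None and actual <= operand):
                    return False
                elif op == "$in" and actual not in operand:
                    return False
        elif actual != expected:  # plain equality check
            return False
    return True

docs = [
    {"text": "AI report",   "metadata": {"category": "tech", "year": 2023}},
    {"text": "Old AI note", "metadata": {"category": "tech", "year": 2020}},
]
recent = [d["text"] for d in docs if matches(d["metadata"], {"year": {"$gte": 2022}})]
print(recent)  # ['AI report']
```

Range and set operators like these let one filter express "tech or science documents from 2022 onward" without enumerating every matching value.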

Troubleshooting

  • If no documents are retrieved, verify your metadata keys and values exactly match those in your stored documents.
  • If the LLM response is irrelevant, check that the filtered documents contain sufficient context for the query.
  • For large datasets, ensure your vector store indexing supports efficient metadata filtering to avoid slow queries.
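When nothing comes back, a small diagnostic pass over the corpus can pinpoint the mismatch faster than eyeballing the data. `explain_filter` below is a hypothetical debugging helper that reports, per document, which filter keys fail, which quickly surfaces typos ("Year" vs "year") and type mismatches ("2023" vs 2023):

```python
# Hypothetical debugging helper: for each document, collect the filter keys
# whose stored value does not equal the expected value.
def explain_filter(docs, metadata_filter):
    report = []
    for d in docs:
        mismatches = {k: d["metadata"].get(k)
                      for k, v in metadata_filter.items()
                      if d["metadata"].get(k) != v}
        report.append((d["text"], mismatches))
    return report

docs = [{"text": "Doc A", "metadata": {"category": "tech", "year": "2023"}}]
for text, bad in explain_filter(docs, {"category": "tech", "year": 2023}):
    print(text, "->", bad)  # Doc A -> {'year': '2023'}  (string vs int mismatch)
```

Here the filter expects the integer 2023 but the document stores the string "2023", so the diagnostic flags the year key even though the values look identical when printed.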

Key Takeaways

  • Metadata filtering restricts retrieval to relevant document subsets, improving RAG precision.
  • Pass metadata filters as dictionaries to your vector store query method before LLM generation.
  • Ensure metadata keys and values are consistent and indexed for efficient filtering.
  • Metadata filtering is model-agnostic: the same pipeline works with gpt-4o, claude-3-5-sonnet-20241022, or other chat models.
  • Async and streaming variants can optimize latency in production RAG systems.
Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022, gemini-1.5-pro