
What is MMR in vector search

Quick answer
In vector search, Maximal Marginal Relevance (MMR) is a re-ranking algorithm that balances each result's relevance to the query against its similarity to the results already selected. It favors vectors that are close to the query but dissimilar to one another, which reduces redundancy in the returned set.

How it works

MMR builds the result list greedily. At each step it picks the candidate that scores highest on relevance to the query minus a penalty for its similarity to the results already selected, with a weight (commonly called lambda) setting the balance between the two. The final set is therefore both relevant and diverse, without repetitive or near-duplicate items. Think of it like picking a playlist: you want songs you like (relevance) but also variety (diversity) so the list isn’t monotonous.
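
The selection rule can be written compactly. The notation here is an assumption introduced for illustration: q is the query vector, C the candidate set, S the set of results already selected, sim a similarity function such as cosine similarity, and λ in [0, 1] the trade-off weight (lambda_param in the Python example later in this article).

```latex
d^{*} = \arg\max_{d_i \in C \setminus S}
  \Big[ \lambda \, \mathrm{sim}(d_i, q)
        - (1 - \lambda) \max_{d_j \in S} \mathrm{sim}(d_i, d_j) \Big]
```

With λ = 1 the rule reduces to plain relevance ranking. As λ decreases, candidates that resemble earlier picks are penalized more heavily.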

Concrete example

The following Python example demonstrates a simple MMR implementation using cosine similarity to re-rank vector search results:

python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Dummy vectors for illustration
query_vector = np.array([[0.1, 0.3, 0.5]])
doc_vectors = np.array([
    [0.1, 0.3, 0.5],    # Identical to the query (cosine similarity 1.0)
    [0.1, 0.29, 0.48],  # Near-duplicate of the first document
    [0.9, 0.1, 0.2],    # Points in a different direction: least relevant
    [0.2, 0.4, 0.6]     # Relevant, and nearly parallel to the first document
])

# Parameters
# 1.0 ranks purely by relevance; lower values weight diversity more heavily
lambda_param = 0.7

# Compute similarity to query
sim_to_query = cosine_similarity(doc_vectors, query_vector).flatten()

selected = []
candidate_indices = list(range(len(doc_vectors)))

while candidate_indices:
    if not selected:
        # Pick the most relevant first
        idx = candidate_indices[np.argmax(sim_to_query[candidate_indices])]
        selected.append(idx)
        candidate_indices.remove(idx)
    else:
        mmr_scores = []
        for i in candidate_indices:
            sim_to_selected = max(cosine_similarity(
                doc_vectors[i].reshape(1, -1),
                doc_vectors[selected]
            ).flatten())
            score = lambda_param * sim_to_query[i] - (1 - lambda_param) * sim_to_selected
            mmr_scores.append(score)
        idx = candidate_indices[np.argmax(mmr_scores)]
        selected.append(idx)
        candidate_indices.remove(idx)

print("MMR re-ranked indices:", selected)
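output
MMR re-ranked indices: [0, 1, 3, 2]

With lambda_param = 0.7 the relevance term dominates, so the near-duplicate at index 1 still ranks second. Because document 0 is identical to the query, each remaining candidate's similarity to document 0 equals its similarity to the query, and its second-round score reduces to (2 × 0.7 - 1) = 0.4 times that similarity. The diversity penalty changes the second pick only once lambda_param drops below 0.5.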

When to use it

Use MMR in vector search when you want to improve result diversity and reduce redundancy, especially in applications like document retrieval, recommendation systems, or question answering. Avoid MMR if you only need the single most relevant result or if diversity is not a priority, as it may reduce precision in favor of variety.
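
To make the precision-versus-variety trade-off concrete, here is a minimal sketch that re-runs MMR at several lambda values. The function names (cosine_sim_matrix, mmr_rerank), the top_k parameter, and the 2-D toy vectors are assumptions made up for this illustration, and the code uses only NumPy. It is a sketch under those assumptions, not a reference implementation.

```python
import numpy as np

def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mmr_rerank(query, docs, lambda_param=0.7, top_k=None):
    """Greedy MMR (hypothetical helper): return doc indices in selection order."""
    top_k = len(docs) if top_k is None else min(top_k, len(docs))
    if top_k == 0:
        return []
    sim_to_query = cosine_sim_matrix(docs, query.reshape(1, -1)).ravel()
    sim_between_docs = cosine_sim_matrix(docs, docs)

    # The first pick is always the most relevant document.
    selected = [int(np.argmax(sim_to_query))]
    candidates = [i for i in range(len(docs)) if i != selected[0]]
    while candidates and len(selected) < top_k:
        # Redundancy: highest similarity to any already-selected document.
        redundancy = sim_between_docs[np.ix_(candidates, selected)].max(axis=1)
        scores = (lambda_param * sim_to_query[candidates]
                  - (1 - lambda_param) * redundancy)
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data (made up for this sketch)
query = np.array([1.0, 0.0])
docs = np.array([
    [0.9, 0.10],   # 0: most relevant
    [0.9, 0.12],   # 1: near-duplicate of 0
    [0.8, -0.50],  # 2: relevant, but points away from 0 and 1
    [0.0, 1.00],   # 3: unrelated to the query
])

for lam in (1.0, 0.7, 0.5, 0.3):
    print(f"lambda={lam} ->", mmr_rerank(query, docs, lam))
# lambda=1.0 -> [0, 1, 2, 3]
# lambda=0.7 -> [0, 1, 2, 3]
# lambda=0.5 -> [0, 2, 1, 3]
# lambda=0.3 -> [0, 3, 2, 1]
```

On this data the near-duplicate (index 1) is demoted only once lambda reaches 0.5, and at 0.3 the unrelated document (index 3) jumps to second place. That second effect is the precision cost described above, which is why lambda is usually tuned against real queries.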

Key terms

| Term | Definition |
| --- | --- |
| Maximal Marginal Relevance (MMR) | An algorithm balancing relevance and diversity in ranked search results. |
| Relevance | Similarity of a vector to the query vector. |
| Diversity | Difference or dissimilarity among selected vectors to avoid redundancy. |
| Cosine similarity | A metric measuring the cosine of the angle between two vectors, indicating similarity. |
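
As a small illustration of the last term, cosine similarity can be computed directly as the dot product divided by the product of the vector lengths. The helper name cosine_similarity_1d is hypothetical and chosen for this sketch. It assumes both inputs are non-zero vectors of equal length.

```python
import numpy as np

def cosine_similarity_1d(a, b):
    """Cosine of the angle between two non-zero 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

same_direction = cosine_similarity_1d([1, 2, 3], [2, 4, 6])  # approximately 1.0
orthogonal = cosine_similarity_1d([1, 0], [0, 1])            # 0.0
opposite = cosine_similarity_1d([1, 0], [-1, 0])             # -1.0
print(same_direction, orthogonal, opposite)
```

Because the result depends only on direction, scaling a vector does not change its similarity to the query. This is why a document vector that is a scaled copy of another counts as fully redundant under MMR with this metric.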

Key Takeaways

  • MMR balances relevance and diversity to improve vector search results by reducing redundancy.
  • It iteratively selects vectors maximizing query similarity while minimizing similarity to already chosen results.
  • Use MMR when diverse, non-redundant results are more valuable than just top relevance.
  • MMR requires tuning the balance parameter (lambda) to fit your application's needs.
  • Cosine similarity is commonly used to measure relevance and diversity in MMR.
Verified 2026-04