What is MMR in vector search?

Quick answer

In vector search, Maximal Marginal Relevance (MMR) is a re-ranking algorithm that trades off each result's relevance to the query against its similarity to results that have already been selected. It favors vectors that are close to the query but far from one another, so the top results cover more distinct content instead of repeating near-duplicates.

How it works

MMR builds the result list one item at a time. At each step it scores every remaining candidate as lambda × (similarity to the query) minus (1 - lambda) × (highest similarity to any result already selected), then picks the top scorer. The balance parameter lambda ranges from 0 to 1: values near 1 favor relevance, values near 0 favor diversity. The final list is therefore both relevant and varied, with near-duplicates pushed down. Think of it like building a playlist: you want songs you like (relevance) but also variety (diversity) so the list isn’t monotonous.
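The trade-off described above can be sketched as a single scoring function. This is a minimal illustration that assumes the similarities have already been computed; the helper name `mmr_score` and the similarity values are hypothetical and chosen only to show the effect.

```python
from typing import Sequence

def mmr_score(sim_to_query: float,
              sims_to_selected: Sequence[float],
              lam: float = 0.7) -> float:
    """Score one candidate: reward query similarity, penalize redundancy."""
    # Redundancy is the candidate's highest similarity to any already-selected
    # result; it is 0.0 when nothing has been selected yet.
    redundancy = max(sims_to_selected, default=0.0)
    return lam * sim_to_query - (1.0 - lam) * redundancy

# Two hypothetical candidates (made-up similarity values).
near_duplicate = mmr_score(sim_to_query=0.95, sims_to_selected=[0.98])
fresh_topic = mmr_score(sim_to_query=0.80, sims_to_selected=[0.20])

# The near-duplicate is more relevant but scores lower once redundancy is counted.
print(near_duplicate < fresh_topic)
```

The penalty uses the maximum similarity to the selected set, so a candidate is treated as redundant as soon as it closely matches any single result already chosen.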

Concrete example

The following Python example demonstrates a simple MMR implementation using cosine similarity to re-rank vector search results:

python

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Dummy vectors for illustration
query_vector = np.array([[0.1, 0.3, 0.5]])
doc_vectors = np.array([
    [0.1, 0.3, 0.5],    # Identical to the query (most relevant)
    [0.1, 0.29, 0.48],  # Near-duplicate of the first
    [0.9, 0.1, 0.2],    # Least relevant, points in a different direction
    [0.2, 0.4, 0.6]     # Relevant, close in direction to the first two
])

# Parameters
lambda_param = 0.7  # 1.0 = pure relevance, 0.0 = pure diversity

# Compute similarity to query
sim_to_query = cosine_similarity(doc_vectors, query_vector).flatten()

selected = []
candidate_indices = list(range(len(doc_vectors)))

while candidate_indices:
    if not selected:
        # Pick the most relevant first
        idx = candidate_indices[np.argmax(sim_to_query[candidate_indices])]
        selected.append(idx)
        candidate_indices.remove(idx)
    else:
        mmr_scores = []
        for i in candidate_indices:
            # Redundancy: highest similarity to any already-selected document
            sim_to_selected = max(cosine_similarity(
                doc_vectors[i].reshape(1, -1),
                doc_vectors[selected]
            ).flatten())
            # Reward relevance to the query, penalize redundancy
            score = lambda_param * sim_to_query[i] - (1 - lambda_param) * sim_to_selected
            mmr_scores.append(score)
        idx = candidate_indices[np.argmax(mmr_scores)]
        selected.append(idx)
        candidate_indices.remove(idx)

print("MMR re-ranked indices:", selected)

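output

MMR re-ranked indices: [0, 1, 3, 2]

With these toy vectors and lambda_param = 0.7, relevance dominates: the vector at index 1 is almost identical to the first pick, yet it still ranks second because its similarity to the query outweighs the redundancy penalty. Lowering lambda_param increases that penalty and moves dissimilar vectors up the list.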
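To see how the balance parameter changes the ordering, the loop above can be wrapped in a function and run with two settings. This is a sketch under stated assumptions: `mmr_rerank` is a hypothetical helper (not a library API), it uses NumPy only, and the optional `k` cutoff is an addition for illustration.

```python
import numpy as np

def mmr_rerank(query, docs, lam=0.7, k=None):
    """Return document indices in MMR order (hypothetical helper)."""
    docs = np.asarray(docs, dtype=float)
    query = np.asarray(query, dtype=float).ravel()
    # Normalize so that dot products equal cosine similarities.
    docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    sim_to_query = docs_n @ query_n
    doc_sims = docs_n @ docs_n.T  # pairwise document similarities, computed once

    k = len(docs) if k is None else min(k, len(docs))
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        if selected:
            # Highest similarity of each candidate to any selected document.
            redundancy = doc_sims[np.ix_(candidates, selected)].max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * sim_to_query[candidates] - (1 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected

query_vector = [0.1, 0.3, 0.5]
doc_vectors = [
    [0.1, 0.3, 0.5],
    [0.1, 0.29, 0.48],
    [0.9, 0.1, 0.2],
    [0.2, 0.4, 0.6],
]

print(mmr_rerank(query_vector, doc_vectors, lam=0.9))  # relevance-heavy: [0, 1, 3, 2]
print(mmr_rerank(query_vector, doc_vectors, lam=0.3))  # diversity-heavy: [0, 2, 3, 1]
```

With the diversity-heavy setting, the dissimilar vector at index 2 moves to second place and the near-duplicate at index 1 drops to last. Precomputing the pairwise similarity matrix avoids recomputing similarities inside the loop, which is a design choice for this sketch rather than a requirement of MMR.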

When to use it

Use MMR in vector search when redundant results waste space in the output, for example in document retrieval, recommendation systems, or question answering, where several near-identical passages add little beyond the first. Skip MMR if you only need the single most relevant result, since the first MMR pick is already the top-relevance item, or if diversity is not a priority, because MMR can demote highly relevant results in favor of variety.

Key terms

Term	Definition
Maximal Marginal Relevance (MMR)	An algorithm balancing relevance and diversity in ranked search results.
Relevance	Similarity of a vector to the query vector.
Diversity	Difference or dissimilarity among selected vectors to avoid redundancy.
Cosine similarity	A metric measuring the cosine of the angle between two vectors, indicating similarity.
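As a small illustration of the cosine similarity definition in the table, the following sketch computes it directly with NumPy. The helper name `cosine_sim` is hypothetical, and the function assumes non-zero 1-D vectors.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two non-zero 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(round(cosine_sim([1, 0], [2, 0]), 3))   # same direction: 1.0
print(round(cosine_sim([1, 0], [0, 3]), 3))   # orthogonal: 0.0
print(round(cosine_sim([1, 0], [-1, 0]), 3))  # opposite direction: -1.0
```

Because the measure depends only on direction, scaling a vector does not change its similarity to the query, which is why near-parallel vectors of different lengths count as near-duplicates in MMR.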

Key Takeaways

MMR balances relevance and diversity to improve vector search results by reducing redundancy.
It iteratively selects vectors maximizing query similarity while minimizing similarity to already chosen results.
Use MMR when diverse, non-redundant results are more valuable than just top relevance.
MMR requires tuning the balance parameter (lambda, between 0 and 1) to fit your application's needs: higher values favor relevance, lower values favor diversity.
Cosine similarity is commonly used to measure relevance and diversity in MMR.

Verified 2026-04
