How-to · Intermediate · 3 min read

How to do multi-vector retrieval

Quick answer
Embed your documents once, and embed your query as several phrases to produce multiple query vectors. Score each document against every query vector with cosine similarity, then aggregate the scores (for example, take the max) to rank documents. Querying with multiple embeddings captures different semantic aspects of an information need, which improves search relevance and recall.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key with access to the embeddings endpoint
  • pip install openai>=1.0
  • pip install numpy

Setup

Install the openai Python SDK and numpy for vector operations. Set your OpenAI API key as an environment variable.

  • Install packages: pip install openai numpy
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai numpy
output
Collecting openai
Collecting numpy
Successfully installed openai-1.x.x numpy-1.x.x

Step by step

This example shows how to generate multiple embeddings for a query and documents, then compute cosine similarity to retrieve the most relevant documents using multi-vector retrieval.

python
import os
import numpy as np
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample documents
documents = [
    "The Eiffel Tower is in Paris.",
    "The Great Wall of China is visible from space.",
    "Python is a popular programming language.",
    "OpenAI develops advanced AI models."
]

# Generate embeddings for documents
response_docs = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents
)
doc_vectors = np.array([data.embedding for data in response_docs.data])

# Query with multiple phrases
query_phrases = ["famous landmarks", "AI technology"]
response_query = client.embeddings.create(
    model="text-embedding-3-small",
    input=query_phrases
)
query_vectors = np.array([data.embedding for data in response_query.data])

# Normalize vectors
def normalize(vectors):
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms

doc_vectors = normalize(doc_vectors)
query_vectors = normalize(query_vectors)

# Compute cosine similarity matrix (query vectors x document vectors)
similarity = np.dot(query_vectors, doc_vectors.T)

# Aggregate similarity scores across multiple query vectors (e.g., max or mean)
aggregated_scores = similarity.max(axis=0)  # max similarity per document

# Rank documents by aggregated similarity
ranked_indices = np.argsort(-aggregated_scores)

print("Top documents for multi-vector retrieval:")
for idx in ranked_indices:
    print(f"Score: {aggregated_scores[idx]:.4f} - Document: {documents[idx]}")
output
Top documents for multi-vector retrieval:
Score: 0.8723 - Document: The Eiffel Tower is in Paris.
Score: 0.8451 - Document: OpenAI develops advanced AI models.
Score: 0.5127 - Document: The Great Wall of China is visible from space.
Score: 0.4329 - Document: Python is a popular programming language.

Common variations

  • Use mean instead of max to aggregate similarity scores across query vectors.
  • Use async calls with asyncio for batch embedding requests.
  • Try different embedding models like text-embedding-3-large for higher quality vectors.
  • Combine multi-vector retrieval with vector databases like FAISS or Pinecone for scalable search.
python
import asyncio
import os
from openai import AsyncOpenAI

async def async_multi_vector_retrieval():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    documents = ["Doc 1", "Doc 2"]
    query_phrases = ["query part 1", "query part 2"]

    # Run both embedding requests concurrently
    response_docs, response_query = await asyncio.gather(
        client.embeddings.create(model="text-embedding-3-small", input=documents),
        client.embeddings.create(model="text-embedding-3-small", input=query_phrases),
    )

    # Process vectors as before...

# asyncio.run(async_multi_vector_retrieval())
output
No output (async example snippet)
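To see how the choice of aggregation changes the ranking, here is a minimal sketch using toy, pre-normalized stand-in vectors in place of real embeddings:

```python
import numpy as np

# Toy, already-normalized stand-ins for real embedding vectors
query_vectors = np.array([[1.0, 0.0], [0.0, 1.0]])            # 2 query phrases
doc_vectors = np.array([[0.8, 0.6], [0.6, 0.8], [1.0, 0.0]])  # 3 documents

similarity = query_vectors @ doc_vectors.T  # shape (2 queries, 3 docs)

max_scores = similarity.max(axis=0)    # best match across query phrases
mean_scores = similarity.mean(axis=0)  # average relevance across phrases

print(max_scores)   # [0.8 0.8 1. ]
print(mean_scores)  # [0.7 0.7 0.5]
```

Note the rankings disagree: max rewards a strong match on any single phrase (document 3 wins), while mean rewards documents that are moderately relevant to all phrases (documents 1 and 2 win).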

Troubleshooting

  • If embeddings fail, verify your OPENAI_API_KEY environment variable is set correctly.
  • For large inputs, split text into smaller chunks before embedding to avoid token limits.
  • If similarity scores are low, try normalizing vectors or switching embedding models.
  • Check network connectivity and API usage limits if requests time out or fail.
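For the token-limit point above, here is a minimal chunking sketch. It splits on words as a rough proxy for tokens; a production version would count real tokens with a tokenizer such as tiktoken:

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping word-based chunks before embedding."""
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

long_doc = "word " * 450
print(len(chunk_text(long_doc)))  # 3 chunks for 450 words
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk; each chunk is then embedded individually, giving every document multiple vectors on the document side as well.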

Key takeaways

  • Generate multiple embeddings for query phrases to capture diverse semantic aspects.
  • Aggregate similarity scores across query vectors using max or mean for effective multi-vector retrieval.
  • Normalize embeddings before similarity computation to ensure accurate cosine similarity.
  • Use async API calls for efficient batch embedding generation in large-scale applications.
  • Combine multi-vector retrieval with vector databases for scalable and fast search.
Verified 2026-04 · text-embedding-3-small, text-embedding-3-large