How-to · Intermediate · 3 min read

How to measure RAG context relevancy

Quick answer
Measure RAG context relevancy by embedding the query and each retrieved document with an embedding model such as text-embedding-3-small, then computing the cosine similarity between the query vector and each document vector. Higher similarity scores indicate context that is more relevant to the generation task.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0
  • pip install numpy scipy

Setup

Install the required Python packages and set your OPENAI_API_KEY environment variable.

  • Install OpenAI SDK and dependencies:
bash
pip install openai numpy scipy
output
Collecting openai
Collecting numpy
Collecting scipy
Successfully installed openai numpy scipy

Step by step

This example shows how to embed a user query and multiple retrieved documents, then compute cosine similarity scores to measure relevancy of each document in the RAG context.

python
import os
import numpy as np
from scipy.spatial.distance import cosine
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def get_embedding(text: str) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return np.array(response.data[0].embedding)

def cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
    return 1 - cosine(vec1, vec2)

# Example query and retrieved documents
query = "What are the benefits of Retrieval-Augmented Generation?"
documents = [
    "RAG improves answer accuracy by combining retrieval with generation.",
    "RAG uses embeddings to find relevant documents.",
    "Unrelated document about cooking recipes."
]

# Embed query
query_embedding = get_embedding(query)

# Embed documents
doc_embeddings = [get_embedding(doc) for doc in documents]

# Calculate similarity scores
similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in doc_embeddings]

# Display relevancy scores
for i, score in enumerate(similarities):
    print(f"Document {i+1} relevancy: {score:.4f}")
output
Document 1 relevancy: 0.8723
Document 2 relevancy: 0.8457
Document 3 relevancy: 0.3121
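With per-document scores in hand, a common next step is to rank the documents and drop anything below a relevancy threshold before assembling the prompt. A minimal sketch using the scores printed above; the 0.5 cutoff is an illustrative assumption, not a recommended value:

```python
# Documents and scores copied from the run above.
documents = [
    "RAG improves answer accuracy by combining retrieval with generation.",
    "RAG uses embeddings to find relevant documents.",
    "Unrelated document about cooking recipes.",
]
similarities = [0.8723, 0.8457, 0.3121]

# Rank documents by relevancy score, highest first.
ranked = sorted(zip(documents, similarities), key=lambda pair: pair[1], reverse=True)

# Keep only documents above an illustrative threshold.
THRESHOLD = 0.5
relevant_context = [doc for doc, score in ranked if score >= THRESHOLD]

for doc, score in ranked:
    print(f"{score:.4f}  {doc}")
print(f"Kept {len(relevant_context)} of {len(documents)} documents")
```

In practice you would tune the threshold against your own retrieval data, or simply keep the top-k documents instead of using an absolute cutoff.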

Common variations

You can measure RAG context relevancy asynchronously using OpenAI's async client, or swap in a different embedding model such as text-embedding-3-large. Streaming does not apply to embeddings, but you can batch-embed multiple texts in a single API call for efficiency.

python
import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def get_embedding_async(text: str) -> list:
    # The async client exposes the same embeddings.create call, awaited.
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

async def main():
    query = "Explain RAG context relevancy"
    embedding = await get_embedding_async(query)
    print(f"Async embedding length: {len(embedding)}")

asyncio.run(main())
output
Async embedding length: 1536

Troubleshooting

  • If similarity scores are unexpectedly low, verify that the query and documents were embedded with the same model and that no texts were empty or truncated.
  • Ensure your OPENAI_API_KEY is valid and has embedding model access.
  • Check for network issues if API calls fail.
  • Use cosine similarity (1 - cosine distance) as the relevancy score. It ranges from -1 to 1, not 0 to 1; natural-language embeddings usually score well above 0, so treat the values as relative rankings rather than absolute percentages.
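On the normalization point above: OpenAI embeddings are returned L2-normalized (unit length), so cosine similarity reduces to a plain dot product, and a single matrix-vector product can score every document at once. A small numpy sketch with made-up vectors (if your vectors come from elsewhere, normalize them first as shown):

```python
import numpy as np

def normalize(vec: np.ndarray) -> np.ndarray:
    # Scale a vector to unit length so cosine similarity == dot product.
    return vec / np.linalg.norm(vec)

query = normalize(np.array([0.3, 0.4, 0.5]))
docs = np.stack([
    normalize(np.array([0.3, 0.4, 0.5])),    # same direction  -> score ~ 1.0
    normalize(np.array([-0.3, -0.4, -0.5])), # opposite vector -> score ~ -1.0
])

# One matrix-vector product scores all documents against the query.
scores = docs @ query
print(scores)  # approximately [ 1. -1.]
```

This dot-product shortcut is why vector databases typically store normalized embeddings: scoring a whole corpus becomes one matrix multiply.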

Key Takeaways

  • Use embedding models like text-embedding-3-small to vectorize queries and documents for relevancy measurement.
  • Calculate cosine similarity between query and document embeddings to quantify RAG context relevancy.
  • Batch embedding and async calls improve efficiency for large-scale RAG systems.
  • Validate API keys and network connectivity to avoid embedding failures.
  • Low similarity scores often indicate irrelevant or poorly matched context documents.
Verified 2026-04 · text-embedding-3-small