How-to · Intermediate · 3 min read

How to measure RAG context relevancy

Quick answer
Measure RAG context relevancy by embedding the query and each retrieved document with an embedding model such as text-embedding-3-small, then computing the cosine similarity between the query vector and each document vector. Higher similarity scores indicate context that is more relevant to the generation task.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0
  • pip install numpy scipy

Setup

Install the required Python packages and set your OPENAI_API_KEY environment variable.

  • Install OpenAI SDK and dependencies:
bash
pip install openai numpy scipy
output
Collecting openai
Collecting numpy
Collecting scipy
Successfully installed openai numpy scipy

Step by step

This example shows how to embed a user query and multiple retrieved documents, then compute cosine similarity scores to measure relevancy of each document in the RAG context.

python
import os
import numpy as np
from scipy.spatial.distance import cosine
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def get_embedding(text: str) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return np.array(response.data[0].embedding)

def cosine_similarity(vec1: np.ndarray, vec2: np.ndarray) -> float:
    return 1 - cosine(vec1, vec2)

# Example query and retrieved documents
query = "What are the benefits of Retrieval-Augmented Generation?"
documents = [
    "RAG improves answer accuracy by combining retrieval with generation.",
    "RAG uses embeddings to find relevant documents.",
    "Unrelated document about cooking recipes."
]

# Embed query
query_embedding = get_embedding(query)

# Embed documents
doc_embeddings = [get_embedding(doc) for doc in documents]

# Calculate similarity scores
similarities = [cosine_similarity(query_embedding, doc_emb) for doc_emb in doc_embeddings]

# Display relevancy scores
for i, score in enumerate(similarities):
    print(f"Document {i+1} relevancy: {score:.4f}")
output
Document 1 relevancy: 0.8723
Document 2 relevancy: 0.8457
Document 3 relevancy: 0.3121
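With per-document scores in hand, a common next step is to rank the documents and drop anything below a relevancy threshold before assembling the prompt. A minimal sketch using the scores printed above; the 0.5 cutoff is an illustrative assumption, not a recommended value:

```python
# Documents and scores copied from the run above.
documents = [
    "RAG improves answer accuracy by combining retrieval with generation.",
    "RAG uses embeddings to find relevant documents.",
    "Unrelated document about cooking recipes.",
]
similarities = [0.8723, 0.8457, 0.3121]

# Rank documents by relevancy score, highest first.
ranked = sorted(zip(documents, similarities), key=lambda pair: pair[1], reverse=True)

# Keep only documents above an illustrative threshold.
THRESHOLD = 0.5
relevant_context = [doc for doc, score in ranked if score >= THRESHOLD]

for doc, score in ranked:
    print(f"{score:.4f}  {doc}")
print(f"Kept {len(relevant_context)} of {len(documents)} documents")
```

In practice you would tune the threshold against your own retrieval data, or simply keep the top-k documents instead of using an absolute cutoff.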

Common variations

You can measure RAG context relevancy asynchronously using OpenAI's async client, or swap in a different embedding model such as text-embedding-3-large. Streaming does not apply to embeddings, but you can batch-embed multiple texts in a single API call for efficiency.

python
import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def get_embedding_async(text: str) -> list:
    # The async client exposes the same embeddings.create call, awaited.
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

async def main():
    query = "Explain RAG context relevancy"
    embedding = await get_embedding_async(query)
    print(f"Async embedding length: {len(embedding)}")

asyncio.run(main())
output
Async embedding length: 1536

Troubleshooting

  • If similarity scores are unexpectedly low, verify that the query and documents were embedded with the same model and that no texts were empty or truncated.
  • Ensure your OPENAI_API_KEY is valid and has embedding model access.
  • Check for network issues if API calls fail.
  • Use cosine similarity (1 - cosine distance) as the relevancy score. It ranges from -1 to 1, not 0 to 1; natural-language embeddings usually score well above 0, so treat the values as relative rankings rather than absolute percentages.
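On the normalization point above: OpenAI embeddings are returned L2-normalized (unit length), so cosine similarity reduces to a plain dot product, and a single matrix-vector product can score every document at once. A small numpy sketch with made-up vectors (if your vectors come from elsewhere, normalize them first as shown):

```python
import numpy as np

def normalize(vec: np.ndarray) -> np.ndarray:
    # Scale a vector to unit length so cosine similarity == dot product.
    return vec / np.linalg.norm(vec)

query = normalize(np.array([0.3, 0.4, 0.5]))
docs = np.stack([
    normalize(np.array([0.3, 0.4, 0.5])),    # same direction  -> score ~ 1.0
    normalize(np.array([-0.3, -0.4, -0.5])), # opposite vector -> score ~ -1.0
])

# One matrix-vector product scores all documents against the query.
scores = docs @ query
print(scores)  # approximately [ 1. -1.]
```

This dot-product shortcut is why vector databases typically store normalized embeddings: scoring a whole corpus becomes one matrix multiply.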

Key Takeaways

  • Use embedding models like text-embedding-3-small to vectorize queries and documents for relevancy measurement.
  • Calculate cosine similarity between query and document embeddings to quantify RAG context relevancy.
  • Batch embedding and async calls improve efficiency for large-scale RAG systems.
  • Validate API keys and network connectivity to avoid embedding failures.
  • Low similarity scores often indicate irrelevant or poorly matched context documents.
Verified 2026-04 · text-embedding-3-small