How to · Intermediate · 4 min read

How to evaluate embedding quality

Quick answer
Evaluate embedding quality in three ways: measure cosine similarity on known related pairs, test performance on downstream tasks such as classification or clustering, and visualize embeddings with dimensionality-reduction methods like t-SNE or UMAP to check that related items group together.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai>=1.0
  • pip install scikit-learn matplotlib seaborn

Setup

Install required Python packages and set your OpenAI API key as an environment variable.

bash
pip install openai scikit-learn matplotlib seaborn
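Then export the API key in your shell (the key value below is a placeholder; substitute your own):

```shell
export OPENAI_API_KEY="sk-your-key-here"
```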

Step by step

This example shows how to generate embeddings with text-embedding-3-small, compute cosine similarity for related and unrelated sentence pairs, and visualize embeddings using t-SNE.

python
import os
from openai import OpenAI
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import seaborn as sns

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

texts = [
    "The cat sits on the mat.",
    "A feline is resting on a rug.",
    "The weather is sunny today.",
    "It is raining cats and dogs."
]

# Generate embeddings
response = client.embeddings.create(model="text-embedding-3-small", input=texts)
embeddings = [data.embedding for data in response.data]

# Compute cosine similarity matrix
similarity_matrix = cosine_similarity(embeddings)

print("Cosine similarity matrix:")
for i, row in enumerate(similarity_matrix):
    print(f"Text {i+1} similarities: {row}")

# Visualize embeddings with t-SNE
# perplexity must be less than the number of samples (only 4 texts here),
# so the scikit-learn default of 30 would raise an error
tsne = TSNE(n_components=2, perplexity=2, random_state=42)
emb_2d = tsne.fit_transform(embeddings)

sns.set_theme(style="whitegrid")  # sns.set() is a deprecated alias
plt.figure(figsize=(8, 6))
colors = ["red", "red", "blue", "blue"]  # Related pairs share color
for i, (x, y) in enumerate(emb_2d):
    plt.scatter(x, y, color=colors[i])
    plt.text(x + 0.01, y + 0.01, f"Text {i+1}", fontsize=9)
plt.title("t-SNE visualization of embeddings")
plt.show()
output (illustrative; exact values vary by model version and run)
Cosine similarity matrix:
Text 1 similarities: [1.         0.89       0.12       0.15      ]
Text 2 similarities: [0.89       1.         0.10       0.14      ]
Text 3 similarities: [0.12       0.10       1.         0.35      ]
Text 4 similarities: [0.15       0.14       0.35       1.        ]
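Cosine similarity is just the dot product of the two vectors divided by the product of their norms, so you can sanity-check scikit-learn's matrix with plain NumPy. This sketch uses small toy vectors rather than real API embeddings:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a
c = np.array([-3.0, 0.0, 1.0])  # orthogonal to a

def cos(u, v):
    # Normalized dot product: 1 = same direction, 0 = orthogonal
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(round(cos(a, b), 6))  # parallel vectors score 1
print(round(cos(a, c), 6))  # orthogonal vectors score 0

# Matches scikit-learn's pairwise implementation
mat = cosine_similarity(np.stack([a, b, c]))
assert np.isclose(mat[0, 1], cos(a, b))
```

This is why cosine similarity ignores vector magnitude: `b` is exactly `2 * a`, yet the pair still scores 1.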

Common variations

You can evaluate embedding quality by:

  • Using different models like text-embedding-3-large for higher accuracy.
  • Testing on downstream tasks such as clustering, classification, or semantic search.
  • Using alternative metrics such as Euclidean or Manhattan distance (note these are distances, so lower means more similar).
  • Applying dimensionality reduction methods like UMAP instead of t-SNE for visualization.
  • Running asynchronous calls if embedding generation is part of a larger pipeline.
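The downstream-task idea can be sketched with a tiny clustering check: if embeddings are good, cluster assignments should recover known labels. The vectors and labels below are synthetic stand-ins for real embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Two synthetic "topics": points sampled tightly around two distant centers
center_a, center_b = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
X = np.vstack([
    center_a + 0.1 * rng.standard_normal((10, 2)),
    center_b + 0.1 * rng.standard_normal((10, 2)),
])
true_labels = [0] * 10 + [1] * 10

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Adjusted Rand Index: 1.0 means clusters perfectly match the known labels,
# ~0.0 means the clustering is no better than chance
print("Adjusted Rand Index:", adjusted_rand_score(true_labels, pred))
```

With real embeddings you would replace `X` with your embedding matrix and `true_labels` with known categories; a low ARI suggests the embeddings do not separate your categories well.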

Troubleshooting

  • If cosine similarity scores are unexpectedly low for known related pairs, verify that embeddings are generated from the same model and preprocessing is consistent.
  • If visualization shows no clear clusters, try adjusting perplexity (it must be less than the number of samples) or increasing iterations in t-SNE, or switch to UMAP.
  • Ensure your API key is valid and environment variable OPENAI_API_KEY is set to avoid authentication errors.
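A quick preflight check catches a missing key before any API call is made. This helper is a sketch for illustration, not part of the OpenAI SDK:

```python
import os

def check_api_key() -> bool:
    """Return True if OPENAI_API_KEY is set and non-empty."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        print("OPENAI_API_KEY is not set; export it before running.")
        return False
    return True

print(check_api_key())
```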

Key Takeaways

  • Use cosine similarity on known related pairs to quantify embedding semantic closeness.
  • Evaluate embeddings on downstream tasks like clustering or classification for practical quality assessment.
  • Visualize embeddings with t-SNE or UMAP to inspect meaningful grouping and detect anomalies.
Verified 2026-04 · text-embedding-3-small, text-embedding-3-large