What is semantic similarity?
Semantic similarity measures how close two pieces of text are in meaning. Texts are converted into embeddings that represent their semantic content as vectors, which enables AI systems to compare concepts beyond exact word matching by calculating distances or angles between these vectors.
How it works
Semantic similarity works by converting text into embeddings, which are dense numerical vectors capturing the meaning of the text. These vectors live in a high-dimensional space where similar meanings are close together. By calculating the distance or angle (e.g., cosine similarity) between two embedding vectors, AI models can quantify how semantically close the texts are, even if they use different words.
Think of it like mapping cities on a globe: two cities close together are similar in location, just like two sentences close in embedding space are similar in meaning.
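To make the geometry concrete, here is a minimal sketch using NumPy and made-up 3-dimensional vectors (real embedding models produce vectors with hundreds or thousands of dimensions). It shows that cosine similarity scores vectors pointing in nearly the same direction close to 1.0, while unrelated directions score lower:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-D "embeddings" (illustrative values only, not from a real model)
cat = np.array([0.9, 0.8, 0.1])
kitten = np.array([0.85, 0.75, 0.15])  # points in nearly the same direction as "cat"
car = np.array([0.1, 0.2, 0.9])        # points in a different direction

print(cosine_similarity(cat, kitten))  # close to 1.0
print(cosine_similarity(cat, car))     # noticeably smaller
```

The metric depends only on direction, not vector length, which is why it is the usual choice for comparing embeddings.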
Concrete example
Using OpenAI embeddings, you can compute semantic similarity between two sentences by comparing their embedding vectors with cosine similarity.
```python
from openai import OpenAI
import os
import numpy as np

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Get embeddings for two sentences
response1 = client.embeddings.create(model="text-embedding-3-small", input="I love machine learning.")
response2 = client.embeddings.create(model="text-embedding-3-small", input="Artificial intelligence is fascinating.")
vec1 = np.array(response1.data[0].embedding)
vec2 = np.array(response2.data[0].embedding)

# Compute cosine similarity
cos_sim = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
print(f"Semantic similarity: {cos_sim:.4f}")
```

Output:

```
Semantic similarity: 0.8723
```
When to use it
Use semantic similarity when you need to compare texts or data based on meaning rather than exact wording. Common use cases include:
- Document search and retrieval where synonyms or paraphrases exist
- Clustering or grouping similar content
- Recommendation systems based on user preferences
- Detecting duplicate or near-duplicate content
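As a sketch of the near-duplicate case, the snippet below flags pairs of documents whose cosine similarity exceeds a chosen threshold. It uses toy NumPy vectors in place of real embeddings, and the 0.9 cutoff is an illustrative assumption to be tuned on real data, not a universal value:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these came from an embedding model (toy 4-D vectors for illustration)
docs = {
    "doc_a": np.array([0.9, 0.1, 0.3, 0.2]),
    "doc_b": np.array([0.88, 0.12, 0.31, 0.19]),  # near-duplicate of doc_a
    "doc_c": np.array([0.1, 0.9, 0.2, 0.4]),      # unrelated content
}

THRESHOLD = 0.9  # assumed cutoff; tune on real data

def find_near_duplicates(vectors, threshold=THRESHOLD):
    """Return every pair of names whose vectors meet the similarity threshold."""
    names = list(vectors)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if cosine_similarity(vectors[x], vectors[y]) >= threshold:
                pairs.append((x, y))
    return pairs

print(find_near_duplicates(docs))  # [('doc_a', 'doc_b')]
```

Note that this pairwise loop is quadratic in the number of documents; large collections typically use a vector index for approximate nearest-neighbor search instead.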
Do not rely on semantic similarity when exact string matching or strict syntax is required, such as password verification or code syntax validation.
Key terms
| Term | Definition |
|---|---|
| Semantic similarity | A measure of how close two texts are in meaning using vector representations. |
| Embeddings | Numerical vector representations of text capturing semantic information. |
| Cosine similarity | A metric measuring the cosine of the angle between two vectors, indicating similarity. |
| Vector space | A mathematical space where embeddings are positioned based on semantic features. |
Key takeaways
- Semantic similarity uses embeddings to compare meaning beyond exact words.
- Cosine similarity is a common method to quantify semantic closeness between vectors.
- Use semantic similarity for search, clustering, and recommendations based on meaning.
- Avoid semantic similarity for tasks requiring exact matches or syntax correctness.