
What is semantic similarity in AI

Quick answer

Semantic similarity in AI measures how closely two pieces of text or data relate in meaning rather than in exact wording. It is usually computed by converting each input into a vector embedding with an embedding model, such as OpenAI's text-embedding-3-large, and comparing the vectors with a metric like cosine similarity. It underpins tasks such as semantic search and clustering.

Semantic similarity is a measure in AI that quantifies how much two pieces of text or data share meaning, regardless of their exact wording.

How it works

Semantic similarity works by converting text or data into numerical vectors, called embeddings, that capture meaning in a high-dimensional space. Imagine each sentence as a point on a multi-dimensional map, where nearby points have similar meanings. Embedding models such as text-embedding-3-large learn these vectors from the contexts in which words appear, so inputs with related meanings, like "car" and "automobile", land close together even though the words themselves differ. The closeness of two vectors is then scored, most commonly with cosine similarity.
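To make the "points on a map" picture concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are hand-picked, hypothetical values chosen for illustration only; real embedding models produce vectors with hundreds or thousands of dimensions. The sketch scores closeness with cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-D "embeddings"; these are NOT output from any real model.
vectors = {
    "car":        [0.90, 0.10, 0.05],
    "automobile": [0.85, 0.15, 0.10],
    "banana":     [0.05, 0.20, 0.95],
}

# "car" should score much closer to "automobile" than to "banana".
for word in ("automobile", "banana"):
    score = cosine_similarity(vectors["car"], vectors[word])
    print(f"car vs {word}: {score:.3f}")
```

Because cosine similarity depends only on the angle between vectors, it ignores differences in vector length and focuses on direction, which is where embeddings encode meaning.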

Concrete example

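Here is a Python example that uses the OpenAI SDK to embed two sentences and then compares the embeddings with cosine similarity. It assumes the openai and numpy packages are installed and the OPENAI_API_KEY environment variable is set: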

python

import os

import numpy as np
from openai import OpenAI

# Create a client using the API key from the environment.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

sentences = [
    "I love playing football.",
    "Soccer is my favorite sport."
]

# Embed both sentences in a single request; the endpoint accepts a list of inputs.
response = client.embeddings.create(
    model="text-embedding-3-large",
    input=sentences,
)
# Each result carries the index of its input, so sort by it to keep the original order.
embeddings = [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


def cosine_similarity(vec1, vec2):
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    vec1 = np.asarray(vec1)
    vec2 = np.asarray(vec2)
    return float(np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2)))


similarity_score = cosine_similarity(embeddings[0], embeddings[1])
print(f"Semantic similarity score: {similarity_score:.4f}")

output (example; the exact score depends on the model version and inputs)

Semantic similarity score: 0.8723
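When you compare more than two texts, it is often simpler to compute every pairwise score at once. The sketch below assumes only NumPy: it scales each row to unit length and takes a matrix product, so entry [i, j] is the cosine similarity of rows i and j. The toy vectors are hypothetical stand-ins; in practice you would pass the list of embeddings returned by the API call above. OpenAI documents its embeddings as already normalized to length 1, in which case the normalization step is redundant but harmless.

```python
import numpy as np


def pairwise_cosine(embeddings):
    """Return an (n, n) matrix of cosine similarities between the rows of `embeddings`."""
    m = np.asarray(embeddings, dtype=float)
    # Scale each row to unit length; the dot product of unit vectors is their cosine.
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return unit @ unit.T


# Hypothetical 3-D vectors standing in for real embeddings.
toy = [[1.0, 0.0, 0.0],
       [0.6, 0.8, 0.0],
       [0.0, 0.0, 2.0]]

sim = pairwise_cosine(toy)
print(np.round(sim, 2))
```

The diagonal is always 1.0 (each vector compared with itself) and the matrix is symmetric, so for large collections you only need the upper triangle.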

When to use it

Use semantic similarity when you need to compare meaning rather than exact text. It is widely used in search engines, recommendation systems, document clustering, paraphrase detection, and question-answering systems. Avoid it when you need exact string matching or syntactic analysis, such as code syntax checking or strict data validation.
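Semantic search, the first use case above, reduces to ranking stored embeddings by their similarity to a query embedding. Here is a minimal sketch with NumPy; the function name `top_k` and the document and query vectors are hypothetical, and a production system would typically use a vector database or approximate nearest-neighbor index instead of scoring every document.

```python
import numpy as np


def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity of the query against every document row.
    scores = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    # argsort is ascending, so reverse it to put the highest scores first.
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


# Hypothetical embeddings for three documents and one query.
docs = [[0.9, 0.1, 0.0],   # doc 0
        [0.1, 0.9, 0.1],   # doc 1
        [0.7, 0.3, 0.1]]   # doc 2
query = [1.0, 0.2, 0.0]

print(top_k(query, docs, k=2))
```

Unlike keyword matching, this ranking does not require the query and document to share any words; it only compares the directions of their embeddings.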

Key terms

Term	Definition
Semantic similarity	A metric that quantifies how much two texts share meaning.
Embedding	A numerical vector representing text or data in a high-dimensional space.
Cosine similarity	A measure of similarity between two vectors based on the cosine of the angle between them.
Vector space	A mathematical space where embeddings are represented as points or vectors.
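For two embedding vectors u and v with n dimensions, the cosine similarity defined in the table is:

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^{2}} \, \sqrt{\sum_{i=1}^{n} v_i^{2}}}
```

The value ranges from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction). When both vectors have unit length, the denominator is 1 and the score reduces to the dot product.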


Key Takeaways

Semantic similarity captures meaning beyond exact words using vector embeddings.
Use semantic similarity for search, recommendations, and paraphrase detection.
Cosine similarity is a common method to quantify semantic similarity between embeddings.

Verified 2026-04 · gpt-4o, claude-3-5-sonnet-20241022, text-embedding-3-large
