Concept Beginner to Intermediate · 3 min read

What are embeddings in AI?

Quick answer
In AI, embeddings are numerical vector representations of data such as text, images, or audio that capture semantic relationships. They enable machines to understand and compare complex inputs by converting them into fixed-size vectors suitable for similarity search, classification, or clustering.

How it works

Embeddings transform data like words or images into fixed-length vectors in a high-dimensional space, where similar items are closer together. Imagine a map where cities represent words; cities close to each other share similar characteristics. AI models learn these vectors by training on large datasets, capturing context and meaning beyond simple keywords.
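As a toy illustration of this idea, here is a sketch using hand-picked 2-D vectors (real embeddings are learned and have hundreds or thousands of dimensions; these values are invented for illustration). Related words get nearby vectors, so their cosine similarity is higher:

```python
import numpy as np

# Hand-picked 2-D "embedding space" for illustration only.
# Real embeddings are learned from data and much higher-dimensional.
vectors = {
    "cat": np.array([0.9, 0.8]),
    "dog": np.array([0.85, 0.75]),  # close to "cat": related meaning
    "car": np.array([0.1, 0.9]),    # farther away: unrelated meaning
}

def cosine(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["cat"], vectors["dog"]))  # high: similar animals
print(cosine(vectors["cat"], vectors["car"]))  # lower: unrelated concepts
```

The map analogy above is exactly this: "cat" and "dog" sit near each other in the space, while "car" sits elsewhere.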

Concrete example

Here is a Python example using OpenAI's API to generate text embeddings for two sentences and compute their cosine similarity to measure semantic closeness.

```python
import os
from openai import OpenAI
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

sentences = [
    "The cat sits on the mat.",
    "A feline is resting on a rug."
]

embeddings = []
for sentence in sentences:
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=sentence
    )
    embeddings.append(response.data[0].embedding)

similarity = cosine_similarity(np.array(embeddings[0]), np.array(embeddings[1]))
print(f"Cosine similarity: {similarity:.4f}")
```

Output (the exact value may vary slightly across runs and model versions):

```text
Cosine similarity: 0.8723
```

When to use it

Use embeddings when you need to compare or search complex data by meaning rather than exact matches, such as semantic search, recommendation systems, clustering, or classification. Avoid embeddings for tasks requiring exact symbolic logic or precise numeric calculations where vector similarity is insufficient.
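A minimal semantic-search sketch shows the pattern: embed the documents and the query, then rank by cosine similarity. The vectors below are hypothetical placeholders; in practice each would come from an embedding model such as text-embedding-3-large.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical precomputed embeddings (stand-ins for real model output).
doc_embeddings = {
    "How to reset your password":  np.array([0.9, 0.1, 0.2]),
    "Quarterly sales report":      np.array([0.1, 0.9, 0.3]),
    "Recovering a locked account": np.array([0.7, 0.3, 0.4]),
}
query_embedding = np.array([0.85, 0.15, 0.25])  # e.g. "forgot my login"

# Rank documents by similarity to the query, most similar first.
ranked = sorted(doc_embeddings.items(),
                key=lambda kv: cosine_similarity(query_embedding, kv[1]),
                reverse=True)
for title, _ in ranked:
    print(title)
```

Note that "Recovering a locked account" ranks above the sales report even though it shares no keywords with the query; that is the meaning-based matching embeddings provide.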

Key terms

| Term | Definition |
| --- | --- |
| Embedding | A fixed-length vector representing data while capturing its semantic meaning. |
| Vector space | A mathematical space where embeddings are positioned based on similarity. |
| Cosine similarity | A metric measuring the angle between two vectors to assess similarity. |
| Semantic similarity | How close two pieces of data are in meaning, not just in exact words. |

Key takeaways

  • Embeddings convert complex data into numerical vectors that capture semantic meaning.
  • Use embeddings for semantic search, recommendations, and clustering tasks.
  • Cosine similarity is a common method to compare embeddings for relatedness.
  • Embeddings enable AI to understand context beyond exact keyword matching.
Verified 2026-04 · text-embedding-3-large, gpt-4o, claude-3-5-sonnet-20241022