Embeddings Cheat Sheet — Vector, Dimensions & Distance
Embeddings transform words into numerical coordinates in semantic space.
Like latitude/longitude coordinates for words: "cat" and "kitten" are close neighbors in embedding space, while "cat" and "engine" are far apart.
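To make the coordinate analogy concrete, here is a minimal sketch using hand-made 3-dimensional vectors. The values are invented for illustration and are not the output of any real model; real embeddings have hundreds or thousands of dimensions.

```python
import numpy as np

# Toy 3-d "embeddings" (hypothetical values, not produced by a real model)
toy = {
    "cat":    np.array([0.9, 0.8, 0.1]),
    "kitten": np.array([0.85, 0.9, 0.05]),
    "engine": np.array([0.1, 0.05, 0.95]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat_kitten = cosine_similarity(toy["cat"], toy["kitten"])
cat_engine = cosine_similarity(toy["cat"], toy["engine"])
print(f"cat vs kitten: {cat_kitten:.2f}")
print(f"cat vs engine: {cat_engine:.2f}")
```

The "close neighbors" pair scores near 1, while the unrelated pair scores much lower.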
Core Concepts
Embedding Models at a Glance
| Model | Provider | Dimensions | Max Tokens | Cost (per 1M tokens) | Best For |
|---|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 8191 | $0.13 | High accuracy semantic search |
| text-embedding-3-small | OpenAI | 1536 | 8191 | $0.02 | Fast, cheap general search |
| multilingual-e5-large | HuggingFace | 1024 | 512 | Free (self-hosted) | Multi-language retrieval |
| all-MiniLM-L6-v2 | HuggingFace | 384 | 256 | Free (self-hosted) | Low-latency, small-footprint deployments |
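The Dimensions column translates directly into storage and memory cost. The sketch below estimates the raw size of an index, assuming vectors are stored as 32-bit floats (4 bytes per value) with no compression or index overhead; the helper name `index_size_mb` is made up for this example.

```python
def index_size_mb(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> float:
    """Raw size in MB (10**6 bytes), assuming float32 storage and no index overhead."""
    return num_vectors * dimensions * bytes_per_value / 1_000_000

# Dimensions taken from the table above
for name, dims in [
    ("text-embedding-3-large", 3072),
    ("text-embedding-3-small", 1536),
    ("multilingual-e5-large", 1024),
    ("all-MiniLM-L6-v2", 384),
]:
    print(f"{name}: {index_size_mb(1_000_000, dims):,.0f} MB per 1M vectors")
```

Real vector databases add index structures and metadata on top of this figure, so treat it as a lower bound.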
Production Patterns
# Embed a single string with the OpenAI API
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="The quick brown fox",
)
vector = response.data[0].embedding
print(f"Dimension: {len(vector)}")  # 3072
print(f"First 5 values: {vector[:5]}")

Output:
Dimension: 3072
First 5 values: [0.001234, -0.00567, 0.00891, ...]

# Embed locally with sentence-transformers
from sentence_transformers import SentenceTransformer
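model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["The quick brown fox", "A fast brown animal"]
# normalize_embeddings=True returns unit-length vectors,
# so a plain dot product equals cosine similarity
embeddings = model.encode(sentences, normalize_embeddings=True)
print(f"Shape: {embeddings.shape}")  # (2, 384)
print(f"Cosine similarity: {embeddings[0] @ embeddings[1]:.2f}")

Example output (the similarity value is illustrative):
Shape: (2, 384)
Cosine similarity: 0.87

# Embed several texts in one batched request
from openai import OpenAI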
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Batch request (the API accepts at most 2048 inputs per call)
texts = ["doc1", "doc2", "doc3"]  # up to 2048 items
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)
for i, data in enumerate(response.data):
    print(f"Text {i}: {len(data.embedding)} dims")

Output:
Text 0: 1536 dims
Text 1: 1536 dims
Text 2: 1536 dims

# Rank documents against a query by cosine similarity
import numpy as np
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Embed query and documents
query = "cat behavior"
docs = ["Cats sleep 16 hours", "Dogs bark loudly", "Felines are predators"]
query_emb = np.array(client.embeddings.create(
    model="text-embedding-3-small", input=query
).data[0].embedding)

# Embed all docs in one request. Taking only .data[0] here would keep
# just the first document's vector, so iterate over response.data instead.
doc_response = client.embeddings.create(
    model="text-embedding-3-small", input=docs
)
doc_embs = np.array([item.embedding for item in doc_response.data])

# Cosine similarity between the query and every document
similarities = doc_embs @ query_emb / (
    np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
)
top_idx = int(np.argmax(similarities))
print(f"Most similar: {docs[top_idx]} (score: {similarities[top_idx]:.3f})")

Example output (illustrative):
Most similar: Felines are predators (score: 0.892)

# Request shorter vectors with the dimensions parameter
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="The quick brown fox",
    dimensions=256,  # Reduce from 3072 → 256
)
vector = response.data[0].embedding
print(f"Reduced dimension: {len(vector)}")  # 256

Output:
Reduced dimension: 256

OpenAI Embeddings API Parameters
text-embedding-3-large, text-embedding-3-small
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | text-embedding-3-large or text-embedding-3-small |
| input | string or list | required | Text(s) to embed. A list may hold up to 2048 items per call. |
| dimensions | int | null (full) | Reduce output to N dimensions (text-embedding-3-* only). Min 1. |
| encoding_format | string | float | float or base64 (base64 saves bandwidth for large batches) |
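With encoding_format set to base64, each embedding arrives as a string rather than a list of floats. The sketch below shows one way to decode such a string, assuming the payload is a packed array of little-endian float32 values; it builds a stand-in payload locally instead of calling the API, so the sample values are hypothetical.

```python
import base64
import numpy as np

def decode_embedding(b64_payload: str) -> np.ndarray:
    """Decode a base64 string into a vector (assumes packed little-endian float32)."""
    raw = base64.b64decode(b64_payload)
    return np.frombuffer(raw, dtype="<f4")

# Stand-in for the string the API would return (hypothetical values)
original = np.array([0.25, -0.5, 0.125], dtype="<f4")
payload = base64.b64encode(original.tobytes()).decode("ascii")

decoded = decode_embedding(payload)
print(decoded.shape, decoded.dtype)
```

Each float32 value takes 4 bytes before base64 encoding, which is typically more compact than the same number written out as JSON text.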
Common Errors & Fixes
RateLimitError: Rate limit exceeded
Cause: Sending too many requests per minute. OpenAI enforces both tokens-per-minute and requests-per-minute limits.
Add exponential backoff retry logic:
import time
from openai import RateLimitError, OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def embed_with_retry(text, max_retries=3):
    """Embed text, backing off exponentially on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return client.embeddings.create(
                model="text-embedding-3-small",
                input=text,
            ).data[0].embedding
        except RateLimitError:
            wait_time = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
    raise RuntimeError("Failed after retries")

ValueError: Token limit exceeded
Cause: Input text is longer than the model's maximum input length (text-embedding-3-*: 8191 tokens).
Truncate or chunk input before embedding:
import tiktoken
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

MAX_TOKENS = 8191
encoding = tiktoken.encoding_for_model("text-embedding-3-small")
text = "Very long document..." * 1000
tokens = encoding.encode(text)
truncated_tokens = tokens[:MAX_TOKENS]  # keep only the first 8191 tokens
truncated_text = encoding.decode(truncated_tokens)
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=truncated_text,
)

AuthenticationError: Incorrect API key
Cause: OPENAI_API_KEY not set or invalid.
Verify environment variable:
import os

print("Key set:", "OPENAI_API_KEY" in os.environ)
# Print only the short prefix so the secret itself never lands in logs
print("Key starts with:", os.environ.get("OPENAI_API_KEY", "")[:3])
# Set in .env:
# OPENAI_API_KEY=sk-...

Cosine similarity returns NaN or Inf
Cause: Zero-magnitude vector or numerical instability in normalization.
Add numerical stability check:
import numpy as np

def safe_cosine_similarity(a, b):
    """Cosine similarity that returns 0.0 when either vector has zero norm."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_a = np.linalg.norm(a)
    norm_b = np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return float(np.dot(a, b) / (norm_a * norm_b))

Production Gotchas
Vectors are only comparable when they come from the same model. If a model is updated or replaced, the same text can map to a different vector. Store the model name and dimension count with every vector, always pass an explicit model identifier instead of relying on a default, and re-embed the full corpus whenever you switch models.
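One way to act on this is to keep the model name and dimension count next to every stored vector and refuse to compare vectors whose metadata differs. The record layout and function names below are hypothetical, a minimal sketch rather than a prescribed schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class StoredEmbedding:
    """A vector plus the metadata needed to know what produced it (hypothetical schema)."""
    text: str
    vector: tuple
    model: str
    dimensions: int

def compare(a: StoredEmbedding, b: StoredEmbedding) -> float:
    """Cosine similarity, allowed only for vectors from the same model and size."""
    if (a.model, a.dimensions) != (b.model, b.dimensions):
        raise ValueError(
            f"Incompatible embeddings: {a.model}/{a.dimensions} vs {b.model}/{b.dimensions}"
        )
    va, vb = np.array(a.vector), np.array(b.vector)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Toy 2-d vectors (made-up values) tagged with the model that would have produced them
old = StoredEmbedding("cat", (0.1, 0.9), model="text-embedding-3-small", dimensions=2)
new = StoredEmbedding("cat", (0.1, 0.9), model="text-embedding-3-large", dimensions=2)

try:
    compare(old, new)
    mixed_allowed = True
except ValueError:
    mixed_allowed = False
print("Mixed-model comparison allowed:", mixed_allowed)
```

Failing loudly here is a deliberate choice: a silent comparison across models returns a number that looks plausible but means nothing.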
Higher dimensions don't always mean better results. Moving from text-embedding-3-small (1536) to text-embedding-3-large (3072) doubles storage per vector but shows diminishing returns in quality. Profile latency and quality for your domain. Often 256 to 512 dims are sufficient after dimension reduction.
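If vectors are already stored at full size, shortening can also be done client-side by keeping the first N values and re-normalizing to unit length. This sketch assumes the model was trained so that leading dimensions carry the most information (as is documented for the text-embedding-3 models); it uses a random vector as a stand-in for a real embedding, and `shorten` is a name chosen for this example.

```python
import numpy as np

def shorten(vector, target_dims: int) -> np.ndarray:
    """Keep the first target_dims values and rescale to unit length."""
    cut = np.asarray(vector, dtype=float)[:target_dims]
    norm = np.linalg.norm(cut)
    return cut if norm == 0 else cut / norm

# Random stand-in for a 3072-d embedding (not real model output)
rng = np.random.default_rng(0)
full = rng.normal(size=3072)
full /= np.linalg.norm(full)

short = shorten(full, 256)
print(len(short), round(float(np.linalg.norm(short)), 6))
```

Re-normalizing matters because a truncated vector is no longer unit length, which would skew dot-product scores. Measure retrieval quality at each candidate size before settling on one.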
OpenAI charges per 1M tokens input. Embedding 10k docs at 100 tokens each = 1M tokens = $0.02–0.13 per run. Batch and reuse embeddings aggressively. Don't re-embed identical text.
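The arithmetic above, and the advice not to re-embed identical text, can be sketched as follows. The cache is a plain in-memory dict keyed by a hash of model name plus text, and `embed_fn` stands in for whatever function actually calls the embedding API; both are assumptions for illustration.

```python
import hashlib

def estimate_cost(num_docs: int, tokens_per_doc: int, price_per_million: float) -> float:
    """Cost in dollars for one embedding pass over the corpus."""
    return num_docs * tokens_per_doc / 1_000_000 * price_per_million

print(f"small: ${estimate_cost(10_000, 100, 0.02):.2f}")
print(f"large: ${estimate_cost(10_000, 100, 0.13):.2f}")

_cache = {}

def cached_embed(text: str, model: str, embed_fn):
    """Return a cached vector when this exact (model, text) pair was embedded before."""
    key = hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = embed_fn(text)  # only pay for unseen text
    return _cache[key]

# Fake embedder that records its calls (hypothetical stand-in for an API call)
calls = []

def fake_embed(text):
    calls.append(text)
    return [float(len(text))]

for t in ["doc one", "doc two", "doc one"]:
    cached_embed(t, "text-embedding-3-small", fake_embed)
print(f"API calls made: {len(calls)}")
```

Including the model name in the cache key keeps vectors from different models from being served for the same text. In production the dict would typically be replaced by a persistent store.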
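Cosine similarity ranges from -1 to 1 (1 = same direction, 0 = orthogonal, -1 = opposite), although scores between text embeddings are usually positive in practice. The same score, say 0.7, can mean completely different things across models and domains. Test and establish thresholds empirically. Use 0.7 as a starting point, not a rule.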
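Establishing a threshold empirically can be as simple as scoring a small set of labeled pairs and picking the cut-off that separates them best. The scores and labels below are made up for illustration, and maximizing F1 is just one reasonable selection criterion.

```python
import numpy as np

# Hypothetical similarity scores for labeled pairs (1 = relevant, 0 = not relevant)
scores = np.array([0.91, 0.84, 0.78, 0.74, 0.69, 0.66, 0.58, 0.41])
labels = np.array([1, 1, 1, 0, 1, 0, 0, 0])

def f1_at(threshold: float) -> float:
    """F1 score when every pair scoring >= threshold is predicted relevant."""
    predicted = scores >= threshold
    tp = int(np.sum(predicted & (labels == 1)))
    fp = int(np.sum(predicted & (labels == 0)))
    fn = int(np.sum(~predicted & (labels == 1)))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Try each observed score as a candidate threshold
best_threshold = max(scores, key=f1_at)
print(f"Best threshold: {best_threshold:.2f} (F1 = {f1_at(best_threshold):.2f})")
```

With real data, hold out part of the labeled set to check that the chosen threshold generalizes, and repeat the exercise whenever the embedding model changes.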
Models trained mostly on English text tend to perform worse on other languages, so evaluate on your target languages before committing. Use multilingual models (e.g., multilingual-e5-large) if you need non-English support. Mixing languages within a single input text can degrade quality.
Complete RAG Example: Embed Documents & Query
"""End-to-end semantic search with embeddings."""
from openai import OpenAI
import numpy as np
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Step 1: Embed documents (typically done offline)
documents = [
    "Python is a programming language",
    "Embeddings represent text as vectors",
    "Machine learning requires large datasets",
    "Cosine similarity measures vector closeness",
]
embed_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents,
)
doc_embeddings = {
    doc: embed_response.data[i].embedding
    for i, doc in enumerate(documents)
}
print(f"Embedded {len(doc_embeddings)} documents")

# Step 2: User query
query = "What is an embedding?"
query_embedding = client.embeddings.create(
    model="text-embedding-3-small",
    input=query,
).data[0].embedding

# Step 3: Find most similar document
def cosine_similarity(a, b):
    a = np.array(a)
    b = np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [
    (doc, cosine_similarity(query_embedding, emb))
    for doc, emb in doc_embeddings.items()
]
scores.sort(key=lambda x: x[1], reverse=True)
print(f"\nQuery: {query}")
print(f"Top result: {scores[0][0]} (similarity: {scores[0][1]:.3f})")
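For more than a handful of documents, returning the top k matches from a matrix of embeddings is usually more convenient than sorting Python tuples. The following sketch uses small made-up vectors in place of API responses so it runs offline, and it assumes no vector is all zeros; `top_k` is a name introduced here, not part of the example above.

```python
import numpy as np

def top_k(query_vec, doc_matrix, k=2):
    """Return (index, score) pairs for the k rows most similar to query_vec."""
    docs_unit = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    query_unit = query_vec / np.linalg.norm(query_vec)
    scores = docs_unit @ query_unit        # cosine similarity per document
    order = np.argsort(scores)[::-1][:k]   # highest scores first
    return [(int(i), float(scores[i])) for i in order]

# Made-up 3-d vectors standing in for real document embeddings
doc_matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
query_vec = np.array([0.9, 0.1, 0.0])

results = top_k(query_vec, doc_matrix, k=2)
for idx, score in results:
    print(f"doc {idx}: {score:.3f}")
```

In a retrieval-augmented setup, the texts behind the returned indices would then be passed to a language model as context.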