API Intermediate medium · 6 min

Vector dimensions and what they mean

What you will learn

OpenAI's embedding API returns fixed-size vectors (1536 dimensions by default for text-embedding-3-small) whose dimensions jointly encode learned semantic features. Understanding what they encode helps you build better search and similarity systems.

Why this matters

When you build semantic search, recommendations, or clustering, you're working with embedding vectors. Knowing what those 1536 numbers actually represent, and how to interpret them, lets you debug why similarity comparisons fail, choose the right embedding model, and optimize storage/performance tradeoffs.

Skip if: Use exact keyword matching or full-text search when your queries are simple, deterministic, and don't need semantic understanding. Don't use embeddings for one-off lookups where latency matters more than relevance: a cached lookup table is faster. Don't re-embed the same text repeatedly; store embeddings in a vector database.

Explanation

OpenAI's embedding models (like text-embedding-3-small) convert text into fixed-length vectors of real numbers. By default, text-embedding-3-small produces 1536-dimensional vectors, while text-embedding-3-large produces 3072 dimensions. Each of those numbers is a coordinate the model computes for that specific input (an output value, not a model weight), and together the coordinates capture semantic and syntactic properties of the text.

Under the hood, these numbers don't have human-readable labels like "dimension 47 = sentiment." Instead, the embedding model learned them during training to solve prediction tasks. Properties like "is this about business?", "formality level," or "technical jargon density" may be encoded, but usually as patterns spread across many dimensions rather than in one dedicated slot, and the model discovered these patterns itself, not via explicit labeling. The critical insight: more dimensions can capture more nuance, but cost more to store and to compute distances on. 1536 dimensions is the default tradeoff for text-embedding-3-small between capturing semantic relationships and keeping overhead manageable.
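To make the storage side of that tradeoff concrete, here is a rough back-of-the-envelope sketch. It assumes each value is stored as a 4-byte float32 and ignores any index overhead; both are assumptions, since your vector database may use a different precision or add its own structures. The function name `index_size_gb` is made up for illustration.

```python
# Assumption: 4 bytes per value (float32). Real stores may differ.
BYTES_PER_FLOAT32 = 4

def index_size_gb(num_vectors: int, dimensions: int) -> float:
    """Raw vector storage in gigabytes (10**9 bytes), excluding index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1e9

small = index_size_gb(1_000_000, 1536)  # text-embedding-3-small default size
large = index_size_gb(1_000_000, 3072)  # text-embedding-3-large default size

print(f"1M vectors at 1536 dims: {small:.2f} GB")
print(f"1M vectors at 3072 dims: {large:.2f} GB")
```

Doubling the dimension count doubles raw storage, and distance computations scale the same way.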

Use embeddings when you need semantic similarity: finding the most relevant documents for a query, detecting duplicate content, or grouping related items. The vector representation lets you compute a distance (cosine similarity, Euclidean distance) between any two pieces of text with simple arithmetic, without calling the model again. Don't use them for exact matching, because "the cat sat" and "a cat was sitting" are semantically close but lexically different. That's the entire point.

Request code

python

import math

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment when it is created.
client = OpenAI()

text_samples = [
    "The quick brown fox jumps over the lazy dog",
    "A fast auburn fox leaps across a sleeping hound",
    "Machine learning models require large datasets"
]

# One request embeds all three texts.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=text_samples
)

print(f"Number of embeddings: {len(response.data)}")
print(f"Dimensions per embedding: {len(response.data[0].embedding)}")
print(f"First 10 values of first embedding: {response.data[0].embedding[:10]}")

def cosine_similarity(vec_a, vec_b):
    """Dot product of the two vectors divided by the product of their magnitudes."""
    dot_product = sum(a * b for a, b in zip(vec_a, vec_b))
    magnitude_a = math.sqrt(sum(a ** 2 for a in vec_a))
    magnitude_b = math.sqrt(sum(b ** 2 for b in vec_b))
    return dot_product / (magnitude_a * magnitude_b)

# Compare the first sentence against a paraphrase and against an unrelated sentence.
sim_0_1 = cosine_similarity(response.data[0].embedding, response.data[1].embedding)
sim_0_2 = cosine_similarity(response.data[0].embedding, response.data[2].embedding)

print(f"\nSimilarity between sentences 0 and 1 (both about foxes): {sim_0_1:.4f}")
print(f"Similarity between sentences 0 and 2 (different topics): {sim_0_2:.4f}")

Authentication

Set your OpenAI API key as an environment variable before running code: `export OPENAI_API_KEY='sk-...'`. The OpenAI SDK reads this automatically when you instantiate the client.

Response shape

Field	Description
`data`	List of embedding objects
`data[0].embedding`	List of 1536 floats for text-embedding-3-small
`data[0].index`	Integer position in input array (0, 1, 2, ...)
`model`	String: the model you requested
`usage.prompt_tokens`	Integer: tokens used from your input texts
`usage.total_tokens`	Integer: same as prompt_tokens for embeddings (no completion tokens)

Field guide

embedding

The actual vector: a Python list of floats. This is what you store in your vector database or use for similarity calculations.

index

Critical when batch-embedding: tells you which input text produced which embedding. With shuffled batches or retries, this index is your anchor.
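One way to use the index defensively is to place each vector by its `index` value instead of trusting list order. The sketch below runs without network access: `EmbeddingItem` is a hypothetical stand-in for one entry of `response.data`, and `align_to_inputs` is an illustrative helper, not part of the SDK.

```python
from dataclasses import dataclass

@dataclass
class EmbeddingItem:
    """Hypothetical stand-in for one entry of response.data."""
    index: int
    embedding: list

def align_to_inputs(texts, items):
    """Pair each input text with its vector using the item's index field."""
    vectors = [None] * len(texts)
    for item in items:
        vectors[item.index] = item.embedding
    if any(v is None for v in vectors):
        raise ValueError("missing embedding for at least one input")
    return list(zip(texts, vectors))

texts = ["alpha", "beta", "gamma"]
# Items are deliberately out of order: index, not position, drives the pairing.
items = [EmbeddingItem(2, [0.3]), EmbeddingItem(0, [0.1]), EmbeddingItem(1, [0.2])]

pairs = align_to_inputs(texts, items)
print(pairs)
```

With a real response you would pass `response.data` as `items`, since each entry exposes the same `index` and `embedding` fields.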

prompt_tokens

Embedding requests are billed per token, not per text. A 500-word document consumes more tokens than a short sentence. Track this field to estimate costs accurately.

Setup trap

When you call `OpenAI()` without arguments, the SDK reads `OPENAI_API_KEY` from your environment at that exact moment. If you set `os.environ['OPENAI_API_KEY'] = '...'` after creating the client, it won't work. Initialize your client after setting environment variables, not before.

Cost

Embeddings cost $0.02 per 1M input tokens for text-embedding-3-small. A 100-word text is roughly 130 tokens. If you're building a search index over 10M documents averaging 200 words each, that is about 2.6 billion tokens, so expect roughly $52 in embedding costs. Batch requests (pass many texts per API call, within the API's per-request input limits, rather than looping one text at a time) to minimize overhead.
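The arithmetic can be sketched as a small estimator. It uses the price and the rough words-to-tokens ratio quoted above (about 1.3 tokens per word). Both are approximations: real token counts depend on language and content, and the function name is made up for illustration.

```python
TOKENS_PER_WORD = 1.3             # rough ratio (100 words ~ 130 tokens); varies by text
PRICE_PER_MILLION_TOKENS = 0.02   # USD, text-embedding-3-small price quoted above

def estimate_embedding_cost(num_docs: int, avg_words_per_doc: float) -> float:
    """Estimate the one-time cost in USD of embedding a corpus."""
    total_tokens = num_docs * avg_words_per_doc * TOKENS_PER_WORD
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

cost = estimate_embedding_cost(10_000_000, 200)
print(f"Estimated cost: ${cost:,.2f}")
```

For a tighter estimate, replace the word-based ratio with token counts measured on a sample of your own documents.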

Rate limits

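Rate limits depend on your account's usage tier and change over time, so treat any fixed figure (such as 3,500 requests per minute) as illustrative rather than guaranteed. If you're batch-embedding large datasets, space requests across time or use a queue. Higher tiers allow much higher limits; check the limits for your current tier in the OpenAI dashboard.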
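A minimal sketch of spacing out batch requests follows. The helper names, the batch size, and the pause length are illustrative assumptions, and `fake_embed` stands in for a real API call so the example runs offline.

```python
import time

def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_in_batches(texts, embed_fn, batch_size=100, pause_seconds=0.0):
    """Call embed_fn once per batch, pausing between calls to spread out requests."""
    vectors = []
    for i, batch in enumerate(batched(texts, batch_size)):
        if i > 0 and pause_seconds > 0:
            time.sleep(pause_seconds)
        vectors.extend(embed_fn(batch))
    return vectors

def fake_embed(batch):
    """Stand-in for an API call: returns one toy vector per input text."""
    return [[float(len(text))] for text in batch]

result = embed_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=2)
print(result)
```

In real use, `embed_fn` would wrap `client.embeddings.create(...)` and return the vectors from `response.data`.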

Common gotcha

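Developers often shorten embeddings by hand ("I'll just use the first 768 dimensions to save space") without checking whether the model supports it. It depends on the model. The text-embedding-3 models were trained so that embeddings can be shortened by dropping values from the end without losing their concept-representing properties, and the recommended way to do this is the `dimensions` parameter in the request. If you slice a stored vector yourself instead, re-normalize it to unit length afterwards, or similarity scores will no longer be comparable. Older models such as text-embedding-ada-002 were not trained this way, so truncating their vectors can degrade similarity quality. In every case, compare only vectors produced by the same model at the same dimension count.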
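If you shorten a text-embedding-3 vector by hand rather than through the `dimensions` parameter, the sliced vector is no longer unit length. Below is a minimal sketch of the re-normalization step, using a toy four-value vector in place of a real embedding. The function name is hypothetical.

```python
import math

def shorten_and_normalize(vector, target_dims):
    """Keep the first target_dims values, then rescale the result to unit length."""
    if not 0 < target_dims <= len(vector):
        raise ValueError("target_dims must be between 1 and len(vector)")
    head = vector[:target_dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]

full = [0.6, 0.0, 0.8, 0.0]   # toy unit-length vector, not a real embedding
short = shorten_and_normalize(full, 2)
short_norm = math.sqrt(sum(x * x for x in short))
print(short, short_norm)
```

Requesting the smaller size up front with `dimensions` avoids this step entirely, which is why it is usually the simpler option.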

Error recovery

AuthenticationError

Your API key is missing, invalid, or expired. Verify `echo $OPENAI_API_KEY` shows a key starting with 'sk-'. Regenerate the key in the OpenAI dashboard if needed.

RateLimitError

You've exceeded requests per minute. Implement exponential backoff: wait 1s, retry; wait 2s, retry; wait 4s, retry. Use the `tenacity` library for this.
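If you'd rather not add a dependency, the same backoff schedule can be sketched in a few lines of plain Python. `retry_with_backoff` and its parameters are illustrative names, not an SDK feature. The demo uses a fake failing function and records the delays instead of actually sleeping.

```python
import time

def retry_with_backoff(fn, retryable=(Exception,), max_attempts=4,
                       base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))

# Demo: a function that fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"

delays = []  # record requested delays instead of sleeping
outcome = retry_with_backoff(flaky, retryable=(RuntimeError,), sleep=delays.append)
print(outcome, delays)
```

With the 1.x SDK you would pass `retryable=(openai.RateLimitError,)` and wrap the `client.embeddings.create(...)` call in `fn`. Adding random jitter to each delay is a common refinement when many workers retry at once.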

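BadRequestError (context_length_exceeded)

A single text you're embedding is too long (>8191 tokens for text-embedding-3-small). In the 1.x SDK this surfaces as `BadRequestError`; `InvalidRequestError` is the pre-1.0 name. Shorten or split inputs before sending. A character slice such as `text[:2000]` is a crude safeguard because it cuts by characters, not tokens; for accuracy, count tokens with a tokenizer such as `tiktoken` and truncate or chunk on token boundaries.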

Experienced dev note

Don't re-embed your corpus at query time. Pre-compute document embeddings and store them in a vector database (Pinecone, Weaviate, Supabase pgvector). At runtime, embed only the user's query and search the pre-computed vectors, so each search costs one embedding call instead of one per document. Also: OpenAI embeddings are normalized to unit length, so cosine similarity reduces to a plain dot product, and Euclidean distance produces the same ranking. Pick whichever metric your vector store computes fastest.
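For unit-length vectors, squared Euclidean distance equals 2 minus twice the cosine similarity, so both metrics order results identically. The sketch below checks this on toy three-dimensional vectors, which are invented values rather than real embeddings.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = normalize([1.0, 0.2, 0.0])
docs = {
    "near": normalize([0.9, 0.3, 0.1]),
    "mid": normalize([0.5, 0.5, 0.5]),
    "far": normalize([0.0, 0.1, 1.0]),
}

# On unit vectors the dot product is the cosine similarity (higher is closer).
by_cosine = sorted(docs, key=lambda k: dot(query, docs[k]), reverse=True)
# Euclidean distance: lower is closer.
by_euclid = sorted(docs, key=lambda k: euclidean(query, docs[k]))

print(by_cosine, by_euclid)
```

Because the rankings agree, the choice between the two is a performance and tooling question for normalized embeddings, not a quality question.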

Check your understanding

You have two documents: one about machine learning, one about cooking recipes. You embed both with text-embedding-3-small and compute cosine similarity: 0.32. Your colleague says 'that's low, so they're unrelated.' What would you tell them about why that reasoning is incomplete?

Answer hint

Cosine similarity can in principle range from -1 to 1, and what counts as "high" or "low" depends on the model and the domain. Under a given model, 0.32 may be a perfectly ordinary score for two unrelated documents, or it may reflect some real overlap; the number alone doesn't tell you. Derive a threshold from labeled pairs in your own data rather than borrowing a universal cutoff (values such as 0.7–0.85 are often quoted for "related," but they don't transfer across models). Also, similarity scores are relative: what matters is whether one document ranks higher than another, not the absolute value.

VERSION openai 1.x SDK uses `client.embeddings.create()` not `openai.Embedding.create()`. The `dimensions` parameter arrived with the text-embedding-3 models in January 2024; older code and older SDK releases won't have it. Verify your version with `pip show openai` and upgrade to a recent 1.x release if `dimensions` is rejected as an unexpected argument.
