API Intermediate medium · 6 min

What embeddings are: text as vectors

What you will learn

The OpenAI Embeddings API converts text into fixed-size numerical vectors that capture semantic meaning, enabling similarity search and semantic comparison.

Why this matters

Embeddings are the foundation for semantic search, recommendation systems, and retrieval-augmented generation (RAG). Without understanding how text becomes vectors, you can't build production search or clustering features that actually understand meaning rather than just keyword matching.

Skip if: You only need exact keyword search (use traditional full-text indexing instead). Don't call the embeddings API for the same text over and over: embed documents once at ingestion, cache repeated queries, and reuse the stored vectors. Don't use embeddings for real-time classification tasks where latency matters critically: use a smaller local model or a classifier endpoint instead.

Explanation

The OpenAI Embeddings API takes text (a word, sentence, or document) and returns a vector of floating-point numbers that represents that text's meaning in a mathematical space. The text-embedding-3-small model produces 1536-dimensional vectors by default; text-embedding-3-large produces 3072 dimensions by default. These values aren't random: they're learned from billions of text examples so that semantically similar texts end up close to each other in that vector space.

Under the hood, the SDK sends your text to OpenAI's servers, where a neural embedding model encodes it. The model has learned that "dog" and "puppy" should have similar vectors, while "dog" and "refrigerator" should be far apart. You can measure this distance using cosine similarity or Euclidean distance. This is why embeddings power semantic search: you embed your documents once, embed each incoming query, then find the documents whose vectors are closest to the query vector.
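
To make the "closest vector" idea concrete, here is a minimal sketch of cosine similarity and a ranking step in plain Python. The three-dimensional vectors are invented for illustration (real embeddings have 1536 or more dimensions), and the function names `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, not part of the OpenAI SDK.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: list[float], doc_vecs: dict[str, list[float]]):
    """Return (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors made up for this example; they are not real model output.
docs = {
    "puppy": [0.9, 0.1, 0.0],
    "refrigerator": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of "dog"

ranking = rank_by_similarity(query, docs)
print(ranking[0][0])  # puppy
```

In a real system the ranking step is usually delegated to a vector database, which performs the same comparison with an index instead of a full scan.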

Use embeddings when you need semantic understanding: search, clustering, duplicate detection, or feeding context into an LLM. Embed data once and store vectors in a vector database (Pinecone, Weaviate, Postgres with pgvector). Never embed the same text twice: it's wasteful and expensive. For a typical production system, you embed during data ingestion, then compare at query time.
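
One way to honor "embed once, store, reuse" is a small cache in front of the API call. The sketch below is an assumption-laden illustration: `EmbeddingCache` is a hypothetical class, the in-memory dict stands in for a vector database, and `fake_embed` is a placeholder where a real call to the embeddings endpoint would go.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """In-memory cache keyed by model name plus a hash of the text."""

    def __init__(self, embed_fn: Callable[[str], list[float]], model: str):
        self._embed_fn = embed_fn
        self._model = model
        self._store: dict[str, list[float]] = {}
        self.api_calls = 0  # counts cache misses, i.e. calls to embed_fn

    def _key(self, text: str) -> str:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return f"{self._model}:{digest}"

    def get(self, text: str) -> list[float]:
        key = self._key(text)
        if key not in self._store:
            self.api_calls += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


def fake_embed(text: str) -> list[float]:
    # Placeholder for a real embeddings API call; returns a made-up vector.
    return [float(len(text)), 0.0]


cache = EmbeddingCache(fake_embed, model="text-embedding-3-small")
cache.get("hello")
cache.get("hello")  # served from the cache, no second call
print(cache.api_calls)  # 1
```

Including the model name in the key means a later model switch produces fresh entries instead of silently reusing vectors from the old model.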

Request code

```python
from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment.
client = OpenAI()

# Embed a single string; `input` also accepts a list of strings.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)

print(f"Embedding dimension: {len(response.data[0].embedding)}")
print(f"First 10 values: {response.data[0].embedding[:10]}")
print(f"Total tokens used: {response.usage.total_tokens}")
```

Authentication

The OpenAI SDK automatically reads your API key from the OPENAI_API_KEY environment variable when you instantiate the client. Ensure your key has access to the Embeddings API (it should by default). Set the key in your shell before running Python: `export OPENAI_API_KEY='sk-...'` or pass it directly: `client = OpenAI(api_key='sk-...')`.

Response shape

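Field	Description
`object`	`list` (the response type identifier)
`data`	Array of embedding objects, one per input, each with `object`, `index`, and `embedding`
`model`	Name of the model that produced the embeddings, e.g. `text-embedding-3-small`
`usage`	Object with `prompt_tokens` and `total_tokens`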

Field guide

embedding

The vector itself: a list of floats. Store this in your vector database. This is your semantic fingerprint of the text.

index

Position of this embedding in your input batch (0 if you sent one text, 1 if you sent the second item in a batch of 10). Critical when batching: tells you which embedding corresponds to which input.

model

Confirms which embedding model was used. Always verify this matches what you requested.

usage.prompt_tokens

The number of tokens your input text consumed. OpenAI charges per token, so track this. A typical sentence uses 5-15 tokens.

Setup trap

Setting `os.environ['OPENAI_API_KEY']` after you have already instantiated `OpenAI()` has no effect on that client: the SDK reads the environment variable at client initialization time, and `OpenAI()` raises an error if it finds no key at that point. Initialize the client *after* your environment is set up, or pass the key directly to the constructor.

Cost

As of April 2026, text-embedding-3-small costs $0.02 per 1M input tokens. text-embedding-3-large costs $0.13 per 1M input tokens. A typical document of 500 words is ~750 tokens. Embedding 1 million documents with the small model costs ~$15. Cache your embeddings: don't re-embed the same text.
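
The arithmetic behind that estimate can be sketched as a small helper. The prices are the ones quoted above and will drift over time; `estimate_cost_usd` is a hypothetical function name, and 750 tokens per document is the same rough assumption used in the text.

```python
# Prices in USD per 1M input tokens, copied from the paragraph above.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}


def estimate_cost_usd(num_docs: int, tokens_per_doc: int, model: str) -> float:
    """Estimated embedding cost: total tokens / 1M * price per 1M tokens."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]


small = estimate_cost_usd(1_000_000, 750, "text-embedding-3-small")
large = estimate_cost_usd(1_000_000, 750, "text-embedding-3-large")
print(f"small: ${small:.2f}, large: ${large:.2f}")  # small: $15.00, large: $97.50
```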

Rate limits

Rate limits on the embeddings endpoint depend on your account's usage tier: a figure such as 3,000 requests per minute applies to some tiers, and higher tiers get higher limits, so check the limits listed for your account. If you're embedding millions of documents, use batch processing or space requests over time. The API handles batch `input` lists well: send up to 2,048 texts per request to reduce API calls.
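
Splitting a large corpus into request-sized batches might look like the sketch below. The helper name `batched` is hypothetical, and the 2,048 limit is taken from the paragraph above; each yielded list would be passed as the `input` argument of one embeddings request.

```python
from typing import Iterator

MAX_INPUTS_PER_REQUEST = 2048  # per-request input limit quoted in the text


def batched(texts: list[str], batch_size: int = MAX_INPUTS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each at most `batch_size` long."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


texts = [f"document {i}" for i in range(5000)]
batch_sizes = [len(batch) for batch in batched(texts)]
print(batch_sizes)  # [2048, 2048, 904]
```

Requests are also subject to token limits, so batches of long documents may need to be smaller than the maximum input count.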

Common gotcha

Developers often treat embeddings as deterministic within a version, but they're not guaranteed to be bitwise identical across API calls: tiny floating-point differences are normal and won't break cosine similarity. The real gotcha: batching. If you send `input=["text1", "text2", "text3"]`, you get one response with three embeddings in the `data` array. It's easy to index wrong and assign the wrong vector to the wrong text. Always verify `response.data[i].index == i`.
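
A defensive way to pair inputs with vectors is to place each embedding by its `index` field instead of trusting list position. This is a sketch: `pair_inputs_with_embeddings` is a hypothetical helper, and the `SimpleNamespace` objects only mimic the `index` and `embedding` attributes of the items in `response.data`.

```python
from types import SimpleNamespace


def pair_inputs_with_embeddings(inputs, data):
    """Return (text, vector) pairs, matching each vector to its input via `index`."""
    if len(data) != len(inputs):
        raise ValueError(f"expected {len(inputs)} embeddings, got {len(data)}")
    vectors = [None] * len(inputs)
    for item in data:
        vectors[item.index] = item.embedding
    if any(vec is None for vec in vectors):
        raise ValueError("duplicate or missing index in response data")
    return list(zip(inputs, vectors))


# Stand-in for response.data, deliberately shuffled to show that order does not matter.
fake_data = [
    SimpleNamespace(index=2, embedding=[0.3]),
    SimpleNamespace(index=0, embedding=[0.1]),
    SimpleNamespace(index=1, embedding=[0.2]),
]
inputs = ["text1", "text2", "text3"]

pairs = pair_inputs_with_embeddings(inputs, fake_data)
print(pairs)  # [('text1', [0.1]), ('text2', [0.2]), ('text3', [0.3])]
```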

Error recovery

AuthenticationError

Your API key is invalid, expired, or not set. Verify `echo $OPENAI_API_KEY` prints your key. Check that your key hasn't been revoked in the OpenAI dashboard.

RateLimitError

You've exceeded the request rate limit. Implement exponential backoff: sleep for `2 ** attempt_number` seconds, ideally with a little random jitter, before retrying. Batch multiple texts into one request instead of making many requests.
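
A minimal backoff wrapper could look like this. It is a sketch under stated assumptions: `RateLimited` is a hypothetical stand-in for the SDK's rate-limit exception, `with_backoff` and `flaky_call` are invented names, and the `sleep` parameter is injectable so the demo records delays instead of actually waiting.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the SDK's RateLimitError."""


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep, retry_on=(RateLimited,)):
    """Run `call`, retrying on `retry_on` with exponentially growing delays plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # 1s, 2s, 4s, ... plus up to 0.5s of jitter to avoid synchronized retries
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)


attempts = {"count": 0}


def flaky_call():
    # Simulates two rate-limit failures followed by a success.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited("simulated 429")
    return "ok"


delays = []
result = with_backoff(flaky_call, sleep=delays.append)  # record delays, do not sleep
print(result, len(delays))  # ok 2
```

With the real SDK you would pass the actual exception class via `retry_on` and leave `sleep` at its default.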

APIConnectionError

Network issue. Verify internet connectivity. The SDK retries automatically up to 2 times by default; adjust this with the `max_retries` option on the client.

NotFoundError or BadRequestError mentioning 'model'

You specified a model that doesn't exist or that you don't have access to. (`InvalidRequestError` was the name used by the 0.x SDK.) Use 'text-embedding-3-small' or 'text-embedding-3-large'. Older models like 'text-davinci-003' are not embedding models.

Experienced dev note

The biggest mistake senior devs make: storing embeddings without tracking the model version. Six months later, you have 10 million embeddings from text-embedding-3-small, your company switches to a different embedding model for better quality, and now you have to re-embed everything, because vectors from different models are not comparable. Store the model name and embedding creation timestamp with every vector. Also, cosine similarity ranges from -1 to 1, and OpenAI's embeddings are normalized to unit length, so the dot product equals cosine similarity and Euclidean distance produces the same ranking. Absolute scores vary by model and by corpus, so calibrate any similarity threshold on your own data instead of assuming a fixed range.
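
The advice to store the model name and creation timestamp with every vector can be sketched as a simple record type. `StoredEmbedding` and `needs_reembedding` are hypothetical names, and the second model name in the demo is made up; a real system would map these fields to columns or metadata in its vector database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StoredEmbedding:
    doc_id: str
    vector: list[float]
    model: str  # which embedding model produced this vector
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def needs_reembedding(records: list[StoredEmbedding], current_model: str) -> list[str]:
    """Return the ids of records whose vectors came from a different model."""
    return [r.doc_id for r in records if r.model != current_model]


records = [
    StoredEmbedding("a", [0.1, 0.2], model="text-embedding-3-small"),
    StoredEmbedding("b", [0.3, 0.4], model="some-older-model"),  # made-up model name
]
stale = needs_reembedding(records, current_model="text-embedding-3-small")
print(stale)  # ['b']
```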

Check your understanding

You have 100,000 product descriptions you want to make searchable by meaning. Should you embed each description once during data ingestion, or re-embed the descriptions every time a search query arrives? Why?

Answer hint

Think about cost and latency. The API charges per token every time you call it. If you embed during ingestion and store vectors, each search needs only one short embedding call for the query, followed by vector math. If you re-embed descriptions on every query, you pay for the whole catalog on every single search.

VERSION The openai 1.x SDK uses client.embeddings.create() (not the deprecated openai.Embedding.create() from 0.x). The text-embedding-3 models were released in January 2024. Always pin the model name explicitly in your code, and record it alongside the vectors it produces.
