How-to · Beginner · 3 min read

What dimension are LLM embeddings?

Quick answer
The dimension of LLM embeddings varies by model but commonly ranges from 768 to 12,288. For example, OpenAI's text-embedding-3-small produces 1536-dimensional vectors and text-embedding-3-large produces 3072-dimensional vectors, while the hidden states of a model like llama-3.1-70b are 8192-dimensional. Embedding size depends on the model architecture and training design.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install "openai>=1.0" (quoted so the shell does not interpret >=)

Typical embedding dimensions

Embedding dimensions represent the length of the vector that encodes text semantics. Common sizes include:

  • 768 dimensions: Early transformer encoders such as BERT-base.
  • 1024 to 3072 dimensions: Most dedicated embedding models; OpenAI's text-embedding-3-small uses 1536 and text-embedding-3-large uses 3072.
  • 4096+ dimensions: Hidden states of large generative models, such as llama-3.1-70b (8192) or GPT-3 (12,288).

Higher dimensions can capture more nuanced semantic information but increase computational cost.
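To make the cost side concrete, here is a back-of-the-envelope sketch (our own illustration, assuming vectors are stored as float32) of how raw storage for a corpus of embeddings scales linearly with dimension:

python
# Approximate storage cost for a corpus of embeddings.
# Assumes float32 vectors (4 bytes per component); figures are illustrative.
BYTES_PER_FLOAT32 = 4

def storage_gib(dim, n_vectors):
    """Approximate storage in GiB for n_vectors embeddings of length dim."""
    return dim * BYTES_PER_FLOAT32 * n_vectors / 1024**3

for dim in (768, 1536, 3072, 8192):
    print(f"{dim:>5} dims, 1M vectors -> {storage_gib(dim, 1_000_000):.1f} GiB")

Doubling the dimension doubles both storage and the work done per similarity comparison, which is why many teams pick the smallest dimension that meets their retrieval quality bar.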

Model                            Embedding dimension
BERT-base                        768
OpenAI text-embedding-3-small    1536
OpenAI text-embedding-3-large    3072
LLaMA 3.1 70B                    8192 (hidden size)
GPT-4o                           not publicly disclosed

How to check embedding dimension via API

You can retrieve embeddings and check their dimension programmatically using the OpenAI API. The length of the returned embedding vector is the dimension.

python
from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Example text to embed"
)

embedding_vector = response.data[0].embedding
print(f"Embedding dimension: {len(embedding_vector)}")
output
Embedding dimension: 3072
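OpenAI's text-embedding-3 models also accept a dimensions request parameter that returns a shortened vector. Per OpenAI's documentation, this is roughly equivalent to truncating the full embedding and re-normalizing it to unit length. The sketch below illustrates that idea locally with plain Python; the helper name shorten is ours, not part of any API:

python
import math

def shorten(vec, dims):
    """Truncate an embedding to `dims` entries and re-normalize to unit length."""
    truncated = vec[:dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

full = [0.5, 0.5, 0.5, 0.5]        # toy 4-dimensional "embedding"
short = shorten(full, 2)
print(len(short))                   # 2
print(sum(x * x for x in short))    # ~1.0: unit length again

Shortened embeddings trade a little retrieval quality for smaller indexes, which can matter at large corpus sizes.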

Common variations and considerations

Embedding dimensions vary by:

  • Model architecture: Larger models tend to have higher dimensions.
  • Use case: Some embeddings are optimized for search (dense vectors), others for classification.
  • API provider: OpenAI, Anthropic, and others have different embedding sizes.

Always check the model documentation or test directly to confirm embedding size.

Troubleshooting embedding dimension issues

If you get unexpected embedding sizes:

  • Verify you are using the correct model name.
  • Check if the API response includes multiple embeddings (batch input) and inspect one vector.
  • Ensure your client library is up to date to avoid deprecated model calls.
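For the batch case, each item in response.data holds a separate embedding. A small helper such as the hypothetical check_dims below makes that inspection explicit; it is shown here on plain lists so it runs without an API call:

python
def check_dims(vectors):
    """Return the set of distinct vector lengths found in a batch of embeddings."""
    return {len(v) for v in vectors}

# Stand-in for the embeddings extracted from a batch API response.
batch = [[0.1] * 1536, [0.2] * 1536, [0.3] * 1536]

dims = check_dims(batch)
assert dims == {1536}, f"inconsistent embedding sizes: {dims}"
print(f"All embeddings have dimension {dims.pop()}")

With a real response you would pass [item.embedding for item in response.data] to the same helper.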

Key Takeaways

  • Embedding dimension is the length of the vector representing text semantics in an LLM.
  • Common embedding sizes range from 768 to over 12,000 dimensions depending on the model.
  • You can programmatically check embedding dimension by measuring the returned vector length.
  • Embedding size impacts both semantic richness and computational cost.
  • Always verify embedding dimensions from official model docs or direct API calls.
Verified 2026-04 · text-embedding-3-large, llama-3.1-70b, gpt-4o