What dimension are LLM embeddings?
Quick answer
The dimension of LLM embeddings varies by model, commonly ranging from 768 to 12,288. For example, OpenAI's text-embedding-3-large produces 3072-dimensional vectors by default, while Llama 3.1 70B uses a hidden size of 8192. Embedding size is fixed by the model's architecture and training design.
PREREQUISITES
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
Typical embedding dimensions
Embedding dimensions represent the length of the vector that encodes text semantics. Common sizes include:
- 768 dimensions: early transformer models like BERT-base.
- 1536 to 3072 dimensions: OpenAI's text-embedding-3-small uses 1536; text-embedding-3-large uses 3072.
- 4096+ dimensions: larger open models such as Llama 3.1 (4096 for the 8B variant, 8192 for 70B).
Higher dimensions can capture more nuanced semantic information but increase computational cost.
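The cost side is easy to make concrete: stored as float32, each dimension costs 4 bytes per vector. A quick back-of-envelope sketch (the 1-million-document corpus size is an assumed example figure):

```python
# Rough memory cost of a vector index at different embedding dimensions,
# assuming float32 storage (4 bytes per dimension) and 1M documents.
BYTES_PER_FLOAT32 = 4
num_vectors = 1_000_000

for dim in (768, 1536, 3072):
    gib = num_vectors * dim * BYTES_PER_FLOAT32 / 2**30
    print(f"{dim:>5} dims -> {gib:.2f} GiB")
```

Doubling the dimension doubles both storage and the per-query cost of similarity search, which is why smaller vectors are often preferred at scale.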
| Model | Embedding Dimension |
|---|---|
| BERT-base | 768 |
| OpenAI text-embedding-3-small | 1536 |
| OpenAI text-embedding-3-large | 3072 |
| Llama 3.1 8B | 4096 |
| Llama 3.1 70B | 8192 |
| GPT-4o | not publicly disclosed |
How to check embedding dimension via API
You can retrieve embeddings and check their dimension programmatically using the OpenAI API. The length of the returned embedding vector is the dimension.
```python
from openai import OpenAI
import os

# Requires OPENAI_API_KEY to be set in the environment.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Example text to embed",
)

# The dimension is simply the length of the returned vector.
embedding_vector = response.data[0].embedding
print(f"Embedding dimension: {len(embedding_vector)}")
```

Output:

```
Embedding dimension: 3072
```
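The text-embedding-3 models also accept a `dimensions` parameter to request shorter vectors directly. OpenAI notes the same effect can be approximated client-side by truncating the full vector and renormalizing to unit length. A minimal sketch, using a made-up 6-dimensional vector in place of a real API response:

```python
import math

def truncate_and_renormalize(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy "embedding" standing in for a real API response.
full = [0.4, 0.2, -0.1, 0.3, 0.05, -0.2]
short = truncate_and_renormalize(full, 3)
print(len(short))                           # 3
print(round(sum(x * x for x in short), 6))  # 1.0
```

Shortened vectors trade some semantic fidelity for cheaper storage and faster similarity search.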
Common variations and considerations
Embedding dimensions vary by:
- Model architecture: Larger models tend to have higher dimensions.
- Use case: Some embeddings are optimized for search (dense vectors), others for classification.
- API provider: OpenAI, Cohere, Voyage AI, and others each offer different embedding sizes.
Always check the model documentation or test directly to confirm embedding size.
Troubleshooting embedding dimension issues
If you get unexpected embedding sizes:
- Verify you are using the correct model name.
- Check if the API response includes multiple embeddings (batch input) and inspect one vector.
- Ensure your client library is up to date to avoid deprecated model calls.
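For the batch case, a small helper can confirm every vector in a response has the expected length (`check_dimensions` below is an illustrative utility, not part of the OpenAI SDK):

```python
def check_dimensions(vectors, expected_dim):
    """Return indices of any vectors whose length differs from expected_dim."""
    return [i for i, v in enumerate(vectors) if len(v) != expected_dim]

# Simulated batch of three embeddings; the last deliberately has the wrong length.
batch = [[0.1] * 3072, [0.2] * 3072, [0.3] * 1536]
bad = check_dimensions(batch, expected_dim=3072)
print(bad)  # [2]
```

In real code, `vectors` would be `[item.embedding for item in response.data]` from a batch embeddings call.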
Key Takeaways
- Embedding dimension is the length of the vector representing text semantics in an LLM.
- Common embedding sizes range from 768 to several thousand dimensions, depending on the model.
- You can programmatically check embedding dimension by measuring the returned vector length.
- Embedding size impacts both semantic richness and computational cost.
- Always verify embedding dimensions from official model docs or direct API calls.