ValueError
builtins.ValueError
Stack trace
ValueError: Embedding format mismatch: expected dense vector but received sparse format or vice versa
File "app.py", line 42, in generate_embedding
embedding = client.embeddings.create(input=text, model="text-embedding-3-large")
File "/usr/local/lib/python3.9/site-packages/openai/embeddings.py", line 58, in create
raise ValueError("Embedding format mismatch")
Why it happens
Embedding APIs and libraries expect embeddings in a specific vector format, either dense (a list of floats) or sparse (a dictionary of indices and values). If the input or output format does not match the expected type, this error is raised. This often happens when mixing embedding providers or incorrectly parsing the embedding response.
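To make the two shapes concrete, the sketch below shows the same 6-dimensional embedding in dense and sparse form. The helper `describe_format` is hypothetical (it is not part of any library), and the sparse layout shown, a dict mapping index to value, is an assumption; some providers return separate `indices` and `values` lists instead.

```python
from numbers import Real


def describe_format(embedding):
    """Return 'dense', 'sparse', or 'unknown' for an embedding value (illustrative helper)."""
    # Dense: a sequence of real numbers, one per dimension.
    if isinstance(embedding, (list, tuple)) and all(
        isinstance(x, Real) and not isinstance(x, bool) for x in embedding
    ):
        return "dense"
    # Sparse (assumed layout): a dict of {index: value} holding only non-zero entries.
    if isinstance(embedding, dict) and all(
        isinstance(k, int) and isinstance(v, Real) for k, v in embedding.items()
    ):
        return "sparse"
    return "unknown"


dense = [0.0, 0.25, 0.0, 0.0, -0.5, 0.0]
sparse = {1: 0.25, 4: -0.5}  # same vector, zeros omitted

print(describe_format(dense))   # dense
print(describe_format(sparse))  # sparse
print(describe_format("0.25"))  # unknown
```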
Detection
Validate the embedding vector type immediately after generation by asserting its type and structure before downstream use. Log the raw embedding response to detect format mismatches early.
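One way to do this check is sketched below. The function name `validate_dense_embedding`, the logger name, and the `expected_dim` parameter are illustrative choices rather than anything defined by an embedding library; the check assumes dense embeddings arrive as a Python list of floats.

```python
import logging

logger = logging.getLogger("embeddings")  # logger name is an arbitrary choice


def validate_dense_embedding(embedding, expected_dim=None):
    """Raise ValueError unless `embedding` is a non-empty list of floats."""
    # Log the type and a short preview so a format mismatch shows up in the logs.
    logger.debug(
        "raw embedding type=%s preview=%s",
        type(embedding).__name__,
        repr(embedding)[:80],
    )
    if not isinstance(embedding, list):
        raise ValueError(f"Expected dense list, got {type(embedding).__name__}")
    if not embedding:
        raise ValueError("Embedding is empty")
    if not all(isinstance(x, float) for x in embedding):
        raise ValueError("Embedding contains non-float values")
    if expected_dim is not None and len(embedding) != expected_dim:
        raise ValueError(f"Expected {expected_dim} dimensions, got {len(embedding)}")
    return embedding


validate_dense_embedding([0.1, -0.2, 0.3], expected_dim=3)  # passes silently

try:
    validate_dense_embedding({0: 0.1, 2: 0.3})
except ValueError as exc:
    print(exc)  # Expected dense list, got dict
```

Calling the validator at the point where the embedding is first extracted keeps the failure close to its cause instead of surfacing later in a similarity computation or a database write.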
Causes & fixes
Using an embedding model or provider that returns sparse embeddings but code expects dense vectors
Check the model documentation and convert sparse embeddings to dense format before use, or switch to a dense embedding model.
Parsing the embedding response incorrectly, e.g., treating a dict as a list or vice versa
Parse the embedding response according to the API spec, ensuring the embedding vector is extracted as a list of floats for dense embeddings.
Mixing embedding vectors from different sources with incompatible formats in the same pipeline
Normalize all embeddings to a consistent format (dense or sparse) before combining or comparing them.
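The normalization step in the last fix can be sketched as follows. `to_dense` and `cosine_similarity` are hypothetical helpers written for this example, and the sparse input is assumed to be an `{index: value}` dict. Note that this only aligns the format: vectors produced by different models are generally not comparable even when both are dense.

```python
import math


def to_dense(embedding, dim):
    """Return a dense list of `dim` floats from a dense sequence or an {index: value} dict."""
    if isinstance(embedding, dict):
        dense = [0.0] * dim
        for index, value in embedding.items():
            if not 0 <= index < dim:
                raise ValueError(f"Sparse index {index} out of range for dim {dim}")
            dense[index] = float(value)
        return dense
    if isinstance(embedding, (list, tuple)):
        if len(embedding) != dim:
            raise ValueError(f"Expected {dim} dimensions, got {len(embedding)}")
        return [float(x) for x in embedding]
    raise ValueError(f"Unsupported embedding type: {type(embedding).__name__}")


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


dim = 4
from_source_a = [1.0, 0.0, 2.0, 0.0]  # dense
from_source_b = {0: 1.0, 2: 2.0}      # sparse form of the same vector

a = to_dense(from_source_a, dim)
b = to_dense(from_source_b, dim)
print(b)  # [1.0, 0.0, 2.0, 0.0]
print(cosine_similarity(a, b))  # approximately 1.0, since the vectors are identical
```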
Code: broken vs fixed
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
text = "Example text to embed"
embedding = client.embeddings.create(input=text, model="text-embedding-3-large")
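# Broken: indexes the response object like a dict and never checks the embedding's format
print(embedding['data'][0]['embedding'][0] + 1.0)

# Fixed: extract the embedding and validate its format before use
from openai import OpenAI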
import os
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
text = "Example text to embed"
response = client.embeddings.create(input=text, model="text-embedding-3-large")
# The v1 SDK returns a response object, so use attribute access rather than dict indexing
embedding = response.data[0].embedding
# Confirm embedding is a dense list of floats
if not isinstance(embedding, list):
    raise ValueError("Expected dense embedding list but got different format")
print(embedding[0] + 1.0)  # Works correctly now
Workaround
Wrap embedding extraction in try/except ValueError. If the error is caught and the value is a sparse dict, convert it to a dense list by initializing a zero vector of the expected dimension and filling in the values at the sparse indices.
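A minimal sketch of this workaround follows. Both function names are hypothetical, and `dim` (the expected vector length) must be supplied by the caller because a sparse dict does not carry it. Keys are passed through `int()` on the assumption that a JSON-decoded sparse embedding may have string keys.

```python
def extract_dense(raw):
    """Strict extraction: accept only a list of numbers, otherwise raise ValueError."""
    if not isinstance(raw, list):
        raise ValueError(f"Expected dense list, got {type(raw).__name__}")
    return [float(x) for x in raw]


def extract_with_fallback(raw, dim):
    """Try strict extraction; on ValueError, densify an {index: value} dict."""
    try:
        return extract_dense(raw)
    except ValueError:
        if not isinstance(raw, dict):
            raise  # not a format this fallback understands
        dense = [0.0] * dim  # start from a zero vector
        for index, value in raw.items():
            dense[int(index)] = float(value)  # fill in the non-zero positions
        return dense


print(extract_with_fallback([0.5, 0.0, 0.25], dim=3))      # [0.5, 0.0, 0.25]
print(extract_with_fallback({"0": 0.5, 2: 0.25}, dim=3))   # [0.5, 0.0, 0.25]
```

An index at or beyond `dim` raises `IndexError` here, which is usually the desired signal that the assumed dimension is wrong.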
Prevention
Standardize on a single embedding provider and model that returns dense vectors, and validate embedding formats immediately after retrieval to avoid downstream errors.
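One way to enforce this is to route every embedding request through a single function that pins the model and validates the result. The sketch below is an assumption about how such a wrapper might look: `embed`, `fake_fetch`, and the `fetch` parameter are hypothetical names, and the API call is injected as a callable so the example runs offline. The value 3072 is the default output size of `text-embedding-3-large`.

```python
from typing import Callable, List

EMBEDDING_MODEL = "text-embedding-3-large"  # the one model used everywhere
EMBEDDING_DIM = 3072  # default output size for this model; change it if the model changes


def embed(text: str, fetch: Callable[[str, str], object], dim: int = EMBEDDING_DIM) -> List[float]:
    """Fetch an embedding through one entry point and validate it before returning."""
    vector = fetch(text, EMBEDDING_MODEL)
    if not isinstance(vector, list) or len(vector) != dim:
        raise ValueError(f"Expected a dense list of {dim} floats from {EMBEDDING_MODEL}")
    if not all(isinstance(x, float) for x in vector):
        raise ValueError("Embedding contains non-float values")
    return vector


# Stand-in for a real API call, so the sketch runs without network access.
def fake_fetch(text: str, model: str) -> list:
    return [0.0] * 4


print(len(embed("Example text to embed", fake_fetch, dim=4)))  # 4
```

With the OpenAI v1 SDK, `fetch` could be a small function that calls `client.embeddings.create(input=text, model=model)` and returns `.data[0].embedding` from the response.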