High severity · Intermediate · Fix: 5-15 min

ValueError: space mismatch

chromadb.errors.InvalidQueryException or ValueError (HNSW distance-metric or embedding-dimension mismatch)

What this error means

ChromaDB fixes a collection's distance metric (its "space") when the HNSW index is created, for example cosine. A space mismatch error means your code is treating the collection as if it used a different metric (for example l2, i.e. Euclidean), or that the embedding dimensions at query time don't match the dimensions the index was built with.

Stack trace

traceback

chromadb.errors.InvalidQueryException: Error: space type mismatch. Index was created with space 'cosine', but query attempted with space 'euclidean'

Or:

ValueError: Embedding dimension mismatch: expected 1536 dimensions, got 768. HNSW index built with 1536-dim embeddings cannot query with 768-dim embeddings.

Traceback (most recent call last):
  File "app.py", line 45, in query_documents
    results = collection.query(query_embeddings=query_vec, n_results=5)
  File "chromadb/api/client.py", line 234, in query
    return self._client._query(self._name, query_embeddings, n_results, where_filter)
  File "chromadb/db/impl.py", line 187, in _query
    raise InvalidQueryException(f'space type mismatch. Index was created with space {self.metadata["space"]}, but query attempted with space {space}')

QUICK FIX

Check that collection.metadata['hnsw:space'] holds the metric your code expects, make sure every embedding you add or query with has the same dimension, and set the metric once, at collection creation, with metadata={'hnsw:space': 'cosine'}.

Why it happens

ChromaDB's HNSW (Hierarchical Navigable Small World) index is a metric-specific data structure. When you create a collection, you choose a space: l2 (squared Euclidean, the default), cosine, or ip (inner product). In chromadb 0.4.x the choice is passed through the collection metadata key hnsw:space. The entire index graph is built using that metric to compute distances between vectors, so it cannot be changed after creation. If the collection's metadata says 'cosine' but your code assumes 'l2', the distances and rankings it gets back will not mean what it expects. If you query with embeddings from a different model with different dimensions (e.g., switching from a 1536-dim OpenAI model to a 768-dim alternative), HNSW cannot compare the vectors at all and ChromaDB rejects the call.
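To make the metric dependence concrete, here is a small pure-Python sketch of the three distance formulas that Chroma's documentation gives for the l2, ip, and cosine spaces. The function names and the DISTANCES table are illustrative and are not part of chromadb.

```python
import math
from typing import Callable, Dict, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the formula behind the 'l2' space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner-product distance: 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical lookup table keyed by the space name.
DISTANCES: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "l2": l2_distance,
    "ip": ip_distance,
    "cosine": cosine_distance,
}


def distance(space: str, a: Sequence[float], b: Sequence[float]) -> float:
    """Compute the distance for one space, rejecting mismatched dimensions."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    return DISTANCES[space](a, b)


if __name__ == "__main__":
    a, b = [1.0, 0.0], [2.0, 0.0]
    for space in DISTANCES:
        print(space, distance(space, a, b))
```

The same pair of vectors gets a different distance under each space, which is why an index built for one metric cannot be reinterpreted under another, and why vectors of different lengths cannot be compared at all.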

Detection

Before querying, log the collection's metadata (collection.metadata, a plain dict) and check its 'hnsw:space' entry. Print the embedding dimension of your query vectors before passing them to collection.query(). Add assertions that embedding dimensions match your index creation step: `assert len(query_vec[0]) == expected_dim, f'Expected {expected_dim} dims, got {len(query_vec[0])}'`. Run the same checks at application startup to catch silent space mismatches early.
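These checks can be bundled into one pre-query helper. The sketch below is a hypothetical check_query function, not a chromadb API. It assumes the metric lives under the 'hnsw:space' metadata key and that a missing key means the l2 default, as in chromadb 0.4.x.

```python
from typing import List, Mapping, Optional, Sequence


def check_query(
    metadata: Optional[Mapping[str, object]],
    query_embeddings: Sequence[Sequence[float]],
    expected_dim: int,
    expected_space: str,
) -> List[str]:
    """Return a list of problems found; an empty list means the query looks safe."""
    problems: List[str] = []

    # Assumption: an absent 'hnsw:space' entry means the default metric, l2.
    actual_space = (metadata or {}).get("hnsw:space", "l2")
    if actual_space != expected_space:
        problems.append(
            f"space mismatch: collection uses {actual_space!r}, "
            f"code expects {expected_space!r}"
        )

    # Every query vector must have the dimension the index was built with.
    for i, vec in enumerate(query_embeddings):
        if len(vec) != expected_dim:
            problems.append(
                f"query vector {i} has {len(vec)} dims, expected {expected_dim}"
            )
    return problems


if __name__ == "__main__":
    issues = check_query({"hnsw:space": "cosine"}, [[0.0] * 768], 1536, "cosine")
    for issue in issues:
        print(issue)
```

With a real collection you would pass collection.metadata as the first argument and refuse to call collection.query() when the returned list is non-empty.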

Causes & fixes

Collection created with the cosine space but code assumes l2 (Euclidean), or vice versa

✓ Fix

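Set the metric in exactly one place: the metadata passed at collection creation, e.g. metadata={'hnsw:space': 'cosine'}. collection.add() and collection.query() take no space argument and always use the collection's metric. Note that the default is l2, not cosine, so request cosine explicitly if that is what your embedding model is meant to be compared with. Store the space choice in environment variables or config files to prevent manual mismatches.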

Embedding model changed (e.g., from OpenAI text-embedding-3-small, 1536-dim, to a 768-dim model), but the index was built with the old dimensions

✓ Fix

Rebuild the collection with embeddings from the new model: either create a collection under a new name, or call delete_collection() and then create_collection() to reuse the old name. Never try to add new 768-dim embeddings to an index built with 1536-dim data. Validate embedding dimension before any add/query: `assert all(len(e) == 1536 for e in embeddings)`
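One way to make a model change impossible to miss is to encode the model and dimension in the collection name, so a new model always lands in a new collection. The helper below is a hypothetical sketch. It assumes Chroma's collection naming rules of 3 to 63 characters, starting and ending with a letter or digit, and deliberately restricts itself to lowercase letters, digits, hyphens, and underscores.

```python
import re


def versioned_collection_name(base: str, model: str, dim: int) -> str:
    """Build a collection name that changes whenever the model or dimension does."""
    raw = f"{base}-{model}-{dim}".lower()
    # Replace anything outside the conservative character set with a hyphen,
    # then trim separators so the name starts and ends with a letter or digit.
    slug = re.sub(r"[^a-z0-9_-]+", "-", raw).strip("-_")
    if not 3 <= len(slug) <= 63:
        raise ValueError(f"collection name has invalid length: {slug!r}")
    return slug


if __name__ == "__main__":
    print(versioned_collection_name("documents", "text-embedding-3-small", 1536))
    print(versioned_collection_name("documents", "all-MiniLM-L6-v2", 384))
```

Because the old collection keeps its own name, it can stay in place until the new one is fully populated.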

Using different embedding functions at different code stages (e.g., creating with ChromaDB's default embedding, querying with custom embedding function)

✓ Fix

Always pass the same embedding_function to both client.create_collection() and client.get_collection(). If using custom embeddings, instantiate the function once and reuse it (OpenAIEmbeddingFunction lives in chromadb.utils.embedding_functions): `embedder = OpenAIEmbeddingFunction(api_key=os.environ['OPENAI_API_KEY'], model_name='text-embedding-3-small'); client.create_collection(name='documents', embedding_function=embedder, metadata={'hnsw:space': 'cosine'})`

Metadata corruption or old ChromaDB version stored space='ip' (inner product) but collection actually uses cosine distance

✓ Fix

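Check collection.metadata. If the hnsw:space entry is wrong, or missing when you expected a non-default metric (a missing entry means the default, l2), back up your data, drop the collection, and recreate it. delete_collection() returns nothing, so do not assign its result: `client.delete_collection(name='my_collection'); collection = client.create_collection(name='my_collection', metadata={'hnsw:space': 'cosine'}, embedding_function=embedder)`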

Code: broken vs fixed

Broken - triggers the error

python

import chromadb
import os
from openai import OpenAI

# Collection created with the cosine metric
client = chromadb.Client()
collection = client.create_collection(
    name='documents',
    metadata={'hnsw:space': 'cosine'}  # ← Index built with cosine metric
)

# Add embeddings
openai_client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))
embeddings = openai_client.embeddings.create(
    input=['hello world'],
    model='text-embedding-3-small'
)
collection.add(
    ids=['doc1'],
    embeddings=[embeddings.data[0].embedding],
    documents=['hello world']
)

# Query while assuming a DIFFERENT metric: this breaks
query_vec = openai_client.embeddings.create(
    input=['hello'],
    model='text-embedding-3-small'
)

# ❌ BUG: the index was built with 'cosine' and the metric cannot be changed per query.
# query() has no space argument, so this call fails.
results = collection.query(
    query_embeddings=[query_vec.data[0].embedding],
    n_results=3,
    space='euclidean'  # ← MISMATCH: raises an error
)
print(results)

Fixed - works correctly

python

import chromadb
import os
from openai import OpenAI

# Collection created with the cosine metric
client = chromadb.Client()
collection = client.create_collection(
    name='documents',
    metadata={'hnsw:space': 'cosine'}  # ← Define metric once, at creation
)

# Add embeddings
openai_client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))
embeddings = openai_client.embeddings.create(
    input=['hello world'],
    model='text-embedding-3-small'
)

# Verify embedding dimension
expected_dim = 1536
assert len(embeddings.data[0].embedding) == expected_dim, f'Expected {expected_dim} dims'

collection.add(
    ids=['doc1'],
    embeddings=[embeddings.data[0].embedding],
    documents=['hello world']
)

# Query with the SAME embedding model
query_vec = openai_client.embeddings.create(
    input=['hello'],
    model='text-embedding-3-small'
)

# Verify the metric and the query dimension
print(f"Collection space: {collection.metadata['hnsw:space']}")
assert len(query_vec.data[0].embedding) == expected_dim, 'Query dim mismatch'

# ✅ FIXED: no space argument; query() always uses the collection's metric
results = collection.query(
    query_embeddings=[query_vec.data[0].embedding],
    n_results=3
)

print(f"Found {len(results['ids'][0])} results")
for doc_id, doc_text in zip(results['ids'][0], results['documents'][0]):
    print(f"{doc_id}: {doc_text}")

The fix removes the space='euclidean' argument from query() and ensures all embeddings come from the same model (text-embedding-3-small, 1536 dims). query() always uses the metric stored on the collection (cosine here), so there is nothing left to mismatch. Dimension assertions catch embedding model changes early.

⚠

Workaround

If you cannot rebuild from the source documents right away, read the ids, embeddings, documents, and metadata out of the old collection with collection.get(include=['embeddings', 'documents', 'metadatas']) (embeddings are not returned unless you ask for them), delete the collection, create a new one with metadata={'hnsw:space': 'cosine'}, and re-add all data. This only helps with a metric change; after a dimension change you must re-embed the documents with the new model. Alternatively, keep two collections, one for each metric (documents_cosine and documents_l2), and route queries to the correct collection based on your query embedding model. This is temporary: plan a full migration to a consistent space metric within 1-2 sprints.
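The copy step of this workaround might look like the sketch below. copy_collection is a hypothetical helper that assumes both arguments behave like ChromaDB collections, meaning they offer count(), get() with include, limit and offset, and add(). It copies data as-is, so records without documents or metadata may need extra handling, depending on your chromadb version.

```python
from typing import Any


def copy_collection(source: Any, target: Any, batch_size: int = 500) -> int:
    """Copy ids, embeddings, documents, and metadata from source to target.

    Returns the number of records copied. Works in batches so that a large
    collection is never loaded into memory all at once.
    """
    total = source.count()
    copied = 0
    for offset in range(0, total, batch_size):
        batch = source.get(
            include=["embeddings", "documents", "metadatas"],
            limit=batch_size,
            offset=offset,
        )
        if not batch["ids"]:
            break  # nothing left to copy
        target.add(
            ids=batch["ids"],
            embeddings=batch["embeddings"],
            documents=batch["documents"],
            metadatas=batch["metadatas"],
        )
        copied += len(batch["ids"])
    return copied
```

A typical use would be to create the new collection first, call copy_collection(old, new), compare new.count() with old.count(), and only then delete the old collection.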

✓

Prevention

Store your space metric choice in environment variables (CHROMA_SPACE=cosine) or a config file, not as hardcoded strings in code. Create a wrapper function that instantiates collections with validated parameters: `def create_doc_collection(client, space=os.environ.get('CHROMA_SPACE', 'cosine')):`. Use a single, immutable embedding model across your pipeline: switching models requires a full collection rebuild, not just a code change. Add pre-query validation: check that the embedding dimensions and the collection's space are what your code expects before calling query(). Back up the collection's contents (for example with collection.get(include=['embeddings', 'documents', 'metadatas'])) before any schema changes.
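A minimal sketch of such a wrapper follows. The names resolve_space and ALLOWED_SPACES are illustrative, and the env parameter exists only to make the function easy to test. It assumes the chromadb 0.4.x convention of passing the metric as metadata={'hnsw:space': ...} to create_collection().

```python
import os
from typing import Any, Mapping, Optional

# The three space names ChromaDB accepts for its HNSW index.
ALLOWED_SPACES = ("l2", "cosine", "ip")


def resolve_space(env: Optional[Mapping[str, str]] = None) -> str:
    """Read CHROMA_SPACE from the environment and validate it."""
    env = os.environ if env is None else env
    space = env.get("CHROMA_SPACE", "cosine").strip().lower()
    if space not in ALLOWED_SPACES:
        raise ValueError(
            f"CHROMA_SPACE must be one of {ALLOWED_SPACES}, got {space!r}"
        )
    return space


def create_doc_collection(
    client: Any,
    name: str,
    embedding_function: Any = None,
    env: Optional[Mapping[str, str]] = None,
) -> Any:
    """Create a collection whose metric comes from validated configuration."""
    kwargs = {"name": name, "metadata": {"hnsw:space": resolve_space(env)}}
    # Only pass an embedding function if the caller supplied one, so the
    # client's own default stays in effect otherwise.
    if embedding_function is not None:
        kwargs["embedding_function"] = embedding_function
    return client.create_collection(**kwargs)
```

Unlike a default argument of os.environ.get(...), which is evaluated once when the function is defined, this version reads the variable on every call, so a misconfigured value such as 'euclidean' fails fast with a clear message.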

Python 3.9+ · chromadb >=0.3.0 · tested on 0.4.x

Verified 2026-04
