Workflow · Beginner · 5 min

Why the embedding model must match at query time

What you will learn

Your query embedding model must be identical to the one used when building your vector index, or retrieval fails silently.

Step 3 of RAG: After indexing documents and before querying the retriever

Why this matters

If your query uses a different embedding model than your indexed documents, the semantic similarity search returns irrelevant results. The system won't error: it will just give you wrong answers. This is one of the most insidious failure modes in RAG because no exception is raised.

Explanation

The core problem: When you index documents, you convert text to vectors using a specific embedding model (e.g., OpenAI's text-embedding-3-small). When you query, your question must be embedded with the same model. If you accidentally use text-embedding-3-large for queries, the two vector spaces are incompatible, like looking up a location on one map with coordinates taken from a different map.

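Why it matters: Vector similarity (for example, cosine similarity) only makes sense when both vectors live in the same embedding space. Different models encode meaning along different axes, so the distance between a query vector from one model and document vectors from another carries no semantic information. If the two models also differ in output dimension (text-embedding-3-small returns 1536 values by default, text-embedding-3-large returns 3072), most vector stores reject the query with a dimension error, which is the lucky case. The silent failure occurs when the dimensions happen to match, for example text-embedding-ada-002 versus text-embedding-3-small (both 1536): the search runs and ranks documents by a mathematical distance that has nothing to do with relevance.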
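The effect can be reproduced without any API calls. The sketch below is a toy, not a real embedding model: toy_embed, the model names toy-model-a and toy-model-b, and the six-word vocabulary are all invented for illustration. Both toy models return vectors of the same length but assign words to different axes, which is enough to make a mismatched query return a confident wrong answer with no error.

```python
import math

VOCAB = ["python", "language", "machine", "learning", "vector", "database"]

def toy_embed(text, model):
    """Hypothetical stand-in for an embedding API.

    Both toy models return 6-dimensional vectors, but each one assigns
    words to different axes (model b shifts every axis by two).
    """
    shift = {"toy-model-a": 0, "toy-model-b": 2}[model]
    vec = [0.0] * len(VOCAB)
    for word in text.lower().split():
        if word in VOCAB:
            vec[(VOCAB.index(word) + shift) % len(VOCAB)] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Index every document with toy-model-a.
docs = ["python language", "machine learning", "vector database"]
index = [(doc, toy_embed(doc, "toy-model-a")) for doc in docs]

def top_hit(query, model):
    """Return the document closest to the query under the given query model."""
    q = toy_embed(query, model)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

match_hit = top_hit("machine learning", "toy-model-a")
mismatch_hit = top_hit("machine learning", "toy-model-b")
print("matching model   ->", match_hit)     # machine learning
print("mismatched model ->", mismatch_hit)  # vector database
```

The mismatched query still gets a perfect similarity score, just against the wrong document, which is why inspecting scores alone does not reveal the problem.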

What to watch for: This is easy to miss because embedding model names are often strings passed through configuration or environment variables. A developer might index with one model and query with another without realizing it. Always explicitly verify both are using the same model name and version.
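One way to reduce this risk is to have the indexing and query code read the model name through a single helper. The sketch below is a minimal example under assumptions: the environment variable name EMBEDDING_MODEL and the helper get_embedding_model_name are hypothetical, not part of any library.

```python
import os

def get_embedding_model_name(env=None):
    """Read the model name from one place.

    EMBEDDING_MODEL is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    name = (env.get("EMBEDDING_MODEL") or "").strip()
    if not name:
        # Failing here is deliberate: a silent default is how mismatches creep in.
        raise RuntimeError("EMBEDDING_MODEL is not set; refusing to guess a default")
    return name

# Both pipelines call the same helper; a plain dict stands in for os.environ here.
demo_env = {"EMBEDDING_MODEL": "text-embedding-3-small"}
index_model = get_embedding_model_name(demo_env)
query_model = get_embedding_model_name(demo_env)
print(index_model == query_model)
```

Raising when the variable is missing, rather than falling back to a default, means a misconfigured environment stops at startup instead of quietly embedding with a different model.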

Code

Illustrative only - not runnable without a valid API key

python

# pip install langchain-openai langchain-chroma langchain-core

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# STEP 1: Index documents with a specific embedding model
embedding_model_name = "text-embedding-3-small"
embeddings = OpenAIEmbeddings(model=embedding_model_name)

documents = [
    "Python is a high-level programming language.",
    "Machine learning is a subset of artificial intelligence.",
    "Vector databases store embeddings for fast retrieval."
]

vectorstore = Chroma.from_texts(
    texts=documents,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

print(f"Indexed {len(documents)} documents with model: {embedding_model_name}")

# STEP 2: Query with the SAME embedding model
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

query = "What is machine learning?"

# CORRECT: the retriever reuses the `embeddings` object the index was built with,
# so the query is embedded with the same model automatically.
results = retriever.invoke(query)
print(f"\nQuery results with matching model ({embedding_model_name}):")
for doc in results:
    print(f"  - {doc.page_content}")

# INCORRECT: reopening the index with a DIFFERENT model (demonstrates the failure).
# text-embedding-ada-002 returns 1536-dimensional vectors, the same size as
# text-embedding-3-small, so the query is accepted even though the spaces differ.
# (text-embedding-3-large defaults to 3072 dimensions; Chroma would reject that
# query with a dimension-mismatch error instead of failing silently.)
wrong_model_name = "text-embedding-ada-002"
wrong_embeddings = OpenAIEmbeddings(model=wrong_model_name)
vectorstore_wrong = Chroma(
    embedding_function=wrong_embeddings,
    persist_directory="./chroma_db"
)
retriever_wrong = vectorstore_wrong.as_retriever(search_kwargs={"k": 2})
results_wrong = retriever_wrong.invoke(query)
print(f"\nQuery results with mismatched model ({wrong_model_name}):")
for doc in results_wrong:
    print(f"  - {doc.page_content}")
print("\nNOTE: No error was raised, but this ranking is not based on relevance.")

Output

Indexed 3 documents with model: text-embedding-3-small

Query results with matching model (text-embedding-3-small):
  - Machine learning is a subset of artificial intelligence.
  - Vector databases store embeddings for fast retrieval.

Query results with mismatched model (text-embedding-ada-002):
  - (two documents in an order unrelated to the query; the exact pair varies)

NOTE: No error was raised, but this ranking is not based on relevance.

Your options

Recommended

Store embedding model name in metadata or config

Production systems where embedding model might change over time

Pros

Self-documenting; easy to audit what model created each index; supports gradual model upgrades

Cons

Requires additional state management; must validate on query

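import json

# Written once, at index time.
config = {'index_model': 'text-embedding-3-small', 'index_date': '2026-04-01'}
with open('index_metadata.json', 'w') as f:
    json.dump(config, f)

# Checked at query time, before any retrieval runs.
query_embedding_model = 'text-embedding-3-small'  # whatever the query code is configured with
with open('index_metadata.json', 'r') as f:
    loaded_config = json.load(f)
if loaded_config['index_model'] != query_embedding_model:
    raise ValueError(f"Index built with {loaded_config['index_model']}, querying with {query_embedding_model}")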

Hardcode the model name as one shared constant used by both index and query code

Small projects with stable requirements; single embedding model forever

Pros

Simple; one source of truth in code; catches mismatches at review time

Cons

Brittle if model needs to change; mismatch won't be detected at runtime

from langchain_openai import OpenAIEmbeddings

# Single constant imported by both the indexing and the query code.
EMBEDDING_MODEL = 'text-embedding-3-small'
embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL)

Validation step

After querying, verify that the retrieved documents are semantically relevant to your query. Manually spot-check: does the top result actually answer the question? If results seem random or off-topic, suspect an embedding model mismatch. Then compare the model name recorded with the index (in its metadata file or in the index creation code) against the model name the query code passes to its embedding client; the two must be exactly the same.
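Comparing names is the primary check. A complementary check, sketched here as one possible approach rather than an established practice from this guide, is to store the embedding of a fixed probe sentence at index time and re-embed it at query time. The helper names, the probe text, and the 0.999 threshold are assumptions; embed_query stands for any callable that maps text to a list of floats (LangChain embedding objects expose a method of that name). The two toy functions replace real API calls so the sketch runs offline.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def make_fingerprint(embed_query, probe="embedding model probe sentence"):
    """Index time: embed a fixed probe text and keep the vector next to the index."""
    return {"probe": probe, "vector": list(embed_query(probe))}

def fingerprint_matches(embed_query, fingerprint, threshold=0.999):
    """Query time: re-embed the probe and compare against the stored vector."""
    vector = list(embed_query(fingerprint["probe"]))
    if len(vector) != len(fingerprint["vector"]):
        return False  # different dimension: certainly a different model or setting
    return cosine(vector, fingerprint["vector"]) >= threshold

# Toy stand-ins for two different embedding models (not real APIs).
def toy_model_a(text):
    return [float(len(text)), 1.0, 0.0]

def toy_model_b(text):
    return [0.0, 1.0, float(len(text))]

fingerprint = make_fingerprint(toy_model_a)
same_model_ok = fingerprint_matches(toy_model_a, fingerprint)
other_model_ok = fingerprint_matches(toy_model_b, fingerprint)
print(same_model_ok, other_model_ok)
```

The threshold is below 1.0 because hosted embedding endpoints are not guaranteed to return bit-identical vectors across calls; the right value would need to be tuned against the real model.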

At scale

At scale, with millions of documents indexed in production, a mismatched embedding model can degrade most or all queries at once, because every query vector lands in the wrong space. Nothing fails visibly, so the problem tends to surface as user complaints rather than alerts, which makes it harder to detect than an immediate error. Implement automated tests that run queries with known relevant documents against your index and verify that those documents rank in the top-k results.
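Such a test can be as small as a hit-rate check over a handful of labelled queries. In the sketch below, top_k_hit_rate and the canned retriever are hypothetical; in a real pipeline, retrieve would wrap the actual retriever, for example by returning the page_content of each document from retriever.invoke(query).

```python
def top_k_hit_rate(retrieve, labelled_queries, k=3):
    """Fraction of queries whose expected document appears in the top-k results."""
    if not labelled_queries:
        raise ValueError("need at least one labelled query")
    hits = 0
    for query, expected in labelled_queries:
        if expected in retrieve(query)[:k]:
            hits += 1
    return hits / len(labelled_queries)

# Labelled pairs: a query and the document that must come back for it.
LABELLED = [
    ("What is machine learning?",
     "Machine learning is a subset of artificial intelligence."),
    ("Where are embeddings stored?",
     "Vector databases store embeddings for fast retrieval."),
]

# Canned retriever standing in for a healthy index (not a real vector store).
CANNED = {
    "What is machine learning?": [
        "Machine learning is a subset of artificial intelligence.",
        "Vector databases store embeddings for fast retrieval.",
    ],
    "Where are embeddings stored?": [
        "Vector databases store embeddings for fast retrieval.",
        "Python is a high-level programming language.",
    ],
}

def healthy_retrieve(query):
    return CANNED.get(query, [])

hit_rate = top_k_hit_rate(healthy_retrieve, LABELLED, k=2)
print(f"top-2 hit rate: {hit_rate:.2f}")
assert hit_rate >= 0.9, "retrieval quality regressed; check the embedding model"
```

Run against the real index in CI or as a post-deploy smoke test, a sharp drop in this number is a strong hint that the query-side model no longer matches the index.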


Rollback plan

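If you discover the query model doesn't match the indexed model, stop serving queries from that index immediately. The cheapest fix is usually to point the query code back at the model the index was built with. If the query-side model is the one you want to keep, rebuild the index with that model instead. Either way, retest retrieval before returning the system to production.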

Debug symptoms

Queries return results but they're irrelevant or off-topic

Diagnosis

Embedding model used at query time differs from indexing time

Fix

Verify model name in embedding initialization: print both the index creation code's model parameter and the retriever's embedding function's model. They must be identical strings.
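When both embedding objects are available in the same process, the comparison can be automated. This sketch assumes each object exposes a model attribute (as OpenAIEmbeddings does) and optionally a dimensions attribute; assert_same_embedding_model is a hypothetical helper, and SimpleNamespace objects stand in for real embedding clients so the example needs no API key.

```python
from types import SimpleNamespace

def assert_same_embedding_model(index_embeddings, query_embeddings):
    """Print and compare the model settings of two embedding objects."""
    for attr in ("model", "dimensions"):
        index_value = getattr(index_embeddings, attr, None)
        query_value = getattr(query_embeddings, attr, None)
        print(f"{attr}: index={index_value!r} query={query_value!r}")
        if attr == "model" and (index_value is None or query_value is None):
            raise ValueError("could not read a model name from one of the objects")
        if index_value != query_value:
            raise ValueError(
                f"{attr} mismatch: indexed with {index_value!r}, "
                f"querying with {query_value!r}"
            )

# Stand-ins for real embedding clients.
index_side = SimpleNamespace(model="text-embedding-3-small", dimensions=None)
query_side = SimpleNamespace(model="text-embedding-3-small", dimensions=None)
assert_same_embedding_model(index_side, query_side)  # passes silently after printing

wrong_side = SimpleNamespace(model="text-embedding-ada-002", dimensions=None)
try:
    assert_same_embedding_model(index_side, wrong_side)
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print("caught:", err)
```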

Exact phrase appears in documents but isn't retrieved even in top-10 results

Diagnosis

Semantic mismatch due to different embedding spaces; exact lexical match won't help

Fix

Check that both OpenAIEmbeddings(model=X) calls use the same X. If unsure, add a print statement to log the model name before creating embeddings at both index and query time.

Index works fine in local tests but fails in production after deployment

Diagnosis

Local dev uses one embedding model config, production environment loads a different one from config file or environment variable

Fix

Ensure embedding model name is loaded from the same source and validated before use. Consider baking the model name into the vectorstore metadata at index time and asserting it matches at query time.

Production upgrade path

In production, implement a validation layer that stores the embedding model name with the index at index time (in vector store metadata or a file saved next to the index) and asserts that it matches the query-time embedding model before executing retrieval. Example, where index_metadata stands for whatever record you saved with the index: assert index_metadata['embedding_model'] == query_embedding_model, f"Mismatch: indexed with {index_metadata['embedding_model']}, querying with {query_embedding_model}". This converts a silent failure into an immediate, loud error.
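One possible shape for that validation layer is a thin wrapper that refuses to construct a retriever when the recorded and configured model names differ. GuardedRetriever, EmbeddingModelMismatch, and the index_metadata dictionary are hypothetical names for this sketch; the wrapped object only needs an invoke(query) method, and a fake retriever is used here so the example runs offline.

```python
class EmbeddingModelMismatch(RuntimeError):
    """Raised when the query-time model differs from the one recorded at index time."""

class GuardedRetriever:
    """Wraps any object with an invoke(query) method and checks the model up front."""

    def __init__(self, retriever, index_metadata, query_embedding_model):
        indexed_with = index_metadata.get("embedding_model")
        if indexed_with != query_embedding_model:
            raise EmbeddingModelMismatch(
                f"Mismatch: indexed with {indexed_with!r}, "
                f"querying with {query_embedding_model!r}"
            )
        self._retriever = retriever

    def invoke(self, query):
        return self._retriever.invoke(query)

class FakeRetriever:
    """Stand-in for a real retriever; returns a fixed result."""

    def invoke(self, query):
        return ["Machine learning is a subset of artificial intelligence."]

index_metadata = {"embedding_model": "text-embedding-3-small"}

guarded = GuardedRetriever(FakeRetriever(), index_metadata, "text-embedding-3-small")
print(guarded.invoke("What is machine learning?"))

try:
    GuardedRetriever(FakeRetriever(), index_metadata, "text-embedding-ada-002")
    guard_fired = False
except EmbeddingModelMismatch as err:
    guard_fired = True
    print("refused to start:", err)
```

Checking in the constructor rather than on every call means the failure shows up at service startup, before any user query is served. An explicit exception is used instead of a bare assert because assert statements are skipped when Python runs with the -O flag.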

Common gotcha

The embedding models have very similar names (text-embedding-3-small vs text-embedding-3-large). A developer might think they're using the same model when the name differs only in one word. Even worse: if an environment variable or config file specifies the model, and you load it inconsistently (once at index time, once at query time from different sources), you get a silent mismatch.

Experienced dev note

In production, treat embedding model choice as immutable infrastructure, not a runtime parameter. Once you index with a model, you've locked in that embedding space for the lifetime of that index. Changing the model means re-embedding all documents, which is expensive. Some teams version their indices by model name (e.g., index-v1-text-embedding-3-small) to avoid accidental mismatches. Also, embedding model pricing and latency matter. A larger model (text-embedding-3-large) may retrieve more relevant results, but it costs several times more per token (about 6.5x at the list prices OpenAI published when the models launched) and runs slower, which is often not worth the gain in search quality. Test empirically with your data.
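The naming convention mentioned above can be generated rather than typed by hand. This is a small sketch; index_name is a hypothetical helper, and the exact format is an assumption modelled on the example name in the text.

```python
import re

def index_name(model_name, version=1):
    """Build an index name that embeds the model name, e.g. for a collection or directory."""
    slug = re.sub(r"[^a-z0-9]+", "-", model_name.lower()).strip("-")
    return f"index-v{version}-{slug}"

small_index = index_name("text-embedding-3-small")
large_index = index_name("text-embedding-3-large")
print(small_index)  # index-v1-text-embedding-3-small
print(large_index)  # index-v1-text-embedding-3-large
```

If the query code derives the index name from its own configured model, a mismatched model points at an index that does not exist, so the failure becomes a missing-index error rather than a silently wrong ranking.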

Check your understanding

You indexed 10,000 documents using OpenAIEmbeddings(model='text-embedding-3-small'). At query time, a developer accidentally uses OpenAIEmbeddings(model='text-embedding-ada-002'), which also returns 1536-dimensional vectors. What happens, and why is it hard to detect?

Answer hint

The system will not raise an error. Because both models return vectors of the same size, the vector store accepts the query and returns results ranked by cosine similarity in an incompatible vector space. The results will appear to work (no exception) but will be semantically misaligned with the query. You'd only notice by checking whether the results are actually relevant, either by manual inspection or with automated retrieval tests.
