Why the embedding model must match at query time
Why this matters
If your query uses a different embedding model than your indexed documents, semantic similarity search returns irrelevant results. When the two models produce vectors of the same length, nothing errors: the system just gives you wrong answers. This is one of the most insidious failure modes in RAG because no exception is raised.
Explanation
The core problem: when you index documents, you convert text to vectors using a specific embedding model (e.g., OpenAI's text-embedding-3-small). When you query, your question must be embedded with the same model. If you accidentally use text-embedding-3-large for queries, the vector spaces are incompatible. It's like looking up a location on one map using coordinates taken from a different map.
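Why it matters: vector similarity (e.g., cosine similarity) only makes sense when both vectors live in the same embedding space. Different models can produce vectors with different dimensions, and even when the dimensions match, they encode meaning along different axes with different distance relationships. A dimension mismatch usually fails loudly, because most vector stores reject a query vector of the wrong length. A same-dimension mismatch does not: your search ranks documents by distances that carry no semantic meaning.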
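To see why no error appears, here is a toy sketch with hand-picked 3-dimensional vectors (hypothetical stand-ins for real 1536-dimensional embeddings). "Model B" is simulated by permuting model A's axes: same length, different layout. The search function rejects a wrong-length query, mirroring the loud dimension check most vector stores perform, but happily ranks a same-length query that comes from the wrong space.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, index):
    """Rank the indexed texts by cosine similarity to query_vec, best first."""
    dim = len(next(iter(index.values())))
    if len(query_vec) != dim:
        # The loud case: a wrong-length query vector is rejected outright
        raise ValueError(f"query has {len(query_vec)} dims, index has {dim}")
    return sorted(index, key=lambda text: cosine(query_vec, index[text]), reverse=True)

# Hypothetical 3-d embeddings from "model A", stored at index time
index = {
    "Python is a high-level programming language.": [1.0, 0.0, 0.0],
    "Machine learning is a subset of artificial intelligence.": [0.0, 1.0, 0.0],
    "Vector databases store embeddings for fast retrieval.": [0.0, 0.0, 1.0],
}

# "What is machine learning?" embedded by model A (same space as the index)
query_a = [0.1, 0.9, 0.2]
# The same question from hypothetical "model B": same length, but the meaning is
# laid out along different axes (simulated here by permuting the coordinates)
query_b = [query_a[2], query_a[0], query_a[1]]

print(search(query_a, index)[0])  # Machine learning is a subset of artificial intelligence.
print(search(query_b, index)[0])  # Vector databases store embeddings for fast retrieval.
```

Both calls succeed; only the first answer is right. Nothing in the second call signals that the query vector came from a different space.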
What to watch for: this is easy to miss because embedding model names are often strings passed through configuration files or environment variables. A developer might index with one model and query with another without realizing it. Always verify explicitly that indexing and querying use the same model name and version (and the same output dimensions, if you set them).
Code
# pip install langchain-openai langchain-chroma
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# STEP 1: Index documents with a specific embedding model
embedding_model_name = "text-embedding-3-small"  # 1536-dimensional vectors
embeddings = OpenAIEmbeddings(model=embedding_model_name)
documents = [
    "Python is a high-level programming language.",
    "Machine learning is a subset of artificial intelligence.",
    "Vector databases store embeddings for fast retrieval.",
]
# Note: re-running appends duplicate documents; delete ./chroma_db between runs
vectorstore = Chroma.from_texts(
    texts=documents,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
print(f"Indexed {len(documents)} documents with model: {embedding_model_name}")

# STEP 2: Query with the SAME embedding model
query = "What is machine learning?"

# CORRECT: reopen the persisted index with the model that built it
matching_embeddings = OpenAIEmbeddings(model=embedding_model_name)
vectorstore_ok = Chroma(
    embedding_function=matching_embeddings,
    persist_directory="./chroma_db",
)
retriever = vectorstore_ok.as_retriever(search_kwargs={"k": 2})
results = retriever.invoke(query)
print(f"\nQuery results with matching model ({embedding_model_name}):")
for doc in results:
    print(f" - {doc.page_content}")

# INCORRECT: a DIFFERENT embedding model (demonstrates the silent failure).
# dimensions=1536 shortens text-embedding-3-large's output to the index's size, so
# Chroma accepts the query vector. Without it, the 3072-dim query vector would be
# rejected with a dimension-mismatch error instead of failing silently.
wrong_embeddings = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=1536)
vectorstore_wrong = Chroma(
    embedding_function=wrong_embeddings,
    persist_directory="./chroma_db",
)
retriever_wrong = vectorstore_wrong.as_retriever(search_kwargs={"k": 2})
results_wrong = retriever_wrong.invoke(query)
print("\nQuery results with mismatched model (text-embedding-3-large):")
for doc in results_wrong:
    print(f" - {doc.page_content}")
print("\nNOTE: Results may be ranked differently or irrelevant despite no error raised.")

Sample output (illustrative; exact rankings vary between runs):
Indexed 3 documents with model: text-embedding-3-small

Query results with matching model (text-embedding-3-small):
 - Machine learning is a subset of artificial intelligence.
 - Vector databases store embeddings for fast retrieval.

Query results with mismatched model (text-embedding-3-large):
 - Vector databases store embeddings for fast retrieval.
 - Machine learning is a subset of artificial intelligence.

NOTE: Results may be ranked differently or irrelevant despite no error raised.
Your options
Store embedding model name in metadata or config
Production systems where the embedding model might change over time
Pros
Self-documenting; easy to audit what model created each index; supports gradual model upgrades
Cons
Requires additional state management; must validate on query
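import json
# At index time: record which model built the index
config = {'index_model': 'text-embedding-3-small', 'index_date': '2026-04-01'}
with open('index_metadata.json', 'w') as f:
    json.dump(config, f)
# At query time: load the record and fail on a mismatch
query_embedding_model = 'text-embedding-3-small'  # the model your query code uses
with open('index_metadata.json', 'r') as f:
    loaded_config = json.load(f)
assert loaded_config['index_model'] == query_embedding_model, (
    f"Index built with {loaded_config['index_model']}, querying with {query_embedding_model}"
)
Hardcode the model name in both index and query code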
Small projects with stable requirements and a single embedding model for the project's lifetime
Pros
Simple; one source of truth in code; catches mismatches at review time
Cons
Brittle if the model needs to change; a mismatch won't be detected at runtime
from langchain_openai import OpenAIEmbeddings
# One shared constant, imported by both the indexing and the query code
EMBEDDING_MODEL = 'text-embedding-3-small'
embeddings = OpenAIEmbeddings(model=EMBEDDING_MODEL)
Validation step
After querying, verify that the retrieved documents are semantically relevant to your query. Spot-check manually: does the top result actually answer the question? If results seem random or off-topic, the embedding model likely mismatches. Check the model name you recorded at index time (for example, in the index's metadata or a config file), or inspect your index creation code, to confirm the query uses the exact same model name.
At scale
At scale, with millions of documents indexed in production, a mismatched embedding model can make a large share of queries return irrelevant results, silently degrading the user experience. This is harder to detect than an immediate error. Implement automated tests that run queries whose relevant documents are known and verify that those documents rank in the top-k results.
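A minimal sketch of such a test, assuming you have a hand-labeled "golden" set of queries, each paired with the ID of a document known to answer it. retrieve is a hypothetical callable that wraps your real retriever and returns document IDs; recall_at_k, check_retrieval, and the 0.9 threshold are illustrative choices, not a library API.

```python
from typing import Callable, List, Sequence, Tuple

def recall_at_k(
    retrieve: Callable[[str, int], List[str]],
    golden: Sequence[Tuple[str, str]],
    k: int = 3,
) -> float:
    """Fraction of (query, expected_doc_id) pairs whose expected doc is in the top k."""
    hits = sum(1 for query, expected in golden if expected in retrieve(query, k))
    return hits / len(golden)

def check_retrieval(retrieve, golden, k: int = 3, min_recall: float = 0.9) -> float:
    """Fail loudly (e.g., in CI) when retrieval quality drops below a threshold."""
    score = recall_at_k(retrieve, golden, k)
    if score < min_recall:
        raise AssertionError(
            f"recall@{k} = {score:.2f} < {min_recall}; check the query-time embedding model"
        )
    return score

# Hypothetical golden set: queries paired with the ID of a document known to answer them
GOLDEN = [
    ("What is machine learning?", "doc-ml"),
    ("What do vector databases store?", "doc-vectordb"),
]

def fake_retrieve(query: str, k: int) -> List[str]:
    """Stand-in for a real retriever returning document IDs, best first.
    The second answer is wrong, as if the query model had drifted."""
    canned = {
        "What is machine learning?": ["doc-ml", "doc-python"],
        "What do vector databases store?": ["doc-python", "doc-ml"],
    }
    return canned[query][:k]

print(f"recall@2 = {recall_at_k(fake_retrieve, GOLDEN, k=2):.2f}")  # recall@2 = 0.50
```

Run check_retrieval against the live index in CI or after each deployment; a sudden drop in recall right after a configuration change is a strong hint that the query-time embedding model changed.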
Rollback plan
If you discover that the query model doesn't match the indexed model, stop serving queries immediately. Then either switch the query code back to the model the index was built with (fast, no re-embedding needed), or rebuild the index with the model your query code uses. Either way, retest before returning to production.
Debug symptoms
Queries return results but they're irrelevant or off-topic
Diagnosis
Embedding model used at query time differs from indexing time
Fix
Verify the model name in each embedding initialization: print both the model parameter from the index creation code and the model of the retriever's embedding function (with LangChain's Chroma, vectorstore.embeddings.model). They must be identical strings.
Exact phrase appears in documents but isn't retrieved even in top-10 results
Diagnosis
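Dense retrieval matches on meaning, not exact terms. When the query and the documents are embedded in different spaces, even a verbatim phrase won't land near its document, so exact lexical overlap doesn't help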
Fix
Check that both OpenAIEmbeddings(model=X) calls use the same X. If unsure, add a print statement to log the model name before creating embeddings at both index and query time.
Index works fine in local tests but fails in production after deployment
Diagnosis
Local development uses one embedding model configuration, while production loads a different one from a config file or environment variable
Fix
Ensure embedding model name is loaded from the same source and validated before use. Consider baking the model name into the vectorstore metadata at index time and asserting it matches at query time.
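A sketch of "one source, validated before use", assuming the model name lives in an environment variable. The variable name EMBEDDING_MODEL and the helper embedding_model_name() are hypothetical; the point is that the indexing and query code both call the same function, which refuses to fall back to a default and logs what it resolved.

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("embeddings")

ENV_VAR = "EMBEDDING_MODEL"  # hypothetical variable name

def embedding_model_name(stage: str) -> str:
    """Resolve the embedding model from a single source for the given stage."""
    name = os.environ.get(ENV_VAR, "").strip()  # strip stray whitespace from .env files
    if not name:
        # No silent default: a missing setting should stop the process
        raise RuntimeError(f"{ENV_VAR} is not set (stage: {stage})")
    log.info("%s: using embedding model %r", stage, name)
    return name

# Demo only: normally your deployment sets this variable
os.environ[ENV_VAR] = "text-embedding-3-small"
index_model = embedding_model_name("index")
query_model = embedding_model_name("query")
print(index_model == query_model)  # True
```

Because both stages go through the same function, the log also records which model each stage actually used.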
Production upgrade path
In production, implement a validation layer that stores the embedding model name in the vector store's metadata at index time and checks that it matches the query-time embedding model before executing retrieval. For example, with the stored name read back as indexed_model: assert indexed_model == query_embedding_model, f"Mismatch: indexed with {indexed_model}, querying with {query_embedding_model}". This converts a silent failure into an immediate, loud error.
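A minimal sketch of that validation layer. To stay independent of any particular vector store, it writes a sidecar JSON manifest next to the index instead of using the store's own metadata; the file name embedding_manifest.json and the helpers write_manifest() and guarded() are hypothetical. It raises an exception rather than using assert, because Python strips assert statements when run with -O.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Optional

MANIFEST = "embedding_manifest.json"  # hypothetical sidecar file name

class EmbeddingModelMismatch(RuntimeError):
    """The query-time model differs from the model that built the index."""

def write_manifest(index_dir: Path, model: str, dimensions: Optional[int] = None) -> None:
    """At index time: record which model (and output size) produced the vectors."""
    index_dir.mkdir(parents=True, exist_ok=True)
    manifest = {"embedding_model": model, "dimensions": dimensions}
    (index_dir / MANIFEST).write_text(json.dumps(manifest))

def guarded(
    retrieve: Callable[[str], List[str]],
    index_dir: Path,
    model: str,
    dimensions: Optional[int] = None,
) -> Callable[[str], List[str]]:
    """At query time: return retrieve unchanged, but only if the models match."""
    manifest = json.loads((index_dir / MANIFEST).read_text())
    indexed = (manifest["embedding_model"], manifest["dimensions"])
    if (model, dimensions) != indexed:
        raise EmbeddingModelMismatch(
            f"Mismatch: indexed with {indexed}, querying with {(model, dimensions)}"
        )
    return retrieve

# Usage: the lambda stands in for a real retriever
index_dir = Path(tempfile.mkdtemp()) / "chroma_db"
write_manifest(index_dir, "text-embedding-3-small")
retrieve = guarded(lambda q: ["doc-ml"], index_dir, "text-embedding-3-small")
print(retrieve("What is machine learning?"))  # ['doc-ml']
```

The check runs once, when the guarded retriever is created at startup, so a misconfigured deployment fails before it serves a single query.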
Common gotcha
The embedding models have very similar names (text-embedding-3-small vs. text-embedding-3-large). A developer might think they're using the same model when the names differ by a single word. Even worse, if an environment variable or config file specifies the model and you load it inconsistently (at index time from one source, at query time from another), you can end up with a silent mismatch.
Experienced dev note
In production, treat the embedding model choice as immutable infrastructure, not a runtime parameter. Once you index with a model, you've locked in that embedding space for the lifetime of that index. Changing the model means re-embedding all documents, which is expensive. Some teams version their indices by model name (e.g., index-v1-text-embedding-3-small) to avoid accidental mismatches. Also, embedding model pricing and latency matter. A larger model such as text-embedding-3-large may retrieve more relevant results, but at OpenAI's launch pricing it costs about 6.5x more per token than text-embedding-3-small ($0.13 vs. $0.02 per million tokens), its default 3072-dimensional vectors take twice the storage, and it runs slower. The quality gain is often not worth it for search. Test empirically with your data.
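The versioning idea can be sketched as two small helpers: one builds an index name that embeds the model, the other picks the newest index built with the query-time model and refuses to fall back to any other. The naming scheme follows the example above; the helper names index_name() and pick_index() are hypothetical.

```python
import re
from typing import List

def index_name(version: int, model: str) -> str:
    """Build an index/collection name that records the embedding model."""
    return f"index-v{version}-{model}"

def pick_index(available: List[str], query_model: str) -> str:
    """Return the newest index built with query_model; never fall back to another model."""
    pattern = re.compile(rf"index-v(\d+)-{re.escape(query_model)}")
    matches = []
    for name in available:
        m = pattern.fullmatch(name)
        if m:
            matches.append((int(m.group(1)), name))
    if not matches:
        raise LookupError(f"no index was built with {query_model!r}")
    return max(matches)[1]

available = [
    index_name(1, "text-embedding-3-small"),
    index_name(2, "text-embedding-3-small"),
    index_name(1, "text-embedding-3-large"),
]
print(pick_index(available, "text-embedding-3-small"))  # index-v2-text-embedding-3-small
```

Because a query with a given model can only ever resolve to an index built with that model, a model upgrade becomes "build a new index with the new model, test it, then switch the query code" rather than an accidental mismatch.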
Check your understanding
You indexed 10,000 documents using OpenAIEmbeddings(model='text-embedding-3-small'). At query time, a developer accidentally uses OpenAIEmbeddings(model='text-embedding-ada-002'). What happens, and why is it hard to detect?
Answer hint
The system will not raise an error: both models produce 1536-dimensional vectors, so the vector store accepts the query vector. It executes the query and returns results ranked by similarity in an incompatible vector space. The results appear to work (no exception) but are semantically misaligned with the query. You'd only notice by inspecting whether the results are relevant, or by running a retrieval test against documents known to be relevant.