Search returning irrelevant results fix
Quick answer
To fix search returning irrelevant results, use embedding-based semantic search with a high-quality vector store, and make sure your queries and documents are preprocessed consistently. Use an OpenAI embedding model such as text-embedding-3-small and tune retrieval parameters such as top_k to improve relevance.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
- pip install faiss-cpu
Setup
Install required packages and set your OpenAI API key as an environment variable.
```shell
pip install openai faiss-cpu
```

Output:

```
Collecting openai
Collecting faiss-cpu
Successfully installed openai-1.x.x faiss-cpu-1.x.x
```
Step by step
This example shows how to embed documents and queries using OpenAI embeddings, store them in FAISS, and perform a semantic search that returns relevant results.
```python
import os

import faiss
import numpy as np
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample documents
documents = [
    "Python is a popular programming language.",
    "OpenAI provides powerful AI models.",
    "Semantic search improves search relevance.",
    "Cats are common household pets.",
    "Machine learning enables AI applications.",
]

# Embed documents
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents,
)
embeddings = np.array([data.embedding for data in response.data]).astype("float32")

# Create FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Query to search
query = "How to use AI for better search?"

# Embed query with the same model used for the documents
query_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[query],
)
query_embedding = np.array(query_response.data[0].embedding).astype("float32")

# Search top 3 relevant documents
k = 3
D, I = index.search(np.array([query_embedding]), k)

print("Top relevant documents:")
for idx in I[0]:
    print(f"- {documents[idx]}")
```

Output:

```
Top relevant documents:
- Semantic search improves search relevance.
- OpenAI provides powerful AI models.
- Machine learning enables AI applications.
```
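A note on the distance metric: IndexFlatL2 ranks by Euclidean distance, but if you L2-normalize the vectors first, that ranking is identical to ranking by cosine similarity, which is usually what you want for relevance. The sketch below demonstrates the equivalence with NumPy alone (random vectors stand in for real embeddings, so no API key or FAISS install is needed):

```python
import numpy as np

# Random vectors stand in for document/query embeddings
rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8)).astype("float32")
query = rng.normal(size=8).astype("float32")

# L2-normalize so every vector has unit length
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
q_n = query / np.linalg.norm(query)

# Ranking by smallest L2 distance on normalized vectors...
l2_rank = np.argsort(((docs_n - q_n) ** 2).sum(axis=1))
# ...matches ranking by largest cosine similarity,
# since ||a - b||^2 = 2 - 2(a . b) for unit vectors
cos_rank = np.argsort(-(docs_n @ q_n))

assert (l2_rank == cos_rank).all()
```

In practice this means you can normalize your embeddings and use faiss.IndexFlatIP (inner product) to search by cosine similarity directly.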
Common variations

- Use async calls with asyncio and await for embedding requests.
- Try a higher-quality embedding model such as text-embedding-3-large.
- Adjust top_k in FAISS search to balance recall and precision.
- Use other vector stores such as Chroma or Pinecone for scalable search.
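The async variation can be sketched as follows. To keep the example runnable without an API key, embed_batch is a placeholder coroutine; with the real client it would instead await AsyncOpenAI().embeddings.create(model=..., input=batch):

```python
import asyncio

# Placeholder for an embeddings call. With the real client this would be:
#   await AsyncOpenAI().embeddings.create(model="text-embedding-3-small", input=batch)
async def embed_batch(batch):
    await asyncio.sleep(0.01)  # simulate network latency
    return [[float(len(text))] for text in batch]  # dummy 1-d "embeddings"

async def embed_all(documents, batch_size=2):
    # Split documents into batches and embed them concurrently
    batches = [documents[i:i + batch_size] for i in range(0, len(documents), batch_size)]
    results = await asyncio.gather(*(embed_batch(b) for b in batches))
    # Flatten the per-batch results back into one list
    return [vec for batch in results for vec in batch]

docs = ["alpha", "beta", "gamma", "delta", "epsilon"]
embeddings = asyncio.run(embed_all(docs))
print(len(embeddings))  # 5
```

Because the batches are awaited concurrently with asyncio.gather, total latency is roughly one round trip rather than one per batch.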
Troubleshooting
- If search returns irrelevant results, verify your documents and queries are clean and well-formatted.
- Ensure embeddings are generated with the same model for both documents and queries.
- Increase top_k to retrieve more candidates and rerank if needed.
- Check your API key and network connectivity if embedding calls fail.
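The retrieve-then-rerank idea can be sketched in two stages: over-retrieve candidates by L2 distance (what IndexFlatL2 does), then reorder them by cosine similarity. Random vectors stand in for real embeddings so the sketch runs without an API key or FAISS:

```python
import numpy as np

# Random vectors stand in for document/query embeddings
rng = np.random.default_rng(1)
doc_vecs = rng.normal(size=(10, 8)).astype("float32")
query_vec = rng.normal(size=8).astype("float32")

# Stage 1: over-retrieve candidates by L2 distance
k = 6
dists = ((doc_vecs - query_vec) ** 2).sum(axis=1)
candidates = np.argsort(dists)[:k]

# Stage 2: rerank the candidates by cosine similarity
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

reranked = sorted(candidates, key=lambda i: cosine(doc_vecs[i], query_vec), reverse=True)
top3 = reranked[:3]
```

In a real pipeline, stage 2 often uses a stronger (and slower) scorer such as a cross-encoder, which is affordable because it only sees the k candidates rather than the full corpus.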
Key takeaways
- Use semantic embeddings and vector search to improve search relevance.
- Keep document and query embeddings consistent with the same model.
- Tune retrieval parameters like top_k to balance relevance and recall.