How to build visual search with AI
Quick answer
Build visual search by extracting image embeddings with a multimodal model like
gpt-4o or gemini-2.5-pro, then index these vectors in a vector database such as FAISS. Query images by embedding the query image and retrieving nearest neighbors via vector similarity search.PREREQUISITES
Python 3.8+OpenAI API key (free tier works)pip install openai>=1.0 faiss-cpu numpy pillow
Setup
Install required Python packages and set your OpenAI API key as an environment variable.
pip install openai faiss-cpu numpy pillow Step by step
This example extracts embeddings from images using gpt-4o multimodal model, indexes them with FAISS, and performs a visual search by embedding a query image.
import os
import numpy as np
from PIL import Image
from openai import OpenAI
import faiss
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Function to load and preprocess image
def load_image(path):
img = Image.open(path).convert("RGB")
img = img.resize((224, 224)) # Resize for model input
return img
# Function to get image embedding from OpenAI multimodal model
def get_image_embedding(image_path):
with open(image_path, "rb") as f:
image_bytes = f.read()
response = client.embeddings.create(
model="gpt-4o",
input=image_bytes,
encoding_format="png"
)
return np.array(response.data[0].embedding, dtype=np.float32)
# Sample image dataset
image_paths = ["images/cat.png", "images/dog.png", "images/bird.png"]
# Extract embeddings
embeddings = []
for path in image_paths:
emb = get_image_embedding(path)
embeddings.append(emb)
embeddings = np.vstack(embeddings)
# Build FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
# Query image
query_path = "images/query_cat.png"
query_emb = get_image_embedding(query_path)
# Search top 2 nearest images
D, I = index.search(np.expand_dims(query_emb, axis=0), k=2)
print("Top matches:")
for idx, dist in zip(I[0], D[0]):
print(f"Image: {image_paths[idx]}, Distance: {dist:.4f}") output
Top matches: Image: images/cat.png, Distance: 0.0023 Image: images/dog.png, Distance: 0.0157
Common variations
- Use
gemini-2.5-proor other multimodal models for potentially better embeddings. - Use async calls with
asynciofor batch embedding extraction. - Replace
FAISSwith cloud vector DBs like Pinecone or Weaviate for scalability. - Preprocess images differently depending on model requirements (e.g., size, format).
Troubleshooting
- If embeddings extraction fails, verify your API key and model name.
- Ensure images are in supported formats (PNG, JPEG) and properly loaded.
- If FAISS index search returns no results, check embedding dimensions match.
- For large datasets, consider approximate nearest neighbor indexes like
faiss.IndexIVFFlat.
Key Takeaways
- Use multimodal models like
gpt-4oto extract image embeddings for visual search. - Index embeddings with vector databases such as
FAISSfor efficient similarity search. - Query by embedding the input image and retrieving nearest neighbors by vector distance.
- Consider async calls and scalable vector DBs for production-grade visual search systems.