How to Intermediate · 3 min read

How to build visual search with AI

Quick answer

Build visual search by extracting image embeddings with a multimodal model like gpt-4o or gemini-2.5-pro, then index these vectors in a vector database such as FAISS. Query images by embedding the query image and retrieving nearest neighbors via vector similarity search.

PREREQUISITES

Python 3.8+
OpenAI API key (free tier works)
pip install openai>=1.0 faiss-cpu numpy pillow

Setup

Install required Python packages and set your OpenAI API key as an environment variable.

bash

pip install openai faiss-cpu numpy pillow

Step by step

This example extracts embeddings from images using gpt-4o multimodal model, indexes them with FAISS, and performs a visual search by embedding a query image.

python

import os
import numpy as np
from PIL import Image
from openai import OpenAI
import faiss

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Function to load and preprocess image

def load_image(path):
    img = Image.open(path).convert("RGB")
    img = img.resize((224, 224))  # Resize for model input
    return img

# Function to get image embedding from OpenAI multimodal model

def get_image_embedding(image_path):
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    response = client.embeddings.create(
        model="gpt-4o",
        input=image_bytes,
        encoding_format="png"
    )
    return np.array(response.data[0].embedding, dtype=np.float32)

# Sample image dataset
image_paths = ["images/cat.png", "images/dog.png", "images/bird.png"]

# Extract embeddings
embeddings = []
for path in image_paths:
    emb = get_image_embedding(path)
    embeddings.append(emb)
embeddings = np.vstack(embeddings)

# Build FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Query image
query_path = "images/query_cat.png"
query_emb = get_image_embedding(query_path)

# Search top 2 nearest images
D, I = index.search(np.expand_dims(query_emb, axis=0), k=2)

print("Top matches:")
for idx, dist in zip(I[0], D[0]):
    print(f"Image: {image_paths[idx]}, Distance: {dist:.4f}")

output

Top matches:
Image: images/cat.png, Distance: 0.0023
Image: images/dog.png, Distance: 0.0157

Common variations

Use gemini-2.5-pro or other multimodal models for potentially better embeddings.
Use async calls with asyncio for batch embedding extraction.
Replace FAISS with cloud vector DBs like Pinecone or Weaviate for scalability.
Preprocess images differently depending on model requirements (e.g., size, format).

Troubleshooting

If embeddings extraction fails, verify your API key and model name.
Ensure images are in supported formats (PNG, JPEG) and properly loaded.
If FAISS index search returns no results, check embedding dimensions match.
For large datasets, consider approximate nearest neighbor indexes like faiss.IndexIVFFlat.

✅

Key Takeaways

Use multimodal models like gpt-4o to extract image embeddings for visual search.
Index embeddings with vector databases such as FAISS for efficient similarity search.
Query by embedding the input image and retrieving nearest neighbors by vector distance.
Consider async calls and scalable vector DBs for production-grade visual search systems.

Verified 2026-04 · gpt-4o, gemini-2.5-pro

Verify ↗