How to build an AI-powered recommendation system
Combine embedding models to convert items and user data into vectors, vector databases for similarity search, and LLMs for personalized ranking or explanation. Use retrieval-augmented generation (RAG) to enhance recommendations with context-aware responses.

Recommendation
Use text-embedding-3-small for embeddings due to its balance of cost, quality, and speed, combined with a vector store like FAISS and gpt-4o for personalized ranking and explanations.

| Use case | Best choice | Why | Runner-up |
|---|---|---|---|
| E-commerce product recommendations | text-embedding-3-small + FAISS + gpt-4o | Efficient embeddings for product vectors, fast similarity search, and LLM for personalized ranking | gemini-1.5-pro + Pinecone + gemini-1.5-flash |
| Content-based news recommendations | text-embedding-3-small + Chroma + gpt-4o | Good semantic embeddings for news articles and flexible vector DB with LLM for summaries | claude-3-5-sonnet-20241022 + FAISS |
| Movie or media recommendations | text-embedding-3-small + FAISS + gpt-4o | High-quality embeddings capture user preferences and content features, LLM for explanations | mistral-large-latest + Chroma |
| Personalized learning content | text-embedding-3-small + FAISS + gpt-4o | Embeddings encode learner profiles and content, LLM tailors recommendations and feedback | claude-3-5-sonnet-20241022 + Pinecone |
Top picks explained
Use text-embedding-3-small for generating vector representations of items and user profiles because it offers a strong balance of accuracy, speed, and cost efficiency at $0.02 per 1M tokens with 1536 dimensions. Combine this with FAISS, an open-source vector similarity search library, for fast search at scale. For ranking and generating personalized explanations, gpt-4o is the best choice due to its strong contextual understanding and cost-effective pricing.
Alternatives include the Gemini family (gemini-1.5-pro or gemini-1.5-flash) for LLM tasks, which excels in multimodal and general use cases but may be costlier. claude-3-5-sonnet-20241022 leads in coding and complex reasoning, but Anthropic does not offer an embedding model, so you would still pair it with a separate embedding provider.
In practice
Here is a Python example using the OpenAI SDK v1+ to build a simple recommendation query pipeline: embed user preferences and items, search with FAISS, then rerank with gpt-4o.
```python
import os

import numpy as np
import faiss
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample items
items = [
    {"id": "item1", "text": "Wireless noise-cancelling headphones"},
    {"id": "item2", "text": "Bluetooth portable speaker"},
    {"id": "item3", "text": "Smart fitness watch"},
]

# Embed items
item_texts = [item["text"] for item in items]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=item_texts,
)
item_vectors = np.array([e.embedding for e in response.data]).astype("float32")

# Build FAISS index
dimension = item_vectors.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(item_vectors)

# Embed user query
user_query = "Looking for wireless audio devices"
query_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[user_query],
)
query_vector = np.array(query_response.data[0].embedding).astype("float32").reshape(1, -1)

# Search top 2 similar items
D, I = index.search(query_vector, 2)

# Prepare prompt for gpt-4o to rerank and explain
candidates = [items[i]["text"] for i in I[0]]
prompt = f"User query: {user_query}\nCandidates:\n"
for i, c in enumerate(candidates):
    prompt += f"{i+1}. {c}\n"
prompt += "\nRank these candidates by relevance and explain your choice."

# Call gpt-4o
chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(chat_response.choices[0].message.content)
```

Example output (the model's exact wording will vary):

```text
1. Wireless noise-cancelling headphones - most relevant because it matches 'wireless audio devices' closely.
2. Bluetooth portable speaker - relevant but less specific to 'noise-cancelling'.
```
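The pipeline above ranks by raw L2 distance, but embedding similarity is more commonly measured with cosine similarity. A minimal NumPy sketch of the idea, with made-up 3-dimensional vectors standing in for real 1536-dimensional embeddings (no API calls needed): normalizing vectors to unit length makes the dot product equal to cosine similarity.

```python
import numpy as np

# Toy stand-ins for embedding vectors (real ones would be 1536-dimensional)
item_vectors = np.array([
    [0.9, 0.1, 0.0],  # headphones
    [0.8, 0.2, 0.1],  # speaker
    [0.1, 0.9, 0.3],  # fitness watch
], dtype="float32")
query_vector = np.array([1.0, 0.0, 0.0], dtype="float32")

def normalize(v):
    """Scale rows to unit length so dot product == cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

scores = normalize(item_vectors) @ normalize(query_vector)
ranking = np.argsort(-scores)  # indices sorted by descending similarity
print(ranking.tolist())  # [0, 1, 2] — headphones first, watch last
```

In FAISS, the equivalent is to normalize the vectors and use `faiss.IndexFlatIP` (inner product) instead of `IndexFlatL2`.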
Pricing and limits
| Option | Free | Cost | Limits | Context |
|---|---|---|---|---|
| text-embedding-3-small | No (pay-as-you-go API) | $0.02 / 1M tokens | Max 8,191 input tokens per text | Embeddings for items and queries |
| FAISS | Fully free, open source | Free | Depends on hardware | Vector similarity search |
| gpt-4o | No (pay-as-you-go API) | $2.50 / 1M input tokens, $10.00 / 1M output tokens | 128K-token context window | Ranking and explanation generation |
| gemini-1.5-pro | Check pricing | Variable | Varies by usage | Alternative LLM |
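At these rates, embedding cost is easy to estimate up front. A quick back-of-envelope check; the catalog size and tokens-per-item figures below are illustrative assumptions, not measurements:

```python
# Illustrative assumptions: 100,000-item catalog, ~50 tokens per item description
num_items = 100_000
tokens_per_item = 50
price_per_million_tokens = 0.02  # text-embedding-3-small rate

total_tokens = num_items * tokens_per_item
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"~{total_tokens:,} tokens -> ${cost:.2f}")  # ~5,000,000 tokens -> $0.10
```

Even a six-figure catalog embeds for pennies; the recurring LLM reranking calls, not the embeddings, dominate cost at serving time.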
What to avoid
- Avoid deprecated embedding models like text-embedding-ada-002, and avoid defaulting to text-embedding-3-large when the small model suffices; the former is less accurate for its cost, and the latter is slower and several times more expensive for marginal gains here.
- Do not rely solely on LLMs without embeddings and vector search; this is inefficient and less scalable for large catalogs.
- Avoid vector databases without efficient indexing (e.g., naive linear search) for production-scale recommendations.
- Beware of using models with very small context windows for ranking, as they cannot consider enough candidate items.
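The "efficient indexing" point can be made concrete. FAISS's IVF indexes (e.g. `faiss.IndexIVFFlat`) avoid scanning the whole catalog by bucketing vectors around centroids and searching only the nearest bucket(s). A toy NumPy illustration of that idea, with hand-picked centroids where a real index would learn them via k-means:

```python
import numpy as np

rng = np.random.default_rng(0)
# 1,000 toy item vectors clustered around two hand-picked centroids
centroids = np.array([[1.0, 0.0], [0.0, 1.0]], dtype="float32")
vectors = np.concatenate([
    centroids[0] + 0.1 * rng.standard_normal((500, 2)),
    centroids[1] + 0.1 * rng.standard_normal((500, 2)),
]).astype("float32")

# "Inverted lists": record which vectors live in each centroid's bucket
assignments = np.argmin(
    ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
)

def ivf_search(query, k=3):
    """Scan only the bucket whose centroid is closest to the query."""
    bucket = np.argmin(((centroids - query) ** 2).sum(-1))
    candidate_ids = np.flatnonzero(assignments == bucket)  # ~half the data
    dists = ((vectors[candidate_ids] - query) ** 2).sum(-1)
    return candidate_ids[np.argsort(dists)[:k]]

query = np.array([0.95, 0.05], dtype="float32")
print(ivf_search(query))  # IDs from the first cluster only
```

Here each search touches roughly half the vectors; with more buckets the fraction shrinks further, which is what makes IVF-style indexes scale where naive linear search does not.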
How to evaluate for your case
Benchmark your recommendation system by measuring precision@k, recall@k, and user engagement metrics on a validation dataset. Use offline evaluation with held-out user-item interactions and online A/B testing for real user feedback. Automate evaluation by scripting embedding similarity tests and LLM reranking quality checks.
```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    recommended_k = recommended[:k]
    return len(set(recommended_k) & set(relevant)) / k

# Example usage
recommended_items = ['item1', 'item3', 'item2']
relevant_items = ['item1', 'item2']
print(f"Precision@2: {precision_at_k(recommended_items, relevant_items, 2):.2f}")
# Precision@2: 0.50
```
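Precision@k has a natural companion, recall@k: the fraction of all relevant items that made it into the top k. A sketch following the same pattern as precision_at_k above:

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items that appear in the top-k recommendations."""
    recommended_k = recommended[:k]
    return len(set(recommended_k) & set(relevant)) / len(relevant)

# Example usage: only 1 of the 2 relevant items is in the top 2
recommended_items = ['item1', 'item3', 'item2']
relevant_items = ['item1', 'item2']
print(f"Recall@2: {recall_at_k(recommended_items, relevant_items, 2):.2f}")
# Recall@2: 0.50
```

Tracking both catches failure modes a single metric misses: high precision with low recall means relevant items are being left out of the shortlist entirely.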
Key Takeaways
- Use text-embedding-3-small for cost-effective, high-quality embeddings in recommendation systems.
- Combine embeddings with FAISS for scalable, fast similarity search over large item catalogs.
- Leverage gpt-4o to rerank and generate personalized explanations for recommendations.
- Avoid outdated embedding models and inefficient vector search methods to maintain performance and cost control.
- Evaluate recommendations offline with precision/recall metrics and online with user engagement A/B tests.