How to build an AI-powered recommendation system
Combine embedding models to convert items and user data into vectors, vector databases for similarity search, and LLMs for personalized ranking or explanation. Use retrieval-augmented generation (RAG) to enhance recommendations with context-aware responses.

Recommendation
Use text-embedding-3-small for embeddings due to its balance of cost, quality, and speed, combined with a vector store like FAISS and gpt-4o for personalized ranking and explanations.

| Use case | Best choice | Why | Runner-up |
|---|---|---|---|
| E-commerce product recommendations | text-embedding-3-small + FAISS + gpt-4o | Efficient embeddings for product vectors, fast similarity search, and LLM for personalized ranking | gemini-1.5-pro + Pinecone + gemini-1.5-flash |
| Content-based news recommendations | text-embedding-3-small + Chroma + gpt-4o | Good semantic embeddings for news articles and flexible vector DB with LLM for summaries | claude-3-5-sonnet-20241022 + FAISS |
| Movie or media recommendations | text-embedding-3-small + FAISS + gpt-4o | High-quality embeddings capture user preferences and content features, LLM for explanations | mistral-large-latest + Chroma |
| Personalized learning content | text-embedding-3-small + FAISS + gpt-4o | Embeddings encode learner profiles and content, LLM tailors recommendations and feedback | claude-3-5-sonnet-20241022 + Pinecone |
Top picks explained
Use text-embedding-3-small for generating vector representations of items and user profiles because it offers a strong balance of accuracy, speed, and cost efficiency at $0.02 per 1M tokens with 1536 dimensions. Combine this with FAISS, an open-source vector similarity search library, for fast search at scale. For ranking and generating personalized explanations, gpt-4o is the best choice due to its strong contextual understanding and cost-effective pricing.
Alternatives include the Gemini family (gemini-1.5-pro or gemini-1.5-flash) for LLM tasks, which excels in multimodal and general use cases but may be costlier. claude-3-5-sonnet-20241022 leads in coding and complex reasoning, but Anthropic does not offer an embedding model, so you would still pair it with a separate embedding provider.
In practice
Here is a Python example using the OpenAI SDK v1+ to build a simple recommendation query pipeline: embed user preferences and items, search with FAISS, then rerank with gpt-4o.
```python
import os

import numpy as np
import faiss
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample items
items = [
    {"id": "item1", "text": "Wireless noise-cancelling headphones"},
    {"id": "item2", "text": "Bluetooth portable speaker"},
    {"id": "item3", "text": "Smart fitness watch"},
]

# Embed items
item_texts = [item["text"] for item in items]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=item_texts,
)
item_vectors = np.array([e.embedding for e in response.data]).astype("float32")

# Build FAISS index
dimension = item_vectors.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(item_vectors)

# Embed user query
user_query = "Looking for wireless audio devices"
query_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[user_query],
)
query_vector = np.array(query_response.data[0].embedding).astype("float32").reshape(1, -1)

# Search top 2 similar items
D, I = index.search(query_vector, 2)

# Prepare prompt for gpt-4o to rerank and explain
candidates = [items[i]["text"] for i in I[0]]
prompt = f"User query: {user_query}\nCandidates:\n"
for i, c in enumerate(candidates):
    prompt += f"{i+1}. {c}\n"
prompt += "\nRank these candidates by relevance and explain your choice."

# Call gpt-4o
chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(chat_response.choices[0].message.content)
```

Example output (the model's exact wording will vary):

```text
1. Wireless noise-cancelling headphones - most relevant because it matches 'wireless audio devices' closely.
2. Bluetooth portable speaker - relevant but less specific to 'noise-cancelling'.
```
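The pipeline above ranks by raw L2 distance, but embedding similarity is more commonly measured with cosine similarity. A minimal NumPy sketch of the idea, with made-up 3-dimensional vectors standing in for real 1536-dimensional embeddings (no API calls needed): normalizing vectors to unit length makes the dot product equal to cosine similarity.

```python
import numpy as np

# Toy stand-ins for embedding vectors (real ones would be 1536-dimensional)
item_vectors = np.array([
    [0.9, 0.1, 0.0],  # headphones
    [0.8, 0.2, 0.1],  # speaker
    [0.1, 0.9, 0.3],  # fitness watch
], dtype="float32")
query_vector = np.array([1.0, 0.0, 0.0], dtype="float32")

def normalize(v):
    """Scale rows to unit length so dot product == cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

scores = normalize(item_vectors) @ normalize(query_vector)
ranking = np.argsort(-scores)  # indices sorted by descending similarity
print(ranking.tolist())  # [0, 1, 2] — headphones first, watch last
```

In FAISS, the equivalent is to normalize the vectors and use `faiss.IndexFlatIP` (inner product) instead of `IndexFlatL2`.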
Pricing and limits
| Option | Free | Cost | Limits | Context |
|---|---|---|---|---|
| text-embedding-3-small | No (pay-as-you-go API) | $0.02 / 1M tokens | Max 8,191 input tokens per text | Embeddings for items and queries |
| FAISS | Fully free, open source | Free | Depends on hardware | Vector similarity search |
| gpt-4o | No (pay-as-you-go API) | $2.50 / 1M input tokens, $10.00 / 1M output tokens | 128K-token context window | Ranking and explanation generation |
| gemini-1.5-pro | Check pricing | Variable | Varies by usage | Alternative LLM |
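At these rates, embedding cost is easy to estimate up front. A quick back-of-envelope check; the catalog size and tokens-per-item figures below are illustrative assumptions, not measurements:

```python
# Illustrative assumptions: 100,000-item catalog, ~50 tokens per item description
num_items = 100_000
tokens_per_item = 50
price_per_million_tokens = 0.02  # text-embedding-3-small rate

total_tokens = num_items * tokens_per_item
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"~{total_tokens:,} tokens -> ${cost:.2f}")  # ~5,000,000 tokens -> $0.10
```

Even a six-figure catalog embeds for pennies; the recurring LLM reranking calls, not the embeddings, dominate cost at serving time.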
What to avoid
- Avoid deprecated embedding models like text-embedding-ada-002, and avoid defaulting to text-embedding-3-large when the small model suffices; the former is less accurate for its cost, and the latter is slower and several times more expensive for marginal gains here.
- Do not rely solely on LLMs without embeddings and vector search; this is inefficient and less scalable for large catalogs.
- Avoid vector databases without efficient indexing (e.g., naive linear search) for production-scale recommendations.
- Beware of using models with very small context windows for ranking, as they cannot consider enough candidate items.
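The "efficient indexing" point can be made concrete. FAISS's IVF indexes (e.g. `faiss.IndexIVFFlat`) avoid scanning the whole catalog by bucketing vectors around centroids and searching only the nearest bucket(s). A toy NumPy illustration of that idea, with hand-picked centroids where a real index would learn them via k-means:

```python
import numpy as np

rng = np.random.default_rng(0)
# 1,000 toy item vectors clustered around two hand-picked centroids
centroids = np.array([[1.0, 0.0], [0.0, 1.0]], dtype="float32")
vectors = np.concatenate([
    centroids[0] + 0.1 * rng.standard_normal((500, 2)),
    centroids[1] + 0.1 * rng.standard_normal((500, 2)),
]).astype("float32")

# "Inverted lists": record which vectors live in each centroid's bucket
assignments = np.argmin(
    ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
)

def ivf_search(query, k=3):
    """Scan only the bucket whose centroid is closest to the query."""
    bucket = np.argmin(((centroids - query) ** 2).sum(-1))
    candidate_ids = np.flatnonzero(assignments == bucket)  # ~half the data
    dists = ((vectors[candidate_ids] - query) ** 2).sum(-1)
    return candidate_ids[np.argsort(dists)[:k]]

query = np.array([0.95, 0.05], dtype="float32")
print(ivf_search(query))  # IDs from the first cluster only
```

Here each search touches roughly half the vectors; with more buckets the fraction shrinks further, which is what makes IVF-style indexes scale where naive linear search does not.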
How to evaluate for your case
Benchmark your recommendation system by measuring precision@k, recall@k, and user engagement metrics on a validation dataset. Use offline evaluation with held-out user-item interactions and online A/B testing for real user feedback. Automate evaluation by scripting embedding similarity tests and LLM reranking quality checks.
```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    recommended_k = recommended[:k]
    return len(set(recommended_k) & set(relevant)) / k

# Example usage
recommended_items = ['item1', 'item3', 'item2']
relevant_items = ['item1', 'item2']
print(f"Precision@2: {precision_at_k(recommended_items, relevant_items, 2):.2f}")
# Precision@2: 0.50
```
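Precision@k has a natural companion, recall@k: the fraction of all relevant items that made it into the top k. A sketch following the same pattern as precision_at_k above:

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items that appear in the top-k recommendations."""
    recommended_k = recommended[:k]
    return len(set(recommended_k) & set(relevant)) / len(relevant)

# Example usage: only 1 of the 2 relevant items is in the top 2
recommended_items = ['item1', 'item3', 'item2']
relevant_items = ['item1', 'item2']
print(f"Recall@2: {recall_at_k(recommended_items, relevant_items, 2):.2f}")
# Recall@2: 0.50
```

Tracking both catches failure modes a single metric misses: high precision with low recall means relevant items are being left out of the shortlist entirely.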
Key Takeaways
- Use text-embedding-3-small for cost-effective, high-quality embeddings in recommendation systems.
- Combine embeddings with FAISS for scalable, fast similarity search over large item catalogs.
- Leverage gpt-4o to rerank and generate personalized explanations for recommendations.
- Avoid outdated embedding models and inefficient vector search methods to maintain performance and cost control.
- Evaluate recommendations offline with precision/recall metrics and online with user engagement A/B tests.