Best for: Intermediate · 4 min read

How to build an AI-powered recommendation system

Quick answer
Build AI-powered recommendation systems by combining embedding models to convert items and user data into vectors, vector databases for similarity search, and LLMs for personalized ranking or explanation. Use retrieval-augmented generation (RAG) to enhance recommendations with context-aware responses.

Recommendation

For AI-powered recommendation systems, use text-embedding-3-small for embeddings due to its balance of cost, quality, and speed, combined with a vector store like FAISS and gpt-4o for personalized ranking and explanations.
| Use case | Best choice | Why | Runner-up |
| --- | --- | --- | --- |
| E-commerce product recommendations | text-embedding-3-small + FAISS + gpt-4o | Efficient embeddings for product vectors, fast similarity search, and an LLM for personalized ranking | gemini-1.5-pro + Pinecone + gemini-1.5-flash |
| Content-based news recommendations | text-embedding-3-small + Chroma + gpt-4o | Good semantic embeddings for news articles and a flexible vector DB, with an LLM for summaries | claude-3-5-sonnet-20241022 + FAISS |
| Movie or media recommendations | text-embedding-3-small + FAISS + gpt-4o | High-quality embeddings capture user preferences and content features; LLM for explanations | mistral-large-latest + Chroma |
| Personalized learning content | text-embedding-3-small + FAISS + gpt-4o | Embeddings encode learner profiles and content; LLM tailors recommendations and feedback | claude-3-5-sonnet-20241022 + Pinecone |

Top picks explained

Use text-embedding-3-small for generating vector representations of items and user profiles: it offers a strong balance of accuracy, speed, and cost efficiency at $0.02 per 1M tokens with 1536 dimensions. Pair it with FAISS, an open-source vector similarity search library, for fast nearest-neighbor search at scale. For ranking and generating personalized explanations, gpt-4o is the best choice due to its strong contextual understanding and reasonable pricing.
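One practical detail worth knowing: FAISS's IndexFlatL2, used in the example below, ranks by Euclidean distance, while text embeddings are usually compared by cosine similarity. With L2-normalized vectors the two rankings agree, and the inner product of unit vectors equals cosine similarity. A minimal numpy-only sketch of the idea (with FAISS you would call faiss.normalize_L2 and use IndexFlatIP instead; cosine_top_k and the toy vectors are illustrative):

```python
import numpy as np

def cosine_top_k(query_vec, item_vecs, k):
    """Return indices and scores of the k items most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    scores = m @ q                 # inner product of unit vectors = cosine similarity
    top = np.argsort(-scores)[:k]  # highest similarity first
    return top, scores[top]

# Toy 2-D vectors standing in for real 1536-dimensional embeddings
items = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
idx, sims = cosine_top_k(np.array([1.0, 0.05]), items, 2)
print(idx.tolist())  # [0, 1]
```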

Alternatives include the Gemini family (gemini-1.5-pro for heavier reasoning, gemini-1.5-flash for cheap, fast LLM calls), paired with Google's own embedding models, which excel in multimodal and general use cases but may be costlier. claude-3-5-sonnet-20241022 leads in coding and complex reasoning, but Anthropic does not ship a first-party embedding model, so it must be paired with a separate embedder.

In practice

Here is a Python example using the OpenAI SDK v1+ to build a simple recommendation query pipeline: embed user preferences and items, search with FAISS, then rerank with gpt-4o.

```python
import os
import numpy as np
from openai import OpenAI
import faiss

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample items
items = [
    {"id": "item1", "text": "Wireless noise-cancelling headphones"},
    {"id": "item2", "text": "Bluetooth portable speaker"},
    {"id": "item3", "text": "Smart fitness watch"}
]

# Embed items
item_texts = [item["text"] for item in items]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=item_texts
)
item_vectors = np.array([e.embedding for e in response.data]).astype('float32')

# Build FAISS index
dimension = item_vectors.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(item_vectors)

# Embed user query
user_query = "Looking for wireless audio devices"
query_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=[user_query]
)
query_vector = np.array(query_response.data[0].embedding).astype('float32').reshape(1, -1)

# Search top 2 similar items
D, I = index.search(query_vector, 2)

# Prepare prompt for GPT-4o to rerank and explain
candidates = [items[i]["text"] for i in I[0]]
prompt = f"User query: {user_query}\nCandidates:\n"
for i, c in enumerate(candidates):
    prompt += f"{i+1}. {c}\n"
prompt += "\nRank these candidates by relevance and explain your choice."

# Call GPT-4o
chat_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)

print(chat_response.choices[0].message.content)
```

Output (LLM responses vary between runs):

```text
1. Wireless noise-cancelling headphones - most relevant because it matches 'wireless audio devices' closely.
2. Bluetooth portable speaker - relevant but less specific to 'noise-cancelling'.
```
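Re-embedding the full catalog on every run wastes tokens. A simple sketch of caching item vectors to disk with np.save/np.load (CACHE_PATH and embed_fn are illustrative names, not part of any SDK):

```python
import os
import numpy as np

CACHE_PATH = "item_vectors.npy"  # hypothetical cache location

def load_or_embed(item_texts, embed_fn):
    """Load cached item vectors if present; otherwise embed and cache them."""
    if os.path.exists(CACHE_PATH):
        return np.load(CACHE_PATH)
    vectors = np.array(embed_fn(item_texts), dtype="float32")
    np.save(CACHE_PATH, vectors)
    return vectors
```

Invalidate the cache whenever the catalog changes, for example by keying the filename on a hash of the item texts.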

Pricing and limits

| Option | Free tier | Cost | Limits | Role |
| --- | --- | --- | --- | --- |
| text-embedding-3-small | No (pay-as-you-go API) | $0.02 / 1M tokens | ~8K tokens per input | Embeddings for items and queries |
| FAISS | Fully free, open source | Free | Depends on hardware | Vector similarity search |
| gpt-4o | No (pay-as-you-go API) | $2.50 / 1M input tokens, $10 / 1M output tokens | 128K-token context window | Ranking and explanation generation |
| gemini-1.5-pro | Free tier via Google AI Studio | Variable; check current pricing | Varies by usage tier | Alternative LLM |

What to avoid

  • Avoid legacy embedding models such as text-embedding-ada-002 and deprecated APIs; they cost more per token and score lower on retrieval benchmarks than text-embedding-3-small.
  • Do not rely solely on LLMs without embeddings and vector search; this is inefficient and less scalable for large catalogs.
  • Avoid vector databases without efficient indexing (e.g., naive linear search) for production-scale recommendations.
  • Beware of using models with very small context windows for ranking, as they cannot consider enough candidate items.

How to evaluate for your case

Benchmark your recommendation system by measuring precision@k, recall@k, and user engagement metrics on a validation dataset. Use offline evaluation with held-out user-item interactions and online A/B testing for real user feedback. Automate evaluation by scripting embedding similarity tests and LLM reranking quality checks.

```python
def precision_at_k(recommended, relevant, k):
    recommended_k = recommended[:k]
    return len(set(recommended_k) & set(relevant)) / k

# Example usage
recommended_items = ['item1', 'item3', 'item2']
relevant_items = ['item1', 'item2']
print(f"Precision@2: {precision_at_k(recommended_items, relevant_items, 2):.2f}")
```

Output:

```text
Precision@2: 0.50
```
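Recall@k, mentioned above, can be scripted the same way; it divides by the number of relevant items rather than by k (a minimal sketch):

```python
def recall_at_k(recommended, relevant, k):
    # Fraction of all relevant items that appear in the top-k recommendations
    if not relevant:
        return 0.0
    recommended_k = recommended[:k]
    return len(set(recommended_k) & set(relevant)) / len(relevant)

recommended_items = ['item1', 'item3', 'item2']
relevant_items = ['item1', 'item2']
print(f"Recall@2: {recall_at_k(recommended_items, relevant_items, 2):.2f}")  # Recall@2: 0.50
```

Precision@k penalizes irrelevant items in the list; recall@k penalizes relevant items left out, so the two are usually reported together.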

Key Takeaways

  • Use text-embedding-3-small for cost-effective, high-quality embeddings in recommendation systems.
  • Combine embeddings with FAISS for scalable, fast similarity search over large item catalogs.
  • Leverage gpt-4o to rerank and generate personalized explanations for recommendations.
  • Avoid outdated embedding models and inefficient vector search methods to maintain performance and cost control.
  • Evaluate recommendations offline with precision/recall metrics and online with user engagement A/B tests.
Verified 2026-04 · text-embedding-3-small, gpt-4o, gemini-1.5-pro, gemini-1.5-flash, claude-3-5-sonnet-20241022, mistral-large-latest