How to build a customer support chatbot with RAG
Quick answer

Build a customer support chatbot with RAG by combining a vector database for document retrieval with a large language model like gpt-4o-mini for response generation. Use the OpenAI SDK to embed support documents, retrieve the most relevant context for each question, and generate answers grounded in the retrieved knowledge.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0 faiss-cpu numpy
Setup
Install required Python packages and set your OPENAI_API_KEY environment variable.
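On macOS or Linux, for example (the key value shown is a placeholder):

```shell
# Make the key available to the script for this shell session (placeholder value).
export OPENAI_API_KEY="sk-your-key-here"
```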
pip install openai faiss-cpu numpy

Step by step
This example shows how to embed support documents, build a FAISS vector store, retrieve the most relevant documents for a user query, and generate a chatbot response with gpt-4o-mini.
import os
import numpy as np
import faiss
from openai import OpenAI
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Sample support documents
documents = [
    "How to reset your password",
    "Troubleshooting login issues",
    "Refund policy and process",
    "How to update billing information",
    "Contact support and working hours",
]
# Step 1: Embed documents
embeddings = []
for doc in documents:
    response = client.embeddings.create(model="text-embedding-3-small", input=doc)
    embeddings.append(response.data[0].embedding)
embeddings = np.array(embeddings).astype("float32")
# Step 2: Build FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
# Step 3: Define a function to retrieve relevant docs
def retrieve(query, k=2):
    query_embedding = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    query_vector = np.array([query_embedding]).astype("float32")
    distances, indices = index.search(query_vector, k)
    return [documents[i] for i in indices[0]]
# Step 4: Generate chatbot response with context
user_question = "How can I get a refund if I am not satisfied?"
relevant_docs = retrieve(user_question)
context = "\n---\n".join(relevant_docs)
messages = [
    {"role": "system", "content": "You are a helpful customer support assistant."},
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_question}"},
]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
print("Chatbot answer:")
print(response.choices[0].message.content)

Output
Chatbot answer: Our refund policy allows you to request a refund if you are not satisfied with our service. Please contact support during working hours for assistance with the refund process.
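For reuse, the retrieval and generation steps above can be wrapped in a single helper. This is a sketch that assumes the client, retrieve function, and message format from the example; the name answer is illustrative:

```python
def answer(client, retrieve, question, model="gpt-4o-mini"):
    # Retrieve supporting documents, then ask the model to answer using them.
    context = "\n---\n".join(retrieve(question))
    messages = [
        {"role": "system", "content": "You are a helpful customer support assistant."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

Keeping retrieval and generation behind one function makes it easy to swap the vector store or model later without touching the calling code.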
Common variations
- Use async calls with asyncio and await for scalable chatbots.
- Switch to other vector stores like Chroma or Pinecone for cloud-based retrieval.
- Upgrade to gpt-4o for higher-quality answers when cost permits.
- Implement streaming responses for a real-time user experience.
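As a sketch of the streaming variation, the generation step can yield tokens as they arrive instead of waiting for the full completion. It uses the stream=True option of the chat completions API; stream_answer is an illustrative name:

```python
def stream_answer(client, messages, model="gpt-4o-mini"):
    # With stream=True the API returns chunks; yield each token delta
    # so the UI can render the answer as it is generated.
    stream = client.chat.completions.create(
        model=model, messages=messages, stream=True
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```

A caller can then print tokens incrementally, e.g. `for token in stream_answer(client, messages): print(token, end="", flush=True)`.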
Troubleshooting
- If embeddings are slow, batch requests or cache embeddings locally.
- Ensure your OPENAI_API_KEY is set correctly to avoid authentication errors.
- If retrieval returns irrelevant documents, increase k or improve document quality.
- Check for API rate limits and handle exceptions gracefully.
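For the batching suggestion above: the embeddings endpoint accepts a list of inputs, so documents can be embedded a batch at a time rather than one request per document. A minimal sketch (embed_in_batches is an illustrative name):

```python
def embed_in_batches(client, docs, batch_size=100, model="text-embedding-3-small"):
    # One request per batch instead of one per document cuts round trips.
    vectors = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        vectors.extend(item.embedding for item in response.data)
    return vectors
```

For repeated runs, the returned vectors can also be cached locally (e.g. saved with np.save) so unchanged documents are not re-embedded.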
Key Takeaways
- Use vector embeddings and FAISS to retrieve relevant support documents efficiently.
- Combine retrieved context with gpt-4o-mini chat completions for accurate, context-aware answers.
- Optimize retrieval parameters and model choice based on latency and cost requirements.
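One such retrieval tweak: IndexFlatL2 ranks by Euclidean distance, while cosine similarity is a common alternative for text embeddings. A minimal NumPy sketch of cosine-based top-k retrieval (normalizing the vectors and using faiss.IndexFlatIP achieves the same ranking at scale; cosine_top_k is an illustrative name):

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=2):
    # Normalize rows so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q
    # Indices of the k highest-scoring documents, best first.
    return np.argsort(scores)[::-1][:k]
```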