How to build a customer support chatbot with RAG
Quick answer

Build a customer support chatbot with RAG by combining a vector database for document retrieval with a large language model like gpt-4o-mini for response generation. Use the OpenAI SDK to embed support documents, retrieve the most relevant context for each question, and generate answers grounded in the retrieved knowledge.

Prerequisites

- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0 faiss-cpu numpy
Setup
Install required Python packages and set your OPENAI_API_KEY environment variable.
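On macOS or Linux, for example (the key value shown is a placeholder):

```shell
# Make the key available to the script for this shell session (placeholder value).
export OPENAI_API_KEY="sk-your-key-here"
```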
pip install openai faiss-cpu numpy

Step by step
This example shows how to embed support documents, build a FAISS vector store, retrieve the most relevant documents for a user query, and generate a chatbot response with gpt-4o-mini.
import os
import numpy as np
import faiss
from openai import OpenAI
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Sample support documents
documents = [
    "How to reset your password",
    "Troubleshooting login issues",
    "Refund policy and process",
    "How to update billing information",
    "Contact support and working hours",
]
# Step 1: Embed documents
embeddings = []
for doc in documents:
    response = client.embeddings.create(model="text-embedding-3-small", input=doc)
    embeddings.append(response.data[0].embedding)
embeddings = np.array(embeddings).astype("float32")
# Step 2: Build FAISS index
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)
# Step 3: Define a function to retrieve relevant docs
def retrieve(query, k=2):
    query_embedding = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    query_vector = np.array([query_embedding]).astype("float32")
    distances, indices = index.search(query_vector, k)
    return [documents[i] for i in indices[0]]
# Step 4: Generate chatbot response with context
user_question = "How can I get a refund if I am not satisfied?"
relevant_docs = retrieve(user_question)
context = "\n---\n".join(relevant_docs)
messages = [
    {"role": "system", "content": "You are a helpful customer support assistant."},
    {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_question}"},
]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
)
print("Chatbot answer:")
print(response.choices[0].message.content)

Output
Chatbot answer: Our refund policy allows you to request a refund if you are not satisfied with our service. Please contact support during working hours for assistance with the refund process.
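For reuse, the retrieval and generation steps above can be wrapped in a single helper. This is a sketch that assumes the client, retrieve function, and message format from the example; the name answer is illustrative:

```python
def answer(client, retrieve, question, model="gpt-4o-mini"):
    # Retrieve supporting documents, then ask the model to answer using them.
    context = "\n---\n".join(retrieve(question))
    messages = [
        {"role": "system", "content": "You are a helpful customer support assistant."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

Keeping retrieval and generation behind one function makes it easy to swap the vector store or model later without touching the calling code.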
Common variations
- Use async calls with asyncio and await for scalable chatbots.
- Switch to other vector stores like Chroma or Pinecone for cloud-based retrieval.
- Upgrade to gpt-4o for higher-quality answers when cost permits.
- Implement streaming responses for a real-time user experience.
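As a sketch of the streaming variation, the generation step can yield tokens as they arrive instead of waiting for the full completion. It uses the stream=True option of the chat completions API; stream_answer is an illustrative name:

```python
def stream_answer(client, messages, model="gpt-4o-mini"):
    # With stream=True the API returns chunks; yield each token delta
    # so the UI can render the answer as it is generated.
    stream = client.chat.completions.create(
        model=model, messages=messages, stream=True
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```

A caller can then print tokens incrementally, e.g. `for token in stream_answer(client, messages): print(token, end="", flush=True)`.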
Troubleshooting
- If embeddings are slow, batch requests or cache embeddings locally.
- Ensure your OPENAI_API_KEY is set correctly to avoid authentication errors.
- If retrieval returns irrelevant documents, increase k or improve document quality.
- Check for API rate limits and handle exceptions gracefully.
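For the batching suggestion above: the embeddings endpoint accepts a list of inputs, so documents can be embedded a batch at a time rather than one request per document. A minimal sketch (embed_in_batches is an illustrative name):

```python
def embed_in_batches(client, docs, batch_size=100, model="text-embedding-3-small"):
    # One request per batch instead of one per document cuts round trips.
    vectors = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        response = client.embeddings.create(model=model, input=batch)
        vectors.extend(item.embedding for item in response.data)
    return vectors
```

For repeated runs, the returned vectors can also be cached locally (e.g. saved with np.save) so unchanged documents are not re-embedded.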
Key Takeaways
- Use vector embeddings and FAISS to retrieve relevant support documents efficiently.
- Combine retrieved context with gpt-4o-mini chat completions for accurate, context-aware answers.
- Optimize retrieval parameters and model choice based on latency and cost requirements.
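One such retrieval tweak: IndexFlatL2 ranks by Euclidean distance, while cosine similarity is a common alternative for text embeddings. A minimal NumPy sketch of cosine-based top-k retrieval (normalizing the vectors and using faiss.IndexFlatIP achieves the same ranking at scale; cosine_top_k is an illustrative name):

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=2):
    # Normalize rows so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q
    # Indices of the k highest-scoring documents, best first.
    return np.argsort(scores)[::-1][:k]
```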