How to build an AI-powered FAQ system
Quick answer
Build an AI-powered FAQ system by converting your FAQ entries into vector embeddings, storing them in a vector index, and retrieving the closest match for each user question. A large language model such as `gpt-4o` then generates the final answer from the matched entry via `client.chat.completions.create` in the OpenAI API.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- `pip install openai>=1.0`
- `pip install faiss-cpu` or `chromadb` (for the vector store)
Setup
Install the required Python packages and set your OpenAI API key as an environment variable.
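Setting the key as an environment variable might look like this (macOS/Linux syntax; the key value shown is a placeholder, not a real key):

```shell
# Replace the placeholder with your actual OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"
```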
```shell
pip install openai faiss-cpu
```

Step by step
This example shows how to embed FAQ entries, store them in a vector index, and query with gpt-4o to answer user questions.
```python
import os

import faiss
import numpy as np
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample FAQ data
faq_data = [
    {"question": "What is your return policy?", "answer": "You can return items within 30 days."},
    {"question": "How do I track my order?", "answer": "Use the tracking link sent to your email."},
    {"question": "Do you ship internationally?", "answer": "Yes, we ship worldwide with additional fees."},
]

# Step 1: Embed FAQ questions
faq_texts = [item["question"] for item in faq_data]
response = client.embeddings.create(
    model="text-embedding-3-large",
    input=faq_texts,
)
embeddings = np.array([e.embedding for e in response.data]).astype("float32")

# Step 2: Build FAISS index (exact L2 search)
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

# Step 3: Query function
def query_faq(user_question):
    # Embed the user question with the same model used for the FAQ entries
    q_resp = client.embeddings.create(
        model="text-embedding-3-large",
        input=[user_question],
    )
    q_embedding = np.array(q_resp.data[0].embedding).astype("float32")

    # Search for the single nearest FAQ entry
    distances, indices = index.search(q_embedding.reshape(1, -1), 1)
    matched_faq = faq_data[indices[0][0]]

    # Use the LLM to generate an answer grounded in the matched FAQ
    prompt = (
        f"User question: {user_question}\n"
        f"FAQ question: {matched_faq['question']}\n"
        f"FAQ answer: {matched_faq['answer']}\n"
        "Provide a concise answer to the user question based on the FAQ answer."
    )
    chat_resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return chat_resp.choices[0].message.content

# Example usage
print(query_faq("Can I return a product after a month?"))
```

Output
You can return items within 30 days, so unfortunately returns after a month are not accepted.
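Under the hood, FAISS's `IndexFlatL2` performs an exact brute-force L2 scan over the stored vectors. Conceptually, the retrieval step above is equivalent to this plain-NumPy sketch (toy 2-D vectors here, not real embeddings; note FAISS reports squared L2 distances, while this sketch uses the norm — the ranking is the same):

```python
import numpy as np

# Toy "embeddings" for three FAQ entries (real ones have thousands of dims)
stored = np.array([
    [0.0, 1.0],
    [1.0, 0.0],
    [0.7, 0.7],
], dtype="float32")

def nearest_l2(query, vectors):
    """Return (index, distance) of the stored vector closest to query under L2."""
    dists = np.linalg.norm(vectors - query, axis=1)
    best = int(np.argmin(dists))
    return best, float(dists[best])

idx, dist = nearest_l2(np.array([0.9, 0.1], dtype="float32"), stored)
print(idx)  # index of the closest stored vector
```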
Common variations
- Use `gpt-4o-mini` for faster, cheaper responses with slightly less accuracy.
- Implement async calls with `asyncio` and `await` for scalable web apps.
- Swap FAISS for `chromadb` or another vector store for persistence and cloud integration.
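Another useful variation (an addition of this guide, not part of the code above) is to reject matches whose distance exceeds a threshold and fall back to a canned reply, so off-topic questions aren't forced onto the nearest FAQ. A minimal sketch, with a hypothetical `answer_or_fallback` helper and a made-up threshold you would tune on real embedding distances:

```python
# Hypothetical cutoff; calibrate against distances seen on real queries
MAX_DISTANCE = 1.0

def answer_or_fallback(distance, matched_answer):
    """Return the matched FAQ answer only if the match is close enough."""
    if distance > MAX_DISTANCE:
        return "Sorry, I couldn't find a matching FAQ entry."
    return matched_answer

print(answer_or_fallback(0.3, "You can return items within 30 days."))
```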
Troubleshooting
- If embeddings return errors, verify your API key and model name.
- If FAISS index search returns no results, check that embeddings are correctly computed and indexed.
- For unexpected LLM answers, refine the prompt or increase `max_tokens` in `client.chat.completions.create`.
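Transient API errors (rate limits, timeouts) are also common in practice. One generic way to handle them is a retry-with-backoff wrapper; the helper below is an assumption of this guide, not part of the OpenAI SDK (which also has its own client-level retry configuration):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage with the embeddings call from the example above:
# result = with_retries(lambda: client.embeddings.create(
#     model="text-embedding-3-large", input=["hello"]))
```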
Key takeaways
- Use vector embeddings to convert FAQ questions into searchable numeric form.
- Leverage a vector store like FAISS to find the most relevant FAQ entry for a user query.
- Use a large language model like `gpt-4o` to generate natural, context-aware answers.
- Keep your prompts clear and concise to improve answer quality.
- Test and tune embedding and retrieval parameters for best FAQ matching.