How to build a financial chatbot with RAG
Quick answer
Build a financial chatbot with RAG by combining a vector database for retrieving relevant financial documents with a large language model (LLM) such as gpt-4o to generate context-aware answers. Use embeddings to index financial data, query the vector store for relevant information, then feed that context to the LLM for precise, up-to-date responses.

Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
- pip install faiss-cpu
- pip install numpy
Setup environment
Install required Python packages and set your OpenAI API key as an environment variable.
- Use `pip install openai faiss-cpu numpy` to install dependencies.
- Set `OPENAI_API_KEY` in your shell or environment.

```shell
pip install openai faiss-cpu numpy
```

Output:

```
Collecting openai
Collecting faiss-cpu
Collecting numpy
Successfully installed openai faiss-cpu numpy
```
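On macOS or Linux, both setup steps can be done from the shell; the key value below is a placeholder, not a real credential:

```shell
# Install the dependencies (pin openai to the 1.x API used in this guide)
pip install "openai>=1.0" faiss-cpu numpy

# Export your API key for the current session (placeholder value shown)
export OPENAI_API_KEY="sk-..."
```

On Windows, use `set OPENAI_API_KEY=...` in cmd or `$env:OPENAI_API_KEY="..."` in PowerShell instead.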
Step-by-step implementation
This example shows how to build a simple financial chatbot using RAG with OpenAI's gpt-4o model and FAISS for vector search.
1. Embed financial documents using OpenAI embeddings.
2. Store the embeddings in a FAISS index.
3. Embed the user's question and query the index.
4. Retrieve the most relevant context.
5. Pass the context and question to the LLM for answer generation.
```python
import os
import numpy as np
import faiss
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample financial documents
documents = [
    "The Federal Reserve raised interest rates by 0.25% in March 2026.",
    "Inflation rates have stabilized around 2% in the last quarter.",
    "Stock market volatility increased due to geopolitical tensions.",
    "Cryptocurrency regulations are tightening globally.",
    "The unemployment rate dropped to 3.5% in April 2026."
]

# Step 1: Create embeddings for documents
embeddings = []
for doc in documents:
    response = client.embeddings.create(model="text-embedding-3-small", input=doc)
    embeddings.append(response.data[0].embedding)
embeddings = np.array(embeddings).astype("float32")

# Step 2: Build FAISS index
dimension = len(embeddings[0])
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

# Step 3: Function to query the chatbot
def financial_chatbot(question: str) -> str:
    # Embed the question
    q_embedding_resp = client.embeddings.create(model="text-embedding-3-small", input=question)
    q_embedding = np.array(q_embedding_resp.data[0].embedding).astype("float32").reshape(1, -1)

    # Search FAISS for the top 2 relevant docs
    D, I = index.search(q_embedding, k=2)
    context = "\n".join(documents[i] for i in I[0])

    # Prepare prompt with the retrieved context
    prompt = (
        "You are a financial assistant. Use the following context to answer the question.\n"
        f"Context:\n{context}\nQuestion: {question}\nAnswer:"
    )

    # Step 4: Generate answer with the LLM
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return response.choices[0].message.content.strip()

# Example usage
if __name__ == "__main__":
    question = "What is the current interest rate trend?"
    answer = financial_chatbot(question)
    print(f"Q: {question}\nA: {answer}")
```

Output:

```
Q: What is the current interest rate trend?
A: The Federal Reserve raised interest rates by 0.25% in March 2026, indicating a trend of increasing interest rates.
```
Common variations
- Use async calls with `asyncio` and OpenAI's async client for better throughput.
- Switch the vector store to `Chroma` or `Pinecone` for scalable cloud storage.
- Use different LLMs such as `claude-3-5-sonnet-20241022` or `gemini-2.5-pro` for varied style and cost.
- Implement streaming responses for real-time chatbot interaction.
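Another common variation is cosine similarity instead of L2 distance: normalize the embeddings and use `faiss.IndexFlatIP`. A numpy-only sketch of the equivalent retrieval, with tiny synthetic vectors standing in for real embeddings (which are 1536-dimensional for text-embedding-3-small):

```python
import numpy as np

def top_k_cosine(doc_vecs, query_vec, k=2):
    # Normalize rows so the inner product equals cosine similarity,
    # which is what faiss.IndexFlatIP computes on normalized vectors
    docs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = docs @ q
    # Indices of the k most similar documents, best first
    return np.argsort(-sims)[:k].tolist()

# Synthetic 2-d "embeddings" for three documents and one query
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], dtype="float32")
query = np.array([1.0, 0.1], dtype="float32")
print(top_k_cosine(docs, query))  # [0, 2]
```

Cosine similarity ignores vector magnitude, which often matters less than direction for text embeddings; the FAISS version behaves the same once the stored and query vectors are normalized.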
Troubleshooting tips
- If embeddings are slow or fail, check your API key and network connectivity.
- If FAISS index search returns irrelevant results, increase `k` or improve document quality.
- For incomplete LLM answers, increase `max_tokens` or refine the prompt context.
- Ensure environment variables are correctly set to avoid authentication errors.
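Irrelevant retrievals can also be caught at query time by thresholding the L2 distances that `index.search` returns; the cutoff below is illustrative only, and a good value depends on your embedding model and data:

```python
import numpy as np

def filter_hits(distances, indices, max_dist=1.0):
    # Drop retrieved documents whose L2 distance exceeds the cutoff;
    # the survivors can be joined into the prompt context as before
    keep = distances < max_dist
    return indices[keep].tolist()

# Example values shaped like one row of FAISS's (D, I) search output
D = np.array([0.3, 1.4], dtype="float32")
I = np.array([0, 3])
print(filter_hits(D, I))  # [0]
```

If every hit is filtered out, the chatbot can say it lacks relevant information rather than answer from weak context.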
Key Takeaways
- Use vector embeddings and FAISS to retrieve relevant financial context efficiently.
- Feed the retrieved context together with the user query to an LLM like `gpt-4o` for accurate answers.
- Adapt vector stores and LLM models based on scale, cost, and latency requirements.