How to · Intermediate · 4 min read

How to cite sources from documents in RAG

Quick answer
Build a Retrieval-Augmented Generation (RAG) pipeline: embed your documents with OpenAIEmbeddings and store them, along with their metadata, in a vector store such as FAISS. At query time, retrieve the most relevant snippets and include each snippet's source metadata in the prompt you pass to chat.completions.create, so the model can answer with source attributions.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai langchain langchain_community faiss-cpu

Setup

Install required packages and set your environment variable for the OpenAI API key.

  • Install packages: pip install openai langchain langchain_community faiss-cpu
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; open a new terminal afterwards, since setx only affects new sessions)
bash
pip install openai langchain langchain_community faiss-cpu

Step by step

This example shows how to embed documents, create a FAISS vector store, query it, and generate a response with cited sources using OpenAI and LangChain.

python
import os
from openai import OpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample documents with metadata
documents = [
    {"text": "Python is a popular programming language.", "metadata": {"source": "doc1.txt"}},
    {"text": "RAG combines retrieval with generation.", "metadata": {"source": "doc2.txt"}},
    {"text": "FAISS is a vector search library.", "metadata": {"source": "doc3.txt"}}
]

# Embed documents and build the FAISS index in one step;
# from_texts computes the embeddings and attaches metadata to each chunk
embeddings = OpenAIEmbeddings(api_key=os.environ["OPENAI_API_KEY"])
index = FAISS.from_texts(
    texts=[doc["text"] for doc in documents],
    embedding=embeddings,
    metadatas=[doc["metadata"] for doc in documents],
)

# Query the vector store; results are Document objects
# with .page_content and .metadata attributes
query = "What is RAG?"
results = index.similarity_search(query, k=2)

# Prepare context with citations
context = "\n".join(
    f"{doc.page_content} (Source: {doc.metadata['source']})" for doc in results
)

# Create prompt with context
prompt_template = """
You are an AI assistant. Use the following context to answer the question.

Context:
{context}

Question:
{question}

Answer with citations.
"""

prompt = ChatPromptTemplate.from_template(prompt_template).format(
    context=context,
    question=query
)

# Generate completion
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

print("Answer:", response.choices[0].message.content)
output
Answer: Retrieval-Augmented Generation (RAG) combines retrieval of relevant documents with AI generation to produce accurate responses (Source: doc2.txt). FAISS is a vector search library used to index and search document embeddings (Source: doc3.txt).

Common variations

You can adapt this approach by:

  • Using async calls with the OpenAI SDK for concurrency.
  • Switching to other vector stores like Chroma or Weaviate.
  • Using other chat models, e.g. gpt-4o when answer quality matters more than cost.
  • Including source URLs or page numbers in metadata for richer citations.
python
import asyncio
import os

from openai import AsyncOpenAI

async def async_query():
    # await requires the AsyncOpenAI client, not the sync OpenAI client
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Explain RAG with sources."}],
    )
    print(response.choices[0].message.content)

asyncio.run(async_query())
output
Retrieval-Augmented Generation (RAG) is a technique that combines document retrieval with language model generation to provide accurate answers with source references.
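The last variation, richer citation metadata, only needs a small formatting helper. Below is a minimal sketch; the metadata keys (source, page, url) and the format_citation helper are illustrative choices, not part of any library:

```python
# A sketch of richer citation formatting. The metadata keys below are
# examples; use whatever your ingestion pipeline attaches to each chunk.

def format_citation(metadata: dict) -> str:
    """Build a human-readable citation string from chunk metadata."""
    parts = [metadata.get("source", "unknown")]
    if "page" in metadata:
        parts.append(f"p. {metadata['page']}")
    if "url" in metadata:
        parts.append(metadata["url"])
    return " · ".join(parts)

snippet = {
    "text": "RAG combines retrieval with generation.",
    "metadata": {"source": "rag_guide.pdf", "page": 12},
}
print(f'{snippet["text"]} (Source: {format_citation(snippet["metadata"])})')
# RAG combines retrieval with generation. (Source: rag_guide.pdf · p. 12)
```

Because the helper only reads whatever keys are present, the same code works whether your chunks carry file names, page numbers, URLs, or all three.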

Troubleshooting

  • If you get empty search results, verify your embeddings and vector store indexing.
  • If citations are missing, ensure metadata is correctly attached to documents and included in the prompt.
  • For API errors, check your OPENAI_API_KEY environment variable and model availability.
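For the missing-citations case, it helps to validate metadata before building the prompt rather than discovering the gap in the model's answer. The sketch below uses a stand-in Document dataclass shaped like LangChain's Document (a .page_content string plus a .metadata dict); build_context is a hypothetical helper that fails loudly when a snippet lacks a source:

```python
from dataclasses import dataclass, field

# Stand-in for a LangChain Document; real results from
# similarity_search expose the same two attributes.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_context(results) -> str:
    """Join snippets with citations, raising if any source is missing."""
    lines = []
    for doc in results:
        source = doc.metadata.get("source")
        if source is None:
            raise ValueError(
                f"Missing 'source' metadata for snippet: {doc.page_content[:40]!r}"
            )
        lines.append(f"{doc.page_content} (Source: {source})")
    return "\n".join(lines)

docs = [Document("RAG combines retrieval with generation.", {"source": "doc2.txt"})]
print(build_context(docs))
# RAG combines retrieval with generation. (Source: doc2.txt)
```

Failing at context-building time pinpoints the broken document immediately, which is much easier to debug than an uncited sentence in the final answer.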

Key Takeaways

  • Embed documents and store vectors with metadata to enable source retrieval in RAG.
  • Include retrieved document snippets and their sources in prompts to generate cited answers.
  • Use vector stores like FAISS with OpenAI embeddings for efficient document search.
  • Adapt the approach with async calls, different models, or vector stores as needed.
  • Always verify metadata integrity to ensure accurate source citations in responses.
Verified 2026-04 · gpt-4o-mini