How to · Intermediate · 4 min read

How to cite sources from documents in RAG

Quick answer
Build a Retrieval-Augmented Generation (RAG) pipeline: embed your documents with OpenAIEmbeddings and store them, along with their metadata, in a vector store such as FAISS. At query time, retrieve the most relevant snippets and include each snippet's source metadata in the prompt you pass to chat.completions.create, so the model can answer with source attributions.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (free tier works)
  • pip install openai langchain langchain_community faiss-cpu

Setup

Install required packages and set your environment variable for the OpenAI API key.

  • Install packages: pip install openai langchain langchain_community faiss-cpu
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows; open a new terminal afterwards, since setx only affects new sessions)
bash
pip install openai langchain langchain_community faiss-cpu

Step by step

This example shows how to embed documents, create a FAISS vector store, query it, and generate a response with cited sources using OpenAI and LangChain.

python
import os
from openai import OpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate

# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Sample documents with metadata
documents = [
    {"text": "Python is a popular programming language.", "metadata": {"source": "doc1.txt"}},
    {"text": "RAG combines retrieval with generation.", "metadata": {"source": "doc2.txt"}},
    {"text": "FAISS is a vector search library.", "metadata": {"source": "doc3.txt"}}
]

# Embed documents and build the FAISS index in one step;
# from_texts computes the embeddings and attaches metadata to each chunk
embeddings = OpenAIEmbeddings(api_key=os.environ["OPENAI_API_KEY"])
index = FAISS.from_texts(
    texts=[doc["text"] for doc in documents],
    embedding=embeddings,
    metadatas=[doc["metadata"] for doc in documents],
)

# Query the vector store; results are Document objects
# with .page_content and .metadata attributes
query = "What is RAG?"
results = index.similarity_search(query, k=2)

# Prepare context with citations
context = "\n".join(
    f"{doc.page_content} (Source: {doc.metadata['source']})" for doc in results
)

# Create prompt with context
prompt_template = """
You are an AI assistant. Use the following context to answer the question.

Context:
{context}

Question:
{question}

Answer with citations.
"""

prompt = ChatPromptTemplate.from_template(prompt_template).format(
    context=context,
    question=query
)

# Generate completion
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}]
)

print("Answer:", response.choices[0].message.content)
output
Answer: Retrieval-Augmented Generation (RAG) combines retrieval of relevant documents with AI generation to produce accurate responses (Source: doc2.txt). FAISS is a vector search library used to index and search document embeddings (Source: doc3.txt).

Common variations

You can adapt this approach by:

  • Using async calls with the OpenAI SDK for concurrency.
  • Switching to other vector stores like Chroma or Weaviate.
  • Using other chat models, e.g. gpt-4o when answer quality matters more than cost.
  • Including source URLs or page numbers in metadata for richer citations.
python
import asyncio
import os

from openai import AsyncOpenAI

async def async_query():
    # await requires the AsyncOpenAI client, not the sync OpenAI client
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Explain RAG with sources."}],
    )
    print(response.choices[0].message.content)

asyncio.run(async_query())
output
Retrieval-Augmented Generation (RAG) is a technique that combines document retrieval with language model generation to provide accurate answers with source references.
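The last variation, richer citation metadata, only needs a small formatting helper. Below is a minimal sketch; the metadata keys (source, page, url) and the format_citation helper are illustrative choices, not part of any library:

```python
# A sketch of richer citation formatting. The metadata keys below are
# examples; use whatever your ingestion pipeline attaches to each chunk.

def format_citation(metadata: dict) -> str:
    """Build a human-readable citation string from chunk metadata."""
    parts = [metadata.get("source", "unknown")]
    if "page" in metadata:
        parts.append(f"p. {metadata['page']}")
    if "url" in metadata:
        parts.append(metadata["url"])
    return " · ".join(parts)

snippet = {
    "text": "RAG combines retrieval with generation.",
    "metadata": {"source": "rag_guide.pdf", "page": 12},
}
print(f'{snippet["text"]} (Source: {format_citation(snippet["metadata"])})')
# RAG combines retrieval with generation. (Source: rag_guide.pdf · p. 12)
```

Because the helper only reads whatever keys are present, the same code works whether your chunks carry file names, page numbers, URLs, or all three.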

Troubleshooting

  • If you get empty search results, verify your embeddings and vector store indexing.
  • If citations are missing, ensure metadata is correctly attached to documents and included in the prompt.
  • For API errors, check your OPENAI_API_KEY environment variable and model availability.
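For the missing-citations case, it helps to validate metadata before building the prompt rather than discovering the gap in the model's answer. The sketch below uses a stand-in Document dataclass shaped like LangChain's Document (a .page_content string plus a .metadata dict); build_context is a hypothetical helper that fails loudly when a snippet lacks a source:

```python
from dataclasses import dataclass, field

# Stand-in for a LangChain Document; real results from
# similarity_search expose the same two attributes.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

def build_context(results) -> str:
    """Join snippets with citations, raising if any source is missing."""
    lines = []
    for doc in results:
        source = doc.metadata.get("source")
        if source is None:
            raise ValueError(
                f"Missing 'source' metadata for snippet: {doc.page_content[:40]!r}"
            )
        lines.append(f"{doc.page_content} (Source: {source})")
    return "\n".join(lines)

docs = [Document("RAG combines retrieval with generation.", {"source": "doc2.txt"})]
print(build_context(docs))
# RAG combines retrieval with generation. (Source: doc2.txt)
```

Failing at context-building time pinpoints the broken document immediately, which is much easier to debug than an uncited sentence in the final answer.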

Key Takeaways

  • Embed documents and store vectors with metadata to enable source retrieval in RAG.
  • Include retrieved document snippets and their sources in prompts to generate cited answers.
  • Use vector stores like FAISS with OpenAI embeddings for efficient document search.
  • Adapt the approach with async calls, different models, or vector stores as needed.
  • Always verify metadata integrity to ensure accurate source citations in responses.
Verified 2026-04 · gpt-4o-mini