How to use retrieval to add context to prompts
Quick answer
Use retrieval to fetch relevant documents or data from a knowledge base, then prepend or inject this context into your prompt before sending it to the model. This technique, called retrieval-augmented generation, improves accuracy and relevance by grounding the AI's response in up-to-date or domain-specific information.
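At its core this is just string assembly: fetch text, then splice it into the message you send. A minimal sketch of the prepend pattern (the `build_rag_prompt` helper and the sample context string are illustrative, not part of any SDK):

```python
def build_rag_prompt(context: str, question: str) -> str:
    # Prepend the retrieved context so the model grounds its answer in it
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"

prompt = build_rag_prompt(
    "FAISS is a library for efficient similarity search.",
    "What is FAISS used for?",
)
print(prompt)
```

The same string becomes the `content` of the user message in the API call shown below.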
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- `openai>=1.0` installed via pip
- A vector store or document database for retrieval (e.g., FAISS, ChromaDB)
Setup
Install the OpenAI Python SDK and a vector store library like faiss-cpu or chromadb to enable document retrieval. Set your OpenAI API key as an environment variable.
```bash
pip install openai faiss-cpu
```

Step by step
This example shows how to retrieve relevant documents from a vector store and add them as context to a prompt sent to gpt-4o. The retrieved text is prepended to the user query to guide the model's response.
```python
import os
from openai import OpenAI

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Simulated retrieval function returning relevant context.
# In practice, replace this with a vector search over FAISS or ChromaDB.
def retrieve_context(query):
    knowledge_base = {
        "Python": "Python is a high-level programming language known for its readability.",
        "API": "An API allows different software systems to communicate with each other.",
    }
    # Simple keyword-match retrieval
    for key, text in knowledge_base.items():
        if key.lower() in query.lower():
            return text
    return ""

user_query = "Explain how Python handles APIs."
context = retrieve_context(user_query)

# Construct the prompt with the retrieved context prepended
prompt = f"Context: {context}\n\nQuestion: {user_query}\nAnswer:"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)
```

Output:

Python provides several libraries such as requests and Flask to handle APIs effectively. It allows you to create, consume, and manage APIs with ease, leveraging its readable syntax and extensive ecosystem.
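In a real system, the keyword lookup above would be replaced by embedding similarity: embed the query and each document, then return the nearest document. A toy sketch with hand-made vectors (a real pipeline would get these from an embedding model and search them with FAISS or ChromaDB; the three-dimensional vectors here are made up for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": in practice these come from an embedding model
docs = {
    "Python is a high-level programming language.": [0.9, 0.1, 0.0],
    "An API lets software systems communicate.": [0.1, 0.9, 0.2],
}

def retrieve_nearest(query_vec):
    # Return the document whose embedding is closest to the query vector
    return max(docs, key=lambda d: cosine(docs[d], query_vec))

best = retrieve_nearest([0.2, 0.8, 0.1])  # a query "about APIs"
print(best)  # → An API lets software systems communicate.
```

The retrieved document then takes the place of `context` in the prompt construction shown above.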
Common variations
- Use asynchronous calls with `asyncio` for faster retrieval and prompt generation.
- Switch to models such as `claude-3-5-sonnet-20241022` (via the Anthropic SDK) for improved coding or reasoning tasks.
- Integrate streaming responses to display partial answers as they generate.
- Use external vector databases like Pinecone or Weaviate for scalable retrieval.
```python
import asyncio
import os
from openai import AsyncOpenAI

# Use the async client; there is no `acreate` method in openai>=1.0
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_retrieval_prompt(query):
    # Simulate async retrieval
    await asyncio.sleep(0.1)
    context = "Python is a versatile language used for APIs."
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    print(response.choices[0].message.content)

asyncio.run(async_retrieval_prompt("How does Python support API development?"))
```

Output:
Python supports API development through frameworks like Flask and FastAPI, enabling quick and scalable web service creation.
Troubleshooting
- If retrieval returns no relevant context, the prompt lacks grounding. Improve your retrieval method or fall back to a generic prompt.
- Ensure your API key is set correctly in `os.environ["OPENAI_API_KEY"]` to avoid authentication errors.
- For large contexts, watch token limits; truncate or summarize retrieved documents before adding them.
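For the token-limit point, a crude guard is to cap the retrieved text before injecting it into the prompt. This sketch uses a word count as a rough proxy (a real system would count tokens exactly with a tokenizer such as tiktoken; the `truncate_context` helper and the 50-word cap are illustrative choices):

```python
def truncate_context(text: str, max_words: int = 50) -> str:
    # Rough proxy: cap by word count; a real tokenizer gives exact token limits
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."

long_doc = "retrieval " * 200          # a 200-word document
short = truncate_context(long_doc)     # 50 words kept plus a "..." marker
print(len(short.split()))
```

Summarizing with a cheap model call is a better-quality alternative to hard truncation when the retrieved documents are long but information-dense.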
Key Takeaways
- Retrieval-augmented prompting improves AI accuracy by grounding responses in relevant external data.
- Prepend retrieved context to user queries to provide the model with necessary background.
- Use vector search tools like FAISS or ChromaDB to efficiently find relevant documents.
- Manage token limits by summarizing or truncating retrieved context before prompt injection.
- Async and streaming variants enhance performance and user experience in real-time applications.