How to use retrieval to add context to prompts
Quick answer
Use retrieval to fetch relevant documents or data from a knowledge base, then prepend or inject this context into your prompt before sending it to the model. This technique, called retrieval-augmented generation, improves accuracy and relevance by grounding the AI's response in up-to-date or domain-specific information.
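At its core this is just string assembly: fetch text, then splice it into the message you send. A minimal sketch of the prepend pattern (the `build_rag_prompt` helper and the sample context string are illustrative, not part of any SDK):

```python
def build_rag_prompt(context: str, question: str) -> str:
    # Prepend the retrieved context so the model grounds its answer in it
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"

prompt = build_rag_prompt(
    "FAISS is a library for efficient similarity search.",
    "What is FAISS used for?",
)
print(prompt)
```

The same string becomes the `content` of the user message in the API call shown below.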
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- `openai>=1.0` installed via pip
- A vector store or document database for retrieval (e.g., FAISS, ChromaDB)
Setup
Install the OpenAI Python SDK and a vector store library like faiss-cpu or chromadb to enable document retrieval. Set your OpenAI API key as an environment variable.
```bash
pip install openai faiss-cpu
```

Step by step
This example shows how to retrieve relevant documents from a vector store and add them as context to a prompt sent to gpt-4o. The retrieved text is prepended to the user query to guide the model's response.
```python
import os
from openai import OpenAI

# Initialize the OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Simulated retrieval function returning relevant context.
# In practice, replace this with a vector search over FAISS or ChromaDB.
def retrieve_context(query):
    knowledge_base = {
        "Python": "Python is a high-level programming language known for its readability.",
        "API": "An API allows different software systems to communicate with each other.",
    }
    # Simple keyword-match retrieval
    for key, text in knowledge_base.items():
        if key.lower() in query.lower():
            return text
    return ""

user_query = "Explain how Python handles APIs."
context = retrieve_context(user_query)

# Construct the prompt with the retrieved context prepended
prompt = f"Context: {context}\n\nQuestion: {user_query}\nAnswer:"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)
```

Output:

Python provides several libraries such as requests and Flask to handle APIs effectively. It allows you to create, consume, and manage APIs with ease, leveraging its readable syntax and extensive ecosystem.
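In a real system, the keyword lookup above would be replaced by embedding similarity: embed the query and each document, then return the nearest document. A toy sketch with hand-made vectors (a real pipeline would get these from an embedding model and search them with FAISS or ChromaDB; the three-dimensional vectors here are made up for illustration):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": in practice these come from an embedding model
docs = {
    "Python is a high-level programming language.": [0.9, 0.1, 0.0],
    "An API lets software systems communicate.": [0.1, 0.9, 0.2],
}

def retrieve_nearest(query_vec):
    # Return the document whose embedding is closest to the query vector
    return max(docs, key=lambda d: cosine(docs[d], query_vec))

best = retrieve_nearest([0.2, 0.8, 0.1])  # a query "about APIs"
print(best)  # → An API lets software systems communicate.
```

The retrieved document then takes the place of `context` in the prompt construction shown above.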
Common variations
- Use asynchronous calls with `asyncio` for faster retrieval and prompt generation.
- Switch to models such as `claude-3-5-sonnet-20241022` (via the Anthropic SDK) for improved coding or reasoning tasks.
- Integrate streaming responses to display partial answers as they generate.
- Use external vector databases like Pinecone or Weaviate for scalable retrieval.
```python
import asyncio
import os
from openai import AsyncOpenAI

# Use the async client; there is no `acreate` method in openai>=1.0
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def async_retrieval_prompt(query):
    # Simulate async retrieval
    await asyncio.sleep(0.1)
    context = "Python is a versatile language used for APIs."
    prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer:"
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    print(response.choices[0].message.content)

asyncio.run(async_retrieval_prompt("How does Python support API development?"))
```

Output:
Python supports API development through frameworks like Flask and FastAPI, enabling quick and scalable web service creation.
Troubleshooting
- If retrieval returns no relevant context, the prompt lacks grounding. Improve your retrieval method or fall back to a generic prompt.
- Ensure your API key is set correctly in `os.environ["OPENAI_API_KEY"]` to avoid authentication errors.
- For large contexts, watch token limits; truncate or summarize retrieved documents before adding them.
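For the token-limit point, a crude guard is to cap the retrieved text before injecting it into the prompt. This sketch uses a word count as a rough proxy (a real system would count tokens exactly with a tokenizer such as tiktoken; the `truncate_context` helper and the 50-word cap are illustrative choices):

```python
def truncate_context(text: str, max_words: int = 50) -> str:
    # Rough proxy: cap by word count; a real tokenizer gives exact token limits
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."

long_doc = "retrieval " * 200          # a 200-word document
short = truncate_context(long_doc)     # 50 words kept plus a "..." marker
print(len(short.split()))
```

Summarizing with a cheap model call is a better-quality alternative to hard truncation when the retrieved documents are long but information-dense.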
Key Takeaways
- Retrieval-augmented prompting improves AI accuracy by grounding responses in relevant external data.
- Prepend retrieved context to user queries to provide the model with necessary background.
- Use vector search tools like FAISS or ChromaDB to efficiently find relevant documents.
- Manage token limits by summarizing or truncating retrieved context before prompt injection.
- Async and streaming variants enhance performance and user experience in real-time applications.