How-to · Intermediate · 4 min read

How to use Semantic Kernel for RAG

Quick answer
Use the semantic-kernel package to combine vector-based document retrieval with AI chat completion for Retrieval Augmented Generation (RAG). Load documents into a vector store, retrieve the most relevant context for a query, and pass it as a system prompt to OpenAIChatCompletion for accurate, context-aware answers.
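The shape of that flow is framework-agnostic. As a minimal sketch (the helper name `build_rag_prompt` is illustrative, and the message dicts follow the common OpenAI-style chat format):

```python
def build_rag_prompt(query: str, retrieved_docs: list[str]) -> list[dict]:
    """Assemble an OpenAI-style message list with retrieved context as the system prompt."""
    context = "\n".join(retrieved_docs)
    return [
        {"role": "system",
         "content": f"Use the following context to answer the question:\n{context}"},
        {"role": "user", "content": query},
    ]

messages = build_rag_prompt(
    "What is RAG?",
    ["Retrieval Augmented Generation combines vector search with LLMs."],
)
```

The rest of the article fills in the two pieces this sketch leaves out: where `retrieved_docs` comes from (a vector store) and what consumes `messages` (a chat completion service).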

PREREQUISITES

  • Python 3.10+ (required by recent semantic-kernel releases)
  • OpenAI API key (free tier works)
  • pip install semantic-kernel openai (faiss-cpu is optional, only for a FAISS-backed store)

Setup

Install semantic-kernel and its dependencies, then set your OpenAI API key as an environment variable.

bash
pip install semantic-kernel openai
pip install faiss-cpu   # optional: only needed for a FAISS-backed store
export OPENAI_API_KEY="sk-..."

Step by step

This example shows how to load documents, create embeddings, store them in an in-memory vector store, retrieve relevant context for a query, and generate a response using Semantic Kernel with OpenAI's gpt-4o-mini model. It targets the semantic-kernel 1.x Python package; the memory API is marked experimental there and has moved between releases, so check your installed version if an import fails.

python
import asyncio
import os

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
    OpenAITextEmbedding,
)
from semantic_kernel.contents import ChatHistory
from semantic_kernel.memory import SemanticTextMemory, VolatileMemoryStore


async def main():
    # Initialize kernel and AI services
    kernel = Kernel()
    openai_api_key = os.environ["OPENAI_API_KEY"]
    chat_service = OpenAIChatCompletion(
        service_id="chat",
        api_key=openai_api_key,
        ai_model_id="gpt-4o-mini",
    )
    embedding_service = OpenAITextEmbedding(
        service_id="embeddings",
        api_key=openai_api_key,
        ai_model_id="text-embedding-3-small",
    )
    kernel.add_service(chat_service)
    kernel.add_service(embedding_service)

    # Create a vector memory backed by an in-memory store
    # (swap VolatileMemoryStore for a persistent connector in production)
    memory = SemanticTextMemory(
        storage=VolatileMemoryStore(),
        embeddings_generator=embedding_service,
    )

    # Sample documents to index
    documents = [
        "Semantic Kernel is a framework for AI orchestration.",
        "Retrieval Augmented Generation combines vector search with LLMs.",
        "OpenAI's gpt-4o-mini is a capable chat model for RAG tasks.",
    ]

    # Add documents to vector memory (save_information embeds and stores each one)
    for i, doc in enumerate(documents):
        await memory.save_information(collection="docs", id=f"doc_{i}", text=doc)

    # Query to answer
    query = "What is Retrieval Augmented Generation?"

    # Retrieve the most relevant documents
    results = await memory.search(collection="docs", query=query, limit=2)
    context = "\n".join(result.text for result in results)

    # Put the retrieved context into the system prompt
    history = ChatHistory()
    history.add_system_message(
        f"Use the following context to answer the question:\n{context}"
    )
    history.add_user_message(query)

    # Generate the answer
    response = await chat_service.get_chat_message_content(
        chat_history=history,
        settings=OpenAIChatPromptExecutionSettings(service_id="chat"),
    )
    print("Answer:", response)


asyncio.run(main())
output
Answer: Retrieval Augmented Generation (RAG) is a technique that combines vector-based document retrieval with large language models to provide accurate, context-aware answers by leveraging relevant external knowledge.

Common variations

  • Use a different chat model by changing ai_model_id (e.g., gpt-4o).
  • Wrap retrieval and generation in a reusable async helper when answering many queries.
  • Integrate other vector stores by subclassing MemoryStoreBase or using one of the bundled memory connectors (e.g., Chroma, Azure AI Search).
python
# Reusable helper; assumes the imports, memory, and chat_service
# objects from the main example above
async def rag_answer(memory, chat_service, query: str) -> str:
    results = await memory.search(collection="docs", query=query, limit=3)
    context = "\n".join(result.text for result in results)

    history = ChatHistory()
    history.add_system_message(f"Use the following context to answer:\n{context}")
    history.add_user_message(query)

    response = await chat_service.get_chat_message_content(
        chat_history=history,
        settings=OpenAIChatPromptExecutionSettings(service_id="chat"),
    )
    return str(response)

# Inside an async context:
# print("Async answer:", await rag_answer(memory, chat_service, query))
output
Async answer: Retrieval Augmented Generation (RAG) enhances language model responses by retrieving relevant documents from a vector store and using them as context for generation.

Troubleshooting

  • If retrieval returns no documents, make sure you are searching the same collection you saved to, and that the same embedding model is used for indexing and querying.
  • If API calls fail, verify OPENAI_API_KEY is set and has proper permissions.
  • For memory store errors, check that your installed semantic-kernel version still ships the memory API you are importing; it is experimental and has been reorganized across releases.
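When several of these could apply at once, a quick preflight check narrows it down. The helper below is an illustrative sketch using only the standard library:

```python
import importlib.util
import os


def preflight() -> list[str]:
    """Return a list of environment problems likely to break the RAG example."""
    problems = []
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if importlib.util.find_spec("semantic_kernel") is None:
        problems.append("the 'semantic-kernel' package is not installed")
    return problems


for problem in preflight():
    print("Problem:", problem)
```

Run this before the main example; an empty result means the basics (API key and package install) are in place, and the issue is more likely in collection names or model configuration.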

Key Takeaways

  • Semantic Kernel enables RAG by combining vector memory retrieval with AI chat completions.
  • Start with an in-memory store for prototyping, then move to FAISS or another persistent vector store to index and search documents efficiently at scale.
  • Pass retrieved context as system prompt to improve answer relevance and accuracy.
Verified 2026-04 · gpt-4o-mini