How-to · Intermediate · 4 min read

How to use Semantic Kernel for RAG

Quick answer
Use the semantic-kernel package to combine vector-based document retrieval with AI chat completion for Retrieval Augmented Generation (RAG). Load documents into a vector store, retrieve the most relevant context for a query, and pass it as a system prompt to OpenAIChatCompletion for accurate, context-aware answers.
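The shape of that flow is framework-agnostic. As a minimal sketch (the helper name `build_rag_prompt` is illustrative, and the message dicts follow the common OpenAI-style chat format):

```python
def build_rag_prompt(query: str, retrieved_docs: list[str]) -> list[dict]:
    """Assemble an OpenAI-style message list with retrieved context as the system prompt."""
    context = "\n".join(retrieved_docs)
    return [
        {"role": "system",
         "content": f"Use the following context to answer the question:\n{context}"},
        {"role": "user", "content": query},
    ]

messages = build_rag_prompt(
    "What is RAG?",
    ["Retrieval Augmented Generation combines vector search with LLMs."],
)
```

The rest of the article fills in the two pieces this sketch leaves out: where `retrieved_docs` comes from (a vector store) and what consumes `messages` (a chat completion service).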

PREREQUISITES

  • Python 3.10+ (required by recent semantic-kernel releases)
  • OpenAI API key (free tier works)
  • pip install semantic-kernel openai (faiss-cpu is optional, only for a FAISS-backed store)

Setup

Install semantic-kernel and its dependencies, then set your OpenAI API key as an environment variable.

bash
pip install semantic-kernel openai
pip install faiss-cpu   # optional: only needed for a FAISS-backed store
export OPENAI_API_KEY="sk-..."

Step by step

This example shows how to load documents, create embeddings, store them in an in-memory vector store, retrieve relevant context for a query, and generate a response using Semantic Kernel with OpenAI's gpt-4o-mini model. It targets the semantic-kernel 1.x Python package; the memory API is marked experimental there and has moved between releases, so check your installed version if an import fails.

python
import asyncio
import os

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
    OpenAITextEmbedding,
)
from semantic_kernel.contents import ChatHistory
from semantic_kernel.memory import SemanticTextMemory, VolatileMemoryStore


async def main():
    # Initialize kernel and AI services
    kernel = Kernel()
    openai_api_key = os.environ["OPENAI_API_KEY"]
    chat_service = OpenAIChatCompletion(
        service_id="chat",
        api_key=openai_api_key,
        ai_model_id="gpt-4o-mini",
    )
    embedding_service = OpenAITextEmbedding(
        service_id="embeddings",
        api_key=openai_api_key,
        ai_model_id="text-embedding-3-small",
    )
    kernel.add_service(chat_service)
    kernel.add_service(embedding_service)

    # Create a vector memory backed by an in-memory store
    # (swap VolatileMemoryStore for a persistent connector in production)
    memory = SemanticTextMemory(
        storage=VolatileMemoryStore(),
        embeddings_generator=embedding_service,
    )

    # Sample documents to index
    documents = [
        "Semantic Kernel is a framework for AI orchestration.",
        "Retrieval Augmented Generation combines vector search with LLMs.",
        "OpenAI's gpt-4o-mini is a capable chat model for RAG tasks.",
    ]

    # Add documents to vector memory (save_information embeds and stores each one)
    for i, doc in enumerate(documents):
        await memory.save_information(collection="docs", id=f"doc_{i}", text=doc)

    # Query to answer
    query = "What is Retrieval Augmented Generation?"

    # Retrieve the most relevant documents
    results = await memory.search(collection="docs", query=query, limit=2)
    context = "\n".join(result.text for result in results)

    # Put the retrieved context into the system prompt
    history = ChatHistory()
    history.add_system_message(
        f"Use the following context to answer the question:\n{context}"
    )
    history.add_user_message(query)

    # Generate the answer
    response = await chat_service.get_chat_message_content(
        chat_history=history,
        settings=OpenAIChatPromptExecutionSettings(service_id="chat"),
    )
    print("Answer:", response)


asyncio.run(main())
output
Answer: Retrieval Augmented Generation (RAG) is a technique that combines vector-based document retrieval with large language models to provide accurate, context-aware answers by leveraging relevant external knowledge.

Common variations

  • Use a different chat model by changing ai_model_id (e.g., gpt-4o).
  • Wrap retrieval and generation in a reusable async helper when answering many queries.
  • Integrate other vector stores by subclassing MemoryStoreBase or using one of the bundled memory connectors (e.g., Chroma, Azure AI Search).
python
# Reusable helper; assumes the imports, memory, and chat_service
# objects from the main example above
async def rag_answer(memory, chat_service, query: str) -> str:
    results = await memory.search(collection="docs", query=query, limit=3)
    context = "\n".join(result.text for result in results)

    history = ChatHistory()
    history.add_system_message(f"Use the following context to answer:\n{context}")
    history.add_user_message(query)

    response = await chat_service.get_chat_message_content(
        chat_history=history,
        settings=OpenAIChatPromptExecutionSettings(service_id="chat"),
    )
    return str(response)

# Inside an async context:
# print("Async answer:", await rag_answer(memory, chat_service, query))
output
Async answer: Retrieval Augmented Generation (RAG) enhances language model responses by retrieving relevant documents from a vector store and using them as context for generation.

Troubleshooting

  • If retrieval returns no documents, make sure you are searching the same collection you saved to, and that the same embedding model is used for indexing and querying.
  • If API calls fail, verify OPENAI_API_KEY is set and has proper permissions.
  • For memory store errors, check that your installed semantic-kernel version still ships the memory API you are importing; it is experimental and has been reorganized across releases.
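When several of these could apply at once, a quick preflight check narrows it down. The helper below is an illustrative sketch using only the standard library:

```python
import importlib.util
import os


def preflight() -> list[str]:
    """Return a list of environment problems likely to break the RAG example."""
    problems = []
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if importlib.util.find_spec("semantic_kernel") is None:
        problems.append("the 'semantic-kernel' package is not installed")
    return problems


for problem in preflight():
    print("Problem:", problem)
```

Run this before the main example; an empty result means the basics (API key and package install) are in place, and the issue is more likely in collection names or model configuration.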

Key Takeaways

  • Semantic Kernel enables RAG by combining vector memory retrieval with AI chat completions.
  • Start with an in-memory store for prototyping, then move to FAISS or another persistent vector store to index and search documents efficiently at scale.
  • Pass retrieved context as system prompt to improve answer relevance and accuracy.
Verified 2026-04 · gpt-4o-mini