Intermediate · 4 min read

How to build RAG with Azure OpenAI

Quick answer
Build Retrieval-Augmented Generation (RAG) on Azure OpenAI by combining a vector store for document retrieval with Azure OpenAI chat completions for generation. Use the AzureOpenAI client from the openai SDK to query your deployed model with the retrieved context and produce accurate, context-aware answers.

PREREQUISITES

  • Python 3.8+
  • Azure OpenAI resource with deployment name
  • Azure OpenAI API key and endpoint
  • pip install "openai>=1.0" (quoted so the shell does not treat >= as a redirect)
  • pip install faiss-cpu (or another vector store)

Setup

Install the required Python packages and set environment variables for your Azure OpenAI API key and endpoint. You will also need a vector store like FAISS for document retrieval.

bash
pip install openai faiss-cpu
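The code below reads credentials from the environment. The values here are placeholders; substitute your own key, resource endpoint, and chat deployment name:

```shell
# Placeholders — replace with the values from your Azure OpenAI resource
export AZURE_OPENAI_API_KEY="<your-api-key>"
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT="<your-chat-deployment-name>"
```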

Step by step

This example shows how to build a simple RAG system by embedding documents, storing them in FAISS, retrieving relevant documents for a query, and then using Azure OpenAI to generate an answer based on the retrieved context.

python
import os
from openai import AzureOpenAI
import faiss
import numpy as np

# Initialize Azure OpenAI client
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-02-01"
)

# Example documents to index
documents = [
    "Python is a popular programming language.",
    "Azure OpenAI provides powerful AI models.",
    "Retrieval-Augmented Generation combines search and generation.",
    "FAISS is a library for efficient similarity search."
]

# Step 1: Embed documents using Azure OpenAI embeddings
# In Azure OpenAI, `model` must be your embeddings *deployment* name
embedding_model = "text-embedding-3-large"

def embed_text(texts):
    response = client.embeddings.create(model=embedding_model, input=texts)
    return np.array([item.embedding for item in response.data], dtype=np.float32)

embeddings = embed_text(documents)

# Step 2: Build FAISS index
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

# Step 3: Query embedding
query = "What is RAG?"
query_embedding = embed_text([query])

# Step 4: Retrieve top 2 relevant documents
k = 2
_, indices = index.search(query_embedding, k)
retrieved_docs = [documents[i] for i in indices[0]]

# Step 5: Build prompt with retrieved context
context = "\n".join(retrieved_docs)
prompt = f"Use the following context to answer the question:\n{context}\nQuestion: {query}\nAnswer:"

# Step 6: Generate answer with Azure OpenAI chat completion
response = client.chat.completions.create(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    messages=[{"role": "user", "content": prompt}]
)
answer = response.choices[0].message.content

print("Question:", query)
print("Answer:", answer)
output
Question: What is RAG?
Answer: Retrieval-Augmented Generation (RAG) is a technique that combines document retrieval with language model generation to provide accurate and context-aware answers.

Common variations

  • Use the AsyncAzureOpenAI client with asyncio and await to run requests concurrently in scalable applications.
  • Switch embedding models or chat models by changing embedding_model or model parameters.
  • Integrate other vector stores like Chroma or Pinecone instead of FAISS.

Troubleshooting

  • If you get authentication errors, verify your AZURE_OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT environment variables.
  • If embeddings fail, ensure you have actually deployed the embedding model (e.g. text-embedding-3-large) in your resource and that you pass its deployment name as model.
  • For slow retrieval on large corpora, note that IndexFlatL2 searches exhaustively; consider an approximate FAISS index such as IndexIVFFlat.
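One more retrieval detail: IndexFlatL2 ranks by Euclidean distance, while many embedding models are tuned for cosine similarity, which you get by normalizing vectors first (in FAISS, pair normalization with IndexFlatIP). A minimal numpy sketch of the idea, using toy 2-D vectors in place of real embeddings:

```python
import numpy as np

def cosine_top_k(query_vec, doc_matrix, k=2):
    # Normalize rows so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q
    # Indices of the k highest-scoring documents, best first
    return np.argsort(scores)[::-1][:k]

# Toy vectors standing in for real embeddings
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], dtype=np.float32)
query = np.array([0.9, 0.1], dtype=np.float32)
print(cosine_top_k(query, docs))  # → [0 2]
```

Because the vectors are normalized, document length no longer affects ranking, only direction does.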

Key Takeaways

  • Use the AzureOpenAI client from the openai SDK, with credentials in environment variables, for secure API access.
  • Combine vector search (FAISS) with Azure OpenAI chat completions for effective RAG.
  • Embed documents and queries with the same embedding model for accurate retrieval.
  • Construct prompts by injecting retrieved context before the user query.
  • Test and adjust retrieval count and model parameters for best results.
Verified 2026-04 · gpt-4o, text-embedding-3-large