How to add conversation history to FastAPI LLM endpoint
Quick answer
To add conversation history in a FastAPI LLM endpoint, maintain a list of message dictionaries representing the chat history and pass it to the messages parameter of the client.chat.completions.create method. This preserves context across user interactions for coherent AI responses.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
- pip install fastapi uvicorn
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install FastAPI and Uvicorn for the web server.
- Install the OpenAI SDK v1+ for API calls.
- Set OPENAI_API_KEY in your environment.
pip install fastapi uvicorn "openai>=1.0"
Step by step
This example shows a complete FastAPI app that stores conversation history in memory per session and sends it to the OpenAI gpt-4o-mini model for context-aware chat completions.
import os

from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# In-memory store for conversation history keyed by session ID
conversation_histories = {}

class ChatRequest(BaseModel):
    session_id: str
    user_message: str

@app.post("/chat")
async def chat_endpoint(chat_request: ChatRequest):
    session_id = chat_request.session_id
    user_message = chat_request.user_message

    # Initialize history if new session
    if session_id not in conversation_histories:
        conversation_histories[session_id] = []

    # Append user message to history
    conversation_histories[session_id].append({"role": "user", "content": user_message})

    # Call OpenAI chat completion with full history
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_histories[session_id],
    )
    assistant_message = response.choices[0].message.content

    # Append assistant response to history
    conversation_histories[session_id].append({"role": "assistant", "content": assistant_message})
    return {"response": assistant_message}
# To run: uvicorn filename:app --reload
Common variations
- Use the async client (AsyncOpenAI in openai v1+) so the blocking API call does not stall the event loop inside the async endpoint.
- Store conversation history in a database or Redis for persistence across server restarts.
- Switch models by changing the model parameter (e.g., gpt-4.1). Note that a Claude model such as claude-3-5-sonnet-20241022 requires the Anthropic SDK or an OpenAI-compatible gateway rather than this client.
- Implement token limits by trimming older messages to stay within the model's context window.
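The trimming variation above can be sketched with a simple cap on the number of retained messages. The trim_history helper and MAX_MESSAGES limit are illustrative names, not part of the app shown earlier:

```python
MAX_MESSAGES = 20  # illustrative cap; tune to your model's context window

def trim_history(history, max_messages=MAX_MESSAGES):
    """Keep an optional leading system message plus the most recent messages."""
    if len(history) <= max_messages:
        return history
    # Preserve a system prompt at index 0, if present
    if history and history[0]["role"] == "system":
        return [history[0]] + history[-(max_messages - 1):]
    return history[-max_messages:]
```

You would call this right before the API call, e.g. messages=trim_history(conversation_histories[session_id]). A message-count cap is only a rough proxy for tokens; counting actual tokens with a tokenizer is more precise but needs an extra dependency.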
Troubleshooting
- If you get context length errors, trim or summarize older conversation history.
- Ensure OPENAI_API_KEY is set correctly to avoid authentication errors.
- Use logging to debug the message history sent to the API.
- For high concurrency, avoid in-memory storage; use external stores like Redis.
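The concurrency point above amounts to hiding the module-level dict behind a small storage interface, so an external store can be swapped in later. This is a sketch; the class and method names are hypothetical, and a Redis-backed variant would implement the same methods by serializing the history to JSON under a per-session key:

```python
class InMemoryHistoryStore:
    """Dict-backed history store; fine for a single development process.

    A Redis-backed variant would implement the same get/append interface,
    e.g. storing json.dumps(history) under a key like "chat:<session_id>",
    so history survives restarts and is shared across worker processes.
    """

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        # Return the stored history, or an empty list for new sessions
        return self._data.get(session_id, [])

    def append(self, session_id, message):
        # Append one message dict and return the updated history
        history = self.get(session_id) + [message]
        self._data[session_id] = history
        return history
```

The endpoint would then call store.get/store.append instead of touching conversation_histories directly, keeping the route handler unchanged when the backend changes.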
Key Takeaways
- Maintain conversation history as a list of message dicts with roles 'user' and 'assistant'.
- Pass the full conversation history to the messages parameter for context-aware responses.
- Store conversation history per session to handle multiple users concurrently.
- Trim or manage history length to avoid exceeding model context limits.
- Use environment variables for API keys and avoid hardcoding credentials.
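To make the first takeaway concrete, here is the shape of a history list after a couple of turns (the contents are an invented example):

```python
# A conversation history is an ordered list of role/content dicts.
history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
    {"role": "user", "content": "What is my name?"},
]

# Passing this whole list as `messages` lets the model see the earlier
# turns, so it can answer the last question from context.
roles = [message["role"] for message in history]
```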