How to add conversation history to FastAPI LLM endpoint
Quick answer
To add conversation history in a FastAPI LLM endpoint, maintain a list of message dictionaries representing the chat history and pass it to the messages parameter of the client.chat.completions.create method. This preserves context across user interactions for coherent AI responses.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
- pip install fastapi uvicorn
Setup
Install the required packages and set your OpenAI API key as an environment variable.
- Install FastAPI and Uvicorn for the web server.
- Install the OpenAI SDK v1+ for API calls.
- Set OPENAI_API_KEY in your environment.
pip install fastapi uvicorn "openai>=1.0"
Step by step
This example shows a complete FastAPI app that stores conversation history in memory per session and sends it to the OpenAI gpt-4o-mini model for context-aware chat completions.
import os

from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# In-memory store for conversation history keyed by session ID
conversation_histories = {}

class ChatRequest(BaseModel):
    session_id: str
    user_message: str

@app.post("/chat")
async def chat_endpoint(chat_request: ChatRequest):
    session_id = chat_request.session_id
    user_message = chat_request.user_message

    # Initialize history if new session
    if session_id not in conversation_histories:
        conversation_histories[session_id] = []

    # Append user message to history
    conversation_histories[session_id].append({"role": "user", "content": user_message})

    # Call OpenAI chat completion with full history
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation_histories[session_id],
    )
    assistant_message = response.choices[0].message.content

    # Append assistant response to history
    conversation_histories[session_id].append({"role": "assistant", "content": assistant_message})
    return {"response": assistant_message}
# To run: uvicorn filename:app --reload
Common variations
- Use the async client (AsyncOpenAI in openai v1+) so the blocking API call does not stall the event loop inside the async endpoint.
- Store conversation history in a database or Redis for persistence across server restarts.
- Switch models by changing the model parameter (e.g., gpt-4.1). Note that a Claude model such as claude-3-5-sonnet-20241022 requires the Anthropic SDK or an OpenAI-compatible gateway rather than this client.
- Implement token limits by trimming older messages to stay within the model's context window.
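The trimming variation above can be sketched with a simple cap on the number of retained messages. The trim_history helper and MAX_MESSAGES limit are illustrative names, not part of the app shown earlier:

```python
MAX_MESSAGES = 20  # illustrative cap; tune to your model's context window

def trim_history(history, max_messages=MAX_MESSAGES):
    """Keep an optional leading system message plus the most recent messages."""
    if len(history) <= max_messages:
        return history
    # Preserve a system prompt at index 0, if present
    if history and history[0]["role"] == "system":
        return [history[0]] + history[-(max_messages - 1):]
    return history[-max_messages:]
```

You would call this right before the API call, e.g. messages=trim_history(conversation_histories[session_id]). A message-count cap is only a rough proxy for tokens; counting actual tokens with a tokenizer is more precise but needs an extra dependency.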
Troubleshooting
- If you get context length errors, trim or summarize older conversation history.
- Ensure OPENAI_API_KEY is set correctly to avoid authentication errors.
- Use logging to debug the message history sent to the API.
- For high concurrency, avoid in-memory storage; use external stores like Redis.
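The concurrency point above amounts to hiding the module-level dict behind a small storage interface, so an external store can be swapped in later. This is a sketch; the class and method names are hypothetical, and a Redis-backed variant would implement the same methods by serializing the history to JSON under a per-session key:

```python
class InMemoryHistoryStore:
    """Dict-backed history store; fine for a single development process.

    A Redis-backed variant would implement the same get/append interface,
    e.g. storing json.dumps(history) under a key like "chat:<session_id>",
    so history survives restarts and is shared across worker processes.
    """

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        # Return the stored history, or an empty list for new sessions
        return self._data.get(session_id, [])

    def append(self, session_id, message):
        # Append one message dict and return the updated history
        history = self.get(session_id) + [message]
        self._data[session_id] = history
        return history
```

The endpoint would then call store.get/store.append instead of touching conversation_histories directly, keeping the route handler unchanged when the backend changes.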
Key Takeaways
- Maintain conversation history as a list of message dicts with roles 'user' and 'assistant'.
- Pass the full conversation history to the messages parameter for context-aware responses.
- Store conversation history per session to handle multiple users concurrently.
- Trim or manage history length to avoid exceeding model context limits.
- Use environment variables for API keys and avoid hardcoding credentials.
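To make the first takeaway concrete, here is the shape of a history list after a couple of turns (the contents are an invented example):

```python
# A conversation history is an ordered list of role/content dicts.
history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
    {"role": "user", "content": "What is my name?"},
]

# Passing this whole list as `messages` lets the model see the earlier
# turns, so it can answer the last question from context.
roles = [message["role"] for message in history]
```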