How to deploy a FastAPI LLM app on AWS
Quick answer
Deploy a FastAPI app integrated with an LLM such as gpt-4o-mini on AWS by launching an EC2 instance, installing dependencies, setting an environment variable for your OpenAI API key, and running the app with uvicorn. Use AWS security groups to expose the app port, and optionally configure a domain or load balancer for production.

Prerequisites
- Python 3.8+
- AWS account with EC2 access
- OpenAI API key (free tier works)
- pip install fastapi uvicorn "openai>=1.0" (quote the version specifier so the shell does not treat > as a redirect)
Set up an AWS EC2 instance
Launch an AWS EC2 instance (Amazon Linux 2 or Ubuntu 22.04 recommended). Configure security groups to allow inbound traffic on port 8000 (or your chosen port). SSH into the instance and install Python 3.8+, pip, and Git if needed.
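The inbound rule can also be added from the AWS CLI; a sketch, assuming a hypothetical security group ID sg-0123456789abcdef0:

```shell
# Allow inbound TCP on port 8000 from anywhere (tighten the CIDR for production)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 8000 \
  --cidr 0.0.0.0/0
```

Replace the group ID with your instance's actual security group.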
sudo yum update -y
sudo yum install python3 git -y
python3 -m pip install --upgrade pip

Output:
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
... (yum update output)
Installed: python3.x86_64 ...
Requirement already satisfied: pip in /usr/lib/python3.8/site-packages (21.0.1)
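The commands above are for Amazon Linux 2; on Ubuntu 22.04 the equivalents use apt:

```shell
sudo apt update
sudo apt install -y python3 python3-pip python3-venv git
python3 -m pip install --upgrade pip
```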
Step-by-step FastAPI LLM app
Create a simple FastAPI app that calls the gpt-4o-mini model through the OpenAI API. Set your OPENAI_API_KEY as an environment variable on the EC2 instance, then run the app with uvicorn to serve HTTP requests.
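Setting the key on the instance can look like this (the key value is a placeholder):

```shell
# Set the key for the current shell session
export OPENAI_API_KEY="sk-your-key-here"

# Persist it for future logins by appending to the shell profile
echo 'export OPENAI_API_KEY="sk-your-key-here"' >> ~/.bashrc
```

For production, prefer AWS Systems Manager Parameter Store or Secrets Manager over a plain-text shell profile.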
import os

from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class Prompt(BaseModel):
    text: str

@app.post("/generate")
async def generate_text(prompt: Prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt.text}],
    )
    return {"response": response.choices[0].message.content}

# To run: uvicorn main:app --host 0.0.0.0 --port 8000

Common variations
- Use uvicorn main:app --host 0.0.0.0 --port 80 to serve on the default HTTP port (binding to ports below 1024 requires root or an equivalent capability).
- Switch to async OpenAI calls (the AsyncOpenAI client in openai>=1.0) for better concurrency.
- Deploy on AWS Elastic Beanstalk or ECS for managed scaling.
- Use other LLM providers such as Anthropic Claude by changing the client and model.
Troubleshooting
- If you get "Connection refused", check the AWS security group inbound rules for your port.
- Ensure OPENAI_API_KEY is set in the environment before starting the app.
- Use sudo lsof -i :8000 to check whether the port is already in use.
- Check logs with journalctl -u uvicorn (if the app runs as a systemd service) or the console output for errors.
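The journalctl -u uvicorn command assumes the app runs as a systemd service. A minimal unit file sketch (user, paths, and the service name are assumptions), saved as /etc/systemd/system/uvicorn.service:

```ini
[Unit]
Description=FastAPI LLM app
After=network.target

[Service]
User=ec2-user
WorkingDirectory=/home/ec2-user/app
Environment=OPENAI_API_KEY=sk-your-key-here
ExecStart=/usr/local/bin/uvicorn main:app --host 0.0.0.0 --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now uvicorn; its logs then appear under journalctl -u uvicorn.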
Key Takeaways
- Use AWS EC2 with proper security group settings to expose your FastAPI LLM app.
- Set environment variables securely for your OpenAI API key on the server.
- Run FastAPI with uvicorn binding to 0.0.0.0 to accept external requests.
- Consider managed AWS services like Elastic Beanstalk or ECS for production scaling.
- Check network and environment setup first when debugging connectivity or API errors.