How to deploy a FastAPI LLM app on AWS
Quick answer
Deploy a FastAPI app integrated with an LLM such as gpt-4o-mini on AWS by launching an EC2 instance, installing dependencies, setting an environment variable for your OpenAI API key, and running the app with uvicorn. Use AWS security groups to expose the app port, and optionally configure a domain or load balancer for production.

Prerequisites
- Python 3.8+
- AWS account with EC2 access
- OpenAI API key (free tier works)
- pip install fastapi uvicorn "openai>=1.0" (quote the version specifier so the shell does not treat > as a redirect)
Set up an AWS EC2 instance
Launch an AWS EC2 instance (Amazon Linux 2 or Ubuntu 22.04 recommended). Configure security groups to allow inbound traffic on port 8000 (or your chosen port). SSH into the instance and install Python 3.8+, pip, and Git if needed.
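The inbound rule can also be added from the AWS CLI; a sketch, assuming a hypothetical security group ID sg-0123456789abcdef0:

```shell
# Allow inbound TCP on port 8000 from anywhere (tighten the CIDR for production)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 8000 \
  --cidr 0.0.0.0/0
```

Replace the group ID with your instance's actual security group.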
sudo yum update -y
sudo yum install python3 git -y
python3 -m pip install --upgrade pip

Output:
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
... (yum update output)
Installed: python3.x86_64 ...
Requirement already satisfied: pip in /usr/lib/python3.8/site-packages (21.0.1)
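The commands above are for Amazon Linux 2; on Ubuntu 22.04 the equivalents use apt:

```shell
sudo apt update
sudo apt install -y python3 python3-pip python3-venv git
python3 -m pip install --upgrade pip
```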
Step-by-step FastAPI LLM app
Create a simple FastAPI app that calls the gpt-4o-mini model through the OpenAI API. Set your OPENAI_API_KEY as an environment variable on the EC2 instance, then run the app with uvicorn to serve HTTP requests.
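Setting the key on the instance can look like this (the key value is a placeholder):

```shell
# Set the key for the current shell session
export OPENAI_API_KEY="sk-your-key-here"

# Persist it for future logins by appending to the shell profile
echo 'export OPENAI_API_KEY="sk-your-key-here"' >> ~/.bashrc
```

For production, prefer AWS Systems Manager Parameter Store or Secrets Manager over a plain-text shell profile.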
import os

from fastapi import FastAPI
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class Prompt(BaseModel):
    text: str

@app.post("/generate")
async def generate_text(prompt: Prompt):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt.text}],
    )
    return {"response": response.choices[0].message.content}

# To run: uvicorn main:app --host 0.0.0.0 --port 8000

Common variations
- Use uvicorn main:app --host 0.0.0.0 --port 80 to serve on the default HTTP port (binding to ports below 1024 requires root or an equivalent capability).
- Switch to async OpenAI calls (the AsyncOpenAI client in openai>=1.0) for better concurrency.
- Deploy on AWS Elastic Beanstalk or ECS for managed scaling.
- Use other LLM providers such as Anthropic Claude by changing the client and model.
Troubleshooting
- If you get "Connection refused", check the AWS security group inbound rules for your port.
- Ensure OPENAI_API_KEY is set in the environment before starting the app.
- Use sudo lsof -i :8000 to check whether the port is already in use.
- Check logs with journalctl -u uvicorn (if the app runs as a systemd service) or the console output for errors.
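The journalctl -u uvicorn command assumes the app runs as a systemd service. A minimal unit file sketch (user, paths, and the service name are assumptions), saved as /etc/systemd/system/uvicorn.service:

```ini
[Unit]
Description=FastAPI LLM app
After=network.target

[Service]
User=ec2-user
WorkingDirectory=/home/ec2-user/app
Environment=OPENAI_API_KEY=sk-your-key-here
ExecStart=/usr/local/bin/uvicorn main:app --host 0.0.0.0 --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now uvicorn; its logs then appear under journalctl -u uvicorn.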
Key Takeaways
- Use AWS EC2 with proper security group settings to expose your FastAPI LLM app.
- Set environment variables securely for your OpenAI API key on the server.
- Run FastAPI with uvicorn binding to 0.0.0.0 to accept external requests.
- Consider managed AWS services like Elastic Beanstalk or ECS for production scaling.
- Check network and environment setup first when debugging connectivity or API errors.