How-to · Beginner · 3 min read

How to deploy an AI workflow as an API

Quick answer
Deploy an AI workflow as an API by wrapping your AI calls (e.g., OpenAI client) inside a web framework like FastAPI. Use environment variables for API keys, define endpoints that accept input, call the AI model, and return the response as JSON.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (API usage is billed per token)
  • pip install "openai>=1.0" fastapi uvicorn

Setup

Install the required packages and set your environment variable for the OpenAI API key.

  • Install packages: pip install openai fastapi uvicorn
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai fastapi uvicorn
output
Collecting openai
Collecting fastapi
Collecting uvicorn
Successfully installed openai fastapi uvicorn

Step by step

Create a simple FastAPI app that exposes an endpoint to receive user input, calls the OpenAI gpt-4o-mini model, and returns the AI-generated response.

python
import os
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class RequestBody(BaseModel):
    prompt: str

@app.post("/generate")
def generate_text(request: RequestBody):
    # Sync endpoint: FastAPI runs it in a threadpool, so the blocking
    # OpenAI call does not stall the event loop.
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": request.prompt}]
        )
        text = response.choices[0].message.content
        return {"response": text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# To run: uvicorn filename:app --reload
output
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

# Example request:
# curl -X POST http://127.0.0.1:8000/generate -H "Content-Type: application/json" -d '{"prompt": "Hello AI"}'
# Response:
# {"response": "Hello! How can I assist you today?"}
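The curl call above can also be made from Python using only the standard library. This is a minimal client sketch that targets the `/generate` endpoint and default uvicorn address from the app above; adjust `base_url` for your deployment.

```python
import json
import urllib.request


def build_request(prompt: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the POST request the /generate endpoint expects."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Call the running API and return the model's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Run the server first, then call `generate("Hello AI")` from a separate process or notebook.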

Common variations

You can extend this basic API by:

  • Using the async client (e.g., AsyncOpenAI) so requests don't block the event loop.
  • Adding streaming responses for real-time token generation.
  • Switching models, e.g., a smaller model for faster, cheaper inference or a larger one for harder tasks.
  • Adding authentication and rate limiting for production readiness.
python
import os
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
# The async client yields an async iterator when stream=True.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.post("/stream")
async def stream_generate(prompt: str):  # prompt arrives as a query parameter
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    async def event_generator():
        async for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            yield delta
    # StreamingResponse flushes tokens to the client as they are generated.
    return StreamingResponse(event_generator(), media_type="text/plain")
output
INFO:     Started server process [12346]
INFO:     Uvicorn running on http://127.0.0.1:8000

# Client can consume streaming tokens as they arrive for low-latency UI updates.

Troubleshooting

  • If you get KeyError: 'OPENAI_API_KEY', ensure the variable is exported in the same shell that launches uvicorn (restart or re-source the shell after exporting).
  • For ConnectionError, check your internet and API endpoint access.
  • If the model name is invalid, verify you are using a current model like gpt-4o-mini.
  • Use uvicorn filename:app --reload to auto-reload on code changes during development.
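A fail-fast check at import time turns the KeyError above into an actionable startup error instead of a crash on the first request. This is a minimal sketch; the error wording is my own.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, or fail at startup with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Run: export {name}='your_api_key' and restart the server."
        )
    return value


# Use at module import time so misconfiguration surfaces before any request:
# client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
```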

Key Takeaways

  • Use FastAPI to wrap AI calls into RESTful endpoints for easy deployment.
  • Always load API keys securely from environment variables, never hardcode.
  • Leverage streaming and async features for responsive AI-powered APIs.
  • Test locally with uvicorn before deploying to production.
  • Keep model names updated to use the latest available AI models.
Verified 2026-04 · gpt-4o-mini