How-to · Beginner · 3 min read

How to deploy an AI workflow as an API

Quick answer
Deploy an AI workflow as an API by wrapping your AI calls (e.g., OpenAI client) inside a web framework like FastAPI. Use environment variables for API keys, define endpoints that accept input, call the AI model, and return the response as JSON.

PREREQUISITES

  • Python 3.8+
  • OpenAI API key (API usage is billed per token)
  • pip install "openai>=1.0" fastapi uvicorn

Setup

Install the required packages and set your environment variable for the OpenAI API key.

  • Install packages: pip install openai fastapi uvicorn
  • Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
bash
pip install openai fastapi uvicorn
output
Collecting openai
Collecting fastapi
Collecting uvicorn
Successfully installed openai fastapi uvicorn

Step by step

Create a simple FastAPI app that exposes an endpoint to receive user input, calls the OpenAI gpt-4o-mini model, and returns the AI-generated response.

python
import os
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class RequestBody(BaseModel):
    prompt: str

@app.post("/generate")
def generate_text(request: RequestBody):
    # Sync endpoint: FastAPI runs it in a threadpool, so the blocking
    # OpenAI call does not stall the event loop.
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": request.prompt}]
        )
        text = response.choices[0].message.content
        return {"response": text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# To run: uvicorn filename:app --reload
output
INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)

# Example request:
# curl -X POST http://127.0.0.1:8000/generate -H "Content-Type: application/json" -d '{"prompt": "Hello AI"}'
# Response:
# {"response": "Hello! How can I assist you today?"}
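The curl call above can also be made from Python using only the standard library. This is a minimal client sketch that targets the `/generate` endpoint and default uvicorn address from the app above; adjust `base_url` for your deployment.

```python
import json
import urllib.request


def build_request(prompt: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the POST request the /generate endpoint expects."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Call the running API and return the model's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Run the server first, then call `generate("Hello AI")` from a separate process or notebook.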

Common variations

You can extend this basic API by:

  • Using the async client (e.g., AsyncOpenAI) so requests don't block the event loop.
  • Adding streaming responses for real-time token generation.
  • Switching models, e.g., a smaller model for faster, cheaper inference or a larger one for harder tasks.
  • Adding authentication and rate limiting for production readiness.
python
import os
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
# The async client yields an async iterator when stream=True.
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.post("/stream")
async def stream_generate(prompt: str):  # prompt arrives as a query parameter
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    async def event_generator():
        async for chunk in stream:
            delta = chunk.choices[0].delta.content or ""
            yield delta
    # StreamingResponse flushes tokens to the client as they are generated.
    return StreamingResponse(event_generator(), media_type="text/plain")
output
INFO:     Started server process [12346]
INFO:     Uvicorn running on http://127.0.0.1:8000

# Client can consume streaming tokens as they arrive for low-latency UI updates.

Troubleshooting

  • If you get KeyError: 'OPENAI_API_KEY', ensure the variable is exported in the same shell that launches uvicorn (restart or re-source the shell after exporting).
  • For ConnectionError, check your internet and API endpoint access.
  • If the model name is invalid, verify you are using a current model like gpt-4o-mini.
  • Use uvicorn filename:app --reload to auto-reload on code changes during development.
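A fail-fast check at import time turns the KeyError above into an actionable startup error instead of a crash on the first request. This is a minimal sketch; the error wording is my own.

```python
import os


def require_env(name: str) -> str:
    """Return the variable's value, or fail at startup with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Run: export {name}='your_api_key' and restart the server."
        )
    return value


# Use at module import time so misconfiguration surfaces before any request:
# client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
```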

Key Takeaways

  • Use FastAPI to wrap AI calls into RESTful endpoints for easy deployment.
  • Always load API keys securely from environment variables, never hardcode.
  • Leverage streaming and async features for responsive AI-powered APIs.
  • Test locally with uvicorn before deploying to production.
  • Keep model names updated to use the latest available AI models.
Verified 2026-04 · gpt-4o-mini