How to deploy an AI workflow as an API
Quick answer
Deploy an AI workflow as an API by wrapping your AI calls (e.g., the OpenAI client) inside a web framework like FastAPI. Use environment variables for API keys, define endpoints that accept input, call the AI model, and return the response as JSON.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0 fastapi uvicorn
Setup
Install the required packages and set your environment variable for the OpenAI API key.
- Install packages: pip install openai fastapi uvicorn
- Set environment variable: export OPENAI_API_KEY='your_api_key' (Linux/macOS) or setx OPENAI_API_KEY "your_api_key" (Windows)
pip install openai fastapi uvicorn
Output:
Collecting openai
Collecting fastapi
Collecting uvicorn
Successfully installed openai fastapi uvicorn
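To confirm the key is visible before starting the server, a quick check from the same shell should work (a sanity-check one-liner, not part of the original steps):

python -c "import os; print('OPENAI_API_KEY is set' if os.environ.get('OPENAI_API_KEY') else 'OPENAI_API_KEY is missing')"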
Step by step
Create a simple FastAPI app that exposes an endpoint to receive user input, calls the OpenAI gpt-4o-mini model, and returns the AI-generated response.
import os

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

class RequestBody(BaseModel):
    prompt: str

@app.post("/generate")
async def generate_text(request: RequestBody):
    try:
        # Forward the user's prompt to the model and return its reply as JSON
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": request.prompt}]
        )
        text = response.choices[0].message.content
        return {"response": text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# To run: uvicorn filename:app --reload
Output:
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
# Example request:
# curl -X POST http://127.0.0.1:8000/generate -H "Content-Type: application/json" -d '{"prompt": "Hello AI"}'
# Response:
# {"response": "Hello! How can I assist you today?"} Common variations
Common variations
You can extend this basic API by:
- Using async calls if supported by your AI SDK (the streaming example below uses the async client).
- Adding streaming responses for real-time token generation, as shown in the example after this list.
- Switching models, e.g., gpt-4o-mini for faster, cheaper inference.
- Adding authentication and rate limiting for production readiness (a minimal sketch appears at the end of this section).
A streaming variation using the async OpenAI client, forwarded to the caller with FastAPI's StreamingResponse:

import os

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.post("/stream")
async def stream_generate(prompt: str):
    # stream=True makes the API yield partial chunks as tokens are generated
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )

    async def event_generator():
        async for chunk in stream:
            # Each chunk carries a small text delta; None deltas become ""
            yield chunk.choices[0].delta.content or ""

    # StreamingResponse forwards each token to the client as soon as it arrives
    return StreamingResponse(event_generator(), media_type="text/plain")

Output:
INFO: Started server process [12346]
INFO: Uvicorn running on http://127.0.0.1:8000
# Clients can consume streaming tokens as they arrive for low-latency UI updates.
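On the client side, the stream can be read incrementally. A sketch using the requests package (an assumed dependency; any HTTP client that supports chunked reads works):

import requests

# prompt travels as a query parameter because /stream declares it as a plain argument
with requests.post(
    "http://127.0.0.1:8000/stream",
    params={"prompt": "Hello AI"},
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for token in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(token, end="", flush=True)  # render each token as it arrives

For the last variation above (authentication), one common pattern is a FastAPI dependency that checks a shared secret sent in a request header. The X-API-Key header and SERVICE_API_KEY variable are illustrative choices, not part of the original example; rate limiting is usually layered on separately via middleware or a reverse proxy:

import os

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
SERVICE_API_KEY = os.environ.get("SERVICE_API_KEY", "")  # hypothetical shared secret

def require_api_key(x_api_key: str = Header(default="")):
    # FastAPI maps the x_api_key argument to the X-API-Key request header
    if not SERVICE_API_KEY or x_api_key != SERVICE_API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

@app.post("/generate", dependencies=[Depends(require_api_key)])
async def generate_text():
    ...  # same handler body as in the step-by-step example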
Troubleshooting
- If you get KeyError: 'OPENAI_API_KEY', ensure your environment variable is set and your shell has been restarted (a fail-fast startup check is sketched after this list).
- For ConnectionError, check your internet connection and access to the API endpoint.
- If the model name is invalid, verify you are using a current model like gpt-4o-mini.
- Use uvicorn filename:app --reload to auto-reload on code changes during development.
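To avoid the bare KeyError entirely, one option is to validate the key at startup and fail with an actionable message (a sketch, not part of the original example):

import os
import sys

api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    # Exit early with a clear message instead of raising KeyError at import time
    sys.exit("OPENAI_API_KEY is not set; export it before starting the server.")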
Key takeaways
- Use FastAPI to wrap AI calls into RESTful endpoints for easy deployment.
- Always load API keys securely from environment variables; never hardcode them.
- Leverage streaming and async features for responsive AI-powered APIs.
- Test locally with uvicorn before deploying to production.
- Keep model names updated to use the latest available AI models.