How to use the OpenAI API with Flask in Python
Direct answer
Use the openai SDK v1+ with Flask by initializing the OpenAI client with your API key from os.environ, then calling client.chat.completions.create inside a Flask route to handle requests.
Setup
Install
pip install flask openai
Env vars
OPENAI_API_KEY
Imports
import os
from flask import Flask, request, jsonify
from openai import OpenAI
Examples
In: User sends POST request to /chat with JSON {"message": "Hello, AI!"}
Out: {"response": "Hello! How can I assist you today?"}
In: User sends POST request to /chat with JSON {"message": "Write a haiku about spring."}
Out: {"response": "Spring breeze softly blows,\nCherry blossoms dance in light,\nNature's breath renewed."}
In: User sends POST request to /chat with JSON {"message": ""}
Out: {"error": "No message provided"}
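The exchanges above can be reproduced with a small client script. This is a minimal sketch using Python's standard-library urllib; it assumes the Flask app shown later in this answer is running locally on port 5000.

```python
import json
import urllib.request

def build_payload(message: str) -> bytes:
    # Encode the chat message as the JSON body the /chat route expects
    return json.dumps({"message": message}).encode("utf-8")

def send_chat(message: str, url: str = "http://127.0.0.1:5000/chat") -> dict:
    # POST the message and decode the JSON reply from the Flask app
    req = urllib.request.Request(
        url,
        data=build_payload(message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example: send_chat("Hello, AI!")  (requires the Flask server to be running)
```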
Integration steps
- Install Flask and OpenAI SDK using pip
- Set your OpenAI API key in the environment variable OPENAI_API_KEY
- Import Flask and OpenAI client in your Python script
- Initialize the OpenAI client with the API key from os.environ
- Create a Flask app and define a route to accept user input
- Inside the route, call client.chat.completions.create with the user message
- Return the AI-generated response as JSON to the client
Full code
import os
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.route('/chat', methods=['POST'])
def chat():
    data = request.get_json(force=True)
    user_message = data.get('message', '').strip()
    if not user_message:
        return jsonify({"error": "No message provided"}), 400
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}]
    )
    ai_text = response.choices[0].message.content
    return jsonify({"response": ai_text})

if __name__ == '__main__':
    app.run(debug=True, port=5000)

Output
* Serving Flask app 'app' (lazy loading)
* Environment: development
* Debug mode: on
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
# Example POST request to http://127.0.0.1:5000/chat with JSON {"message": "Hello, AI!"}
# Response:
{"response": "Hello! How can I assist you today?"}
API trace
Request
{"model": "gpt-4o", "messages": [{"role": "user", "content": "<user message>"}]}
Response
{"choices": [{"message": {"content": "<AI response>"}}], "usage": {"total_tokens": 42}}
Extract
response.choices[0].message.content
Variants
Streaming response with Flask ›
Use streaming to send partial AI responses in real time, improving perceived latency on long outputs.
import os
from flask import Flask, request, Response
from openai import OpenAI

app = Flask(__name__)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.route('/chat_stream', methods=['POST'])
def chat_stream():
    data = request.get_json(force=True)
    user_message = data.get('message', '').strip()
    if not user_message:
        return Response("No message provided", status=400)
    def generate():
        stream = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_message}],
            stream=True
        )
        for chunk in stream:
            # In SDK v1, delta is an object, not a dict; content may be None
            yield chunk.choices[0].delta.content or ''
    return Response(generate(), mimetype='text/plain')

if __name__ == '__main__':
    app.run(debug=True, port=5000)

Async Flask with OpenAI ›
Use async Flask routes (requires installing flask[async]) to handle multiple concurrent AI requests efficiently.
import os
from flask import Flask, request, jsonify
from openai import AsyncOpenAI

app = Flask(__name__)
# AsyncOpenAI exposes awaitable methods; acreate() does not exist in SDK v1
client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

@app.route('/chat_async', methods=['POST'])
async def chat_async():
    # request.get_json() is synchronous in Flask and must not be awaited
    data = request.get_json(force=True)
    user_message = data.get('message', '').strip()
    if not user_message:
        return jsonify({"error": "No message provided"}), 400
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_message}]
    )
    ai_text = response.choices[0].message.content
    return jsonify({"response": ai_text})

if __name__ == '__main__':
    app.run(debug=True, port=5000)

Use Gemini-1.5-pro model instead of GPT-4o ›
Use gemini-1.5-pro for a balance of speed and cost; the OpenAI SDK can target it through Google's OpenAI-compatible endpoint with a Gemini API key.
import os
from flask import Flask, request, jsonify
from openai import OpenAI

app = Flask(__name__)
# Gemini models are not served by the OpenAI API: point the client at
# Google's OpenAI-compatible endpoint and use a Gemini API key instead
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

@app.route('/chat', methods=['POST'])
def chat():
    data = request.get_json(force=True)
    user_message = data.get('message', '').strip()
    if not user_message:
        return jsonify({"error": "No message provided"}), 400
    response = client.chat.completions.create(
        model="gemini-1.5-pro",
        messages=[{"role": "user", "content": user_message}]
    )
    ai_text = response.choices[0].message.content
    return jsonify({"response": ai_text})

if __name__ == '__main__':
    app.run(debug=True, port=5000)

Performance
Latency: ~800ms for gpt-4o non-streaming calls
Cost: ~$0.002 per 500 tokens exchanged with gpt-4o
Rate limits: Tier 1: 500 requests per minute / 30,000 tokens per minute
- Keep user messages concise to reduce token usage
- Use shorter system prompts or none if possible
- Cache frequent responses to avoid repeated calls
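The caching tip above can be sketched as a small in-memory layer. This is a minimal sketch assuming exact-match messages; call_model is a hypothetical helper standing in for the real client.chat.completions.create call.

```python
from functools import lru_cache

def call_model(message: str) -> str:
    # Hypothetical placeholder for the real OpenAI API call
    return f"reply to: {message}"

@lru_cache(maxsize=1024)
def cached_chat(message: str) -> str:
    # Identical messages hit the cache instead of triggering a paid API call
    return call_model(message)

cached_chat("Hello, AI!")   # first call: goes to the model
cached_chat("Hello, AI!")   # repeat: served from the cache
print(cached_chat.cache_info().hits)  # 1
```

lru_cache only works for exact repeats; for near-duplicate prompts or multi-process Flask deployments, an external store such as Redis keyed on a hash of the message is a more robust choice.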
| Approach | Latency | Cost/call | Best for |
|---|---|---|---|
| Standard sync call | ~800ms | ~$0.002 | Simple Flask apps with moderate traffic |
| Streaming response | Starts immediately, total varies | Similar | Apps needing real-time partial output |
| Async calls | ~800ms (concurrent) | ~$0.002 | High concurrency Flask apps |
Quick tip
Always load your OpenAI API key securely from environment variables and never hardcode it in your Flask app.
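The tip above can be made concrete with a small helper that fails fast when the key is missing; a minimal sketch, nothing here is OpenAI-specific.

```python
import os

def require_env(name: str) -> str:
    # Read a required environment variable, failing fast with a clear message
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return value

# Usage: client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
```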
Common mistake
Beginners often forget to set the API key in the environment, or call openai.ChatCompletion.create(), which was removed in SDK v1 in favor of client.chat.completions.create().