How to store OpenAI embeddings in a database
Quick answer
Use the OpenAI SDK to generate embeddings with models like text-embedding-3-large. Convert the embedding vector to a storable format (e.g., JSON or binary) and save it in a database column designed for vectors or arrays, such as FLOAT8[] in PostgreSQL or BLOB in SQLite.
Prerequisites
- Python 3.8+
- OpenAI API key (free tier works)
- pip install openai>=1.0
- Basic knowledge of SQL and a database (e.g., SQLite, PostgreSQL)
Setup
Install the openai Python package and set your API key as an environment variable for secure access.
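As a minimal sketch, you can also fail fast in Python when the variable is missing (the require_api_key helper is illustrative, not part of the SDK):

```python
import os

def require_api_key(env_var="OPENAI_API_KEY"):
    # Fail fast with a clear message instead of a confusing SDK error later
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running")
    return key
```

Calling this once at startup gives a clearer error than letting the client fail on its first request.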
pip install openai>=1.0
Step by step
This example shows how to generate an embedding using OpenAI's text-embedding-3-large model and store it in a SQLite database as a JSON string.
import os
import json
import sqlite3
from openai import OpenAI
# Initialize OpenAI client
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# Connect to SQLite database (or create it)
conn = sqlite3.connect("embeddings.db")
cursor = conn.cursor()
# Create table to store embeddings
cursor.execute('''
CREATE TABLE IF NOT EXISTS embeddings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
text TEXT NOT NULL,
embedding TEXT NOT NULL
)
''')
# Text to embed
text_to_embed = "OpenAI provides powerful embedding models."
# Generate embedding
response = client.embeddings.create(
model="text-embedding-3-large",
input=text_to_embed
)
embedding_vector = response.data[0].embedding
# Convert embedding vector to JSON string for storage
embedding_json = json.dumps(embedding_vector)
# Insert into database
cursor.execute(
"INSERT INTO embeddings (text, embedding) VALUES (?, ?)",
(text_to_embed, embedding_json)
)
conn.commit()
# Query and print stored embedding
cursor.execute("SELECT id, text, embedding FROM embeddings")
rows = cursor.fetchall()
for row in rows:
stored_id, stored_text, stored_embedding_json = row
stored_embedding = json.loads(stored_embedding_json)
print(f"ID: {stored_id}, Text: {stored_text}, Embedding vector length: {len(stored_embedding)}")
conn.close()
Output
ID: 1, Text: OpenAI provides powerful embedding models., Embedding vector length: 3072
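Once embeddings are stored, the usual next step is comparing them. The following is a minimal sketch of cosine similarity in pure Python (the cosine_similarity helper is illustrative; production systems typically use NumPy or a vector database for this):

```python
import json
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Embeddings come back from the database as JSON strings
stored = json.loads("[0.0, 1.0, 0.0]")
query = [0.0, 2.0, 0.0]
print(cosine_similarity(stored, query))  # 1.0: same direction, different magnitude
```

Values close to 1.0 mean the stored text and the query are semantically similar.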
Common variations
- Use PostgreSQL with FLOAT8[] column type for native vector storage.
- Store embeddings as binary blobs for efficiency.
- Use async OpenAI client calls for high throughput.
- Use different embedding models like text-embedding-3-small for smaller vectors.
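The binary-blob variation above can be sketched with the standard struct module, which packs each value as a 4-byte float32 (roughly half the size of the JSON text, at the cost of some precision; the helper names and table name are illustrative):

```python
import sqlite3
import struct

def vector_to_blob(vec):
    # Pack the floats into a compact little-endian float32 blob
    return struct.pack(f"<{len(vec)}f", *vec)

def blob_to_vector(blob):
    # Each float32 occupies 4 bytes
    n = len(blob) // 4
    return list(struct.unpack(f"<{n}f", blob))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings_bin (id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)"
)
vec = [0.25, -0.5, 1.0]
conn.execute("INSERT INTO embeddings_bin (embedding) VALUES (?)", (vector_to_blob(vec),))
blob = conn.execute("SELECT embedding FROM embeddings_bin").fetchone()[0]
print(blob_to_vector(blob))  # round-trips to [0.25, -0.5, 1.0]
conn.close()
```

Note that float32 packing loses precision relative to the float64 values the API returns; for similarity search this is generally acceptable, but verify it for your use case.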
Troubleshooting
- If you get API key errors, verify OPENAI_API_KEY is set correctly in your environment.
- For database insertion errors, ensure your table schema matches the data types.
- If embeddings seem empty or incorrect, check the model name and API response structure.
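A small guard can catch the empty-or-wrong-size case before anything reaches the database (a sketch; validate_embedding and its expected dimension are illustrative, with 3072 being text-embedding-3-large's default output size):

```python
def validate_embedding(vec, expected_dim):
    # Guard against empty or wrong-size vectors before inserting them
    if not vec:
        raise ValueError("embedding is empty; check the model name and API response")
    if len(vec) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vec)}")
    return vec

validate_embedding([0.1] * 3072, 3072)  # passes for text-embedding-3-large's default size
```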
Key Takeaways
- Use the OpenAI SDK's embeddings endpoint with a current model like text-embedding-3-large.
- Store embeddings as JSON strings or native array types depending on your database capabilities.
- Always secure your API key via environment variables and never hardcode it.
- SQLite is good for prototyping; use PostgreSQL or specialized vector DBs for production.
- Validate your database schema matches the embedding data format to avoid insertion errors.