How to use text-embedding-3-small in Python
Direct answer
Use the OpenAI Python SDK's client.embeddings.create method with model="text-embedding-3-small" and your input text to generate embeddings.

Setup
Install
pip install openai
Env vars
OPENAI_API_KEY
Imports
import os
from openai import OpenAI

Examples
in: The quick brown fox jumps over the lazy dog.
out: [0.0123, -0.0456, 0.0789, ...] # vector of floats representing the embedding
in: How to integrate OpenAI embeddings in Python?
out: [0.0345, -0.0234, 0.0567, ...] # embedding vector for the query
in: "" (empty string)
out: error # the API rejects empty input with a validation error
Integration steps
- Install the OpenAI Python SDK and set your OPENAI_API_KEY environment variable.
- Import the OpenAI client and initialize it with your API key from os.environ.
- Call the embeddings.create method with model='text-embedding-3-small' and input text.
- Receive the response containing the embedding vector.
- Extract the embedding vector from response.data[0].embedding for use in your application.
Full code
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog."
)
embedding_vector = response.data[0].embedding
print("Embedding vector:", embedding_vector)

Output
Embedding vector: [0.012345, -0.045678, 0.078912, ...]
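Once extracted, the vector is usually compared against other embeddings, most commonly by cosine similarity. A minimal pure-Python sketch (toy 2-dimensional vectors stand in for the real 1536-dimensional output of text-embedding-3-small):

```python
import math

def cosine_similarity(a, b):
    # cosine(a, b) = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors for illustration; in practice pass two
# response.data[i].embedding lists here.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0 (orthogonal)
```

Similarity close to 1.0 means the two texts are semantically close; values near 0 mean they are unrelated.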
API trace
Request
{"model": "text-embedding-3-small", "input": "The quick brown fox jumps over the lazy dog."}
Response
{"data": [{"embedding": [0.012345, -0.045678, 0.078912, ...], "index": 0}], "usage": {"prompt_tokens": 9, "total_tokens": 9}}
Extract
response.data[0].embedding

Variants
Async Embedding Request ›
Use when you want to perform embedding requests concurrently or integrate with async frameworks. The SDK's async client is AsyncOpenAI; you await the same embeddings.create method (there is no acreate method).
import os
import asyncio
from openai import AsyncOpenAI

async def main():
    client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input="Async call example text."
    )
    embedding = response.data[0].embedding
    print("Async embedding vector:", embedding)

asyncio.run(main())

Batch Embedding Multiple Inputs ›
Use to embed multiple texts in a single API call for efficiency.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
texts = [
    "First text to embed.",
    "Second text for embedding.",
    "Third example input."
]
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts
)
embeddings = [item.embedding for item in response.data]
for i, emb in enumerate(embeddings):
    print(f"Embedding {i}:", emb)

Alternative Model: text-embedding-3-large ›
Use when you need higher quality embeddings at the cost of higher latency and compute.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Using a larger embedding model for better accuracy."
)
embedding = response.data[0].embedding
print("Large model embedding vector:", embedding)

Performance
Latency: ~300-500ms per embedding call for a single input on text-embedding-3-small
Cost: ~$0.02 per 1M tokens (~$0.00002 per 1K tokens) for text-embedding-3-small, at the time of writing; check openai.com/pricing for current rates
Rate limits: vary by account usage tier; check the limits page in your OpenAI dashboard
- Trim unnecessary whitespace and boilerplate to reduce token count; avoid aggressive stopword removal, which can change the text's meaning and hurt embedding quality.
- Batch multiple inputs to amortize overhead per call.
- Use smaller embedding models for less critical tasks to save cost.
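As a rough illustration of the cost figures above, a small helper can turn the usage.total_tokens field of an embeddings response into an estimated dollar cost. The per-million-token prices below are assumptions based on published pricing at the time of writing, not values returned by the API:

```python
# Assumed prices in USD per 1M tokens (verify against openai.com/pricing).
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

def estimate_cost(total_tokens, model="text-embedding-3-small"):
    """Estimate the USD cost of an embeddings call from its token count."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[model]

# The 9-token example sentence from the API trace above:
print(f"${estimate_cost(9):.8f}")
```

In practice you would pass response.usage.total_tokens from a real embeddings response.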
| Approach | Latency | Cost | Best for |
|---|---|---|---|
| Single input embedding | ~300-500ms | ~$0.02 per 1M tokens | Quick embeddings for one text |
| Batch embedding multiple texts | ~500-800ms | ~$0.02 per 1M tokens, amortized over the batch | Efficient bulk embedding |
| Using text-embedding-3-large | ~700-1000ms | ~$0.13 per 1M tokens | Higher-quality embeddings at higher cost |
Quick tip
When embedding many texts, pass them as a list in a single embeddings request; batching amortizes per-request overhead and reduces total latency.
Common mistake
Passing an empty string, or a non-string value, to the embeddings endpoint triggers a validation error; check inputs before calling the API.
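One way to guard against this is a small validation step before the API call. The validate_embedding_input helper below is a hypothetical sketch, not part of the OpenAI SDK:

```python
def validate_embedding_input(texts):
    """Normalize input to a list of strings, raising ValueError for
    values the embeddings endpoint would reject."""
    if isinstance(texts, str):
        texts = [texts]
    for i, t in enumerate(texts):
        if not isinstance(t, str):
            raise ValueError(f"Input {i} is not a string: {t!r}")
        if not t.strip():
            raise ValueError(f"Input {i} is empty or whitespace-only")
    return texts

# Usage: clean = validate_embedding_input(user_texts)
#        client.embeddings.create(model="text-embedding-3-small", input=clean)
```

Running this check locally surfaces bad inputs immediately instead of spending a round trip to get a 4xx error back.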