How to upsert vectors to Pinecone
Quick answer
To upsert vectors to Pinecone, use the upsert method of the Index object from the pinecone-client Python SDK. Prepare your vectors as a list of tuples, each holding a unique ID and a vector embedding, then call index.upsert(vectors=your_vectors) to insert or update them.
PREREQUISITES
- Python 3.8+
- Pinecone API key
- pip install "pinecone-client>=2.2.0,<3.0.0" (the examples use the v2 pinecone.init API, which was removed in version 3.0)
- numpy, used only to generate the example vectors
Setup Pinecone environment
Install the Pinecone client library and set your API key as an environment variable. Initialize the Pinecone client and connect to your index.
import os
import pinecone

# Install the Pinecone client if it is not installed (v2 API):
#   pip install "pinecone-client>=2.2.0,<3.0.0"
# Set your Pinecone API key as an environment variable:
#   export PINECONE_API_KEY='your-api-key'
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-west1-gcp")

# Connect to an existing index, creating it first if it does not exist
index_name = "example-index"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536)
index = pinecone.Index(index_name)
Upsert vectors step by step
Prepare your vectors as a list of tuples where each tuple contains a unique ID and a vector embedding (list of floats). Use the upsert method to insert or update these vectors in the index.
import numpy as np

# Example: create dummy vectors matching the index dimension (1536)
vectors = [
    ("vec1", np.random.rand(1536).tolist()),
    ("vec2", np.random.rand(1536).tolist()),
    ("vec3", np.random.rand(1536).tolist()),
]

# Upsert vectors to the Pinecone index
upsert_response = index.upsert(vectors=vectors)
print("Upsert response:", upsert_response)
output
Upsert response: {'upserted_count': 3}
Common variations
- Send non-blocking upserts by passing async_req=True to index.upsert. The call returns a result handle immediately, and calling .get() on it waits for the response.
- Batch upserts for large datasets to avoid timeouts.
- Attach metadata to vectors by passing a third element in the tuple: (id, vector, metadata_dict).
# Non-blocking upsert with the v2 client (reuses index and np from above)
vectors = [("vec_async", np.random.rand(1536).tolist())]
# async_req=True returns a result handle without waiting for the request
async_result = index.upsert(vectors=vectors, async_req=True)
# .get() blocks until the request has completed
response = async_result.get()
print("Async upsert response:", response)
output
Async upsert response: {'upserted_count': 1}
Troubleshooting common errors
- Index not found: Ensure the index exists or create it before upserting.
- Dimension mismatch: The vector dimension must match the index dimension.
- API key errors: Verify your PINECONE_API_KEY environment variable is set correctly.
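The dimension mismatch listed above can be caught locally before any request is sent. The following is a minimal sketch, not part of the Pinecone SDK: validate_vectors and its parameters are hypothetical names, and it assumes vectors are given as (id, embedding) or (id, embedding, metadata) tuples. It also flags repeated IDs, on the assumption that a repeated ID in one batch is usually a mistake because an upsert overwrites an existing ID.

```python
from typing import Sequence


def validate_vectors(vectors: Sequence[tuple], expected_dim: int) -> None:
    """Raise ValueError on malformed tuples, repeated IDs, or wrong dimensions.

    Hypothetical helper for illustration; it performs local checks only
    and never contacts Pinecone.
    """
    seen_ids = set()
    for position, item in enumerate(vectors):
        # Each item should be (id, embedding) or (id, embedding, metadata)
        if len(item) not in (2, 3):
            raise ValueError(
                f"item {position}: expected 2 or 3 elements, got {len(item)}"
            )
        vec_id, embedding = item[0], item[1]
        if not isinstance(vec_id, str) or not vec_id:
            raise ValueError(f"item {position}: ID must be a non-empty string")
        if vec_id in seen_ids:
            raise ValueError(f"item {position}: repeated ID {vec_id!r}")
        seen_ids.add(vec_id)
        # The embedding length must equal the dimension the index was created with
        if len(embedding) != expected_dim:
            raise ValueError(
                f"item {position} ({vec_id!r}): dimension {len(embedding)} "
                f"does not match index dimension {expected_dim}"
            )
```

With the index created earlier, calling validate_vectors(vectors, expected_dim=1536) just before index.upsert would surface a wrong-sized embedding as a local ValueError rather than as an API error.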
Key Takeaways
- Always initialize the Pinecone client with your API key and environment before upserting.
- Pass vectors as (id, embedding) tuples, optionally with a metadata dict as a third element, and make sure each embedding matches the index dimension.
- Use batched or non-blocking upserts for large-scale operations.
- Include metadata with vectors to enhance retrieval capabilities.
- Check for common errors like dimension mismatch or missing index before upserting.
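The batching and metadata points above can be sketched together. This is an illustrative sketch under stated assumptions, not an SDK feature: chunked and upsert_in_batches are hypothetical helper names, the batch size of 100 is a default chosen for illustration (check Pinecone's current request limits), and index is assumed to be any object exposing upsert(vectors=...) like the Index object used earlier. The example data uses 3-dimensional embeddings purely to stay readable, so it would only fit an index created with dimension=3.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable, batch_size: int = 100) -> Iterator[List]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def upsert_in_batches(index, vectors: Iterable, batch_size: int = 100) -> int:
    """Upsert vectors one batch at a time and return how many were sent.

    Hypothetical helper: `index` only needs an upsert(vectors=...) method.
    """
    sent = 0
    for batch in chunked(vectors, batch_size):
        index.upsert(vectors=batch)
        sent += len(batch)
    return sent


# Example input: (id, embedding, metadata) tuples with tiny 3-dimensional embeddings
example_vectors = [
    (f"doc-{i}", [0.1 * i, 0.2, 0.3], {"source": "example", "position": i})
    for i in range(250)
]
```

Calling upsert_in_batches(index, example_vectors) would issue three requests carrying 100, 100, and 50 vectors. Keeping each request small is the point of the design: a failed or slow request affects one batch rather than the whole dataset.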