
How to query a Pinecone index in Python

Quick answer
To query a Pinecone index in Python, use the official Pinecone Python SDK to connect to your index, then call the query method with your query vector and parameters such as top_k. The call returns the most similar vectors and their metadata, which is the retrieval step of a retrieval-augmented generation (RAG) workflow.

Prerequisites

  • Python 3.8+
  • Pinecone API key
  • pip install pinecone (the SDK was previously published as pinecone-client)
  • A Pinecone index with vectors already upserted

Setup Pinecone client

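Install the Pinecone Python client and initialize it with your API key. Releases of the SDK from version 3 onward create a client object with the Pinecone class; the module-level pinecone.init call and its environment argument belong to the older 2.x releases and no longer work. This prepares your Python environment to interact with your Pinecone index.

python
import os
from pinecone import Pinecone

# Install the SDK if not installed
# pip install pinecone  (published as pinecone-client before the rename)

# Initialize the Pinecone client with the key from the environment
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Connect to your existing index
index = pc.Index("your-index-name")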
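
The setup snippet reads the key with os.environ["PINECONE_API_KEY"], which raises a bare KeyError when the variable is missing. A minimal sketch of a friendlier check follows; load_api_key is a hypothetical helper name and is not part of the Pinecone SDK.

```python
import os


def load_api_key(var_name: str = "PINECONE_API_KEY") -> str:
    """Return the API key from the environment or raise a descriptive error.

    load_api_key is a hypothetical helper, not part of the Pinecone SDK.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{var_name} is not set. Export it before running, for example: "
            f"export {var_name}=<your key>"
        )
    return api_key
```

The returned string can then be passed as the api_key argument when the client is initialized.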

Query the Pinecone index

Use the query method on the index object to find the top-k most similar vectors to your query vector. The query vector typically comes from an embedding model (for example, one of OpenAI's) and must have the same dimension as the vectors stored in the index.

python
import numpy as np

# Example query vector (replace with your actual embedding)
query_vector = np.random.rand(1536).tolist()  # e.g., 1536-dim OpenAI embedding

# Query the index for top 5 matches
response = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True
)

print("Query results:")
for match in response['matches']:
    print(f"ID: {match['id']}, Score: {match['score']}, Metadata: {match.get('metadata')}")
output
Query results:
ID: vector123, Score: 0.92, Metadata: {'text': 'Example document snippet'}
ID: vector456, Score: 0.89, Metadata: {'text': 'Another snippet'}
...
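
In a RAG workflow, the matches usually need to be turned into a context string for the prompt. The following is a minimal sketch, assuming each match stores its passage under a metadata key named text (as in the sample output above) and that matches can be read like dictionaries. build_context, min_score, and text_key are illustrative names, not Pinecone APIs.

```python
def build_context(matches, min_score=0.0, text_key="text"):
    """Join the text of sufficiently similar matches into one context string."""
    passages = []
    for match in matches:
        metadata = match.get("metadata") or {}
        text = metadata.get(text_key)
        # Skip matches without text or below the similarity threshold
        if text and match.get("score", 0.0) >= min_score:
            passages.append(text)
    return "\n\n".join(passages)


# Sample data shaped like the matches printed above (values are made up)
sample_matches = [
    {"id": "vector123", "score": 0.92, "metadata": {"text": "Example document snippet"}},
    {"id": "vector456", "score": 0.89, "metadata": {"text": "Another snippet"}},
    {"id": "vector789", "score": 0.40, "metadata": None},
]
context = build_context(sample_matches, min_score=0.5)
print(context)
```

The threshold assumes a metric where a higher score means a closer match, such as cosine similarity; for a distance-style metric the comparison would need to be reversed.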

Common variations

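  • Use asyncio-based querying for non-blocking calls. Recent SDK releases (6.x and later, installed with pip install "pinecone[asyncio]") expose this through IndexAsyncio, which takes the index host rather than the index name.
  • Adjust top_k to control the number of results returned.
  • Pass include_values=True to get the stored vector values in the response.
  • Use different embedding models to generate query vectors, as long as the output dimension matches the index.
python
import asyncio
import os

import numpy as np
from pinecone import Pinecone

async def async_query_example():
    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    # PINECONE_INDEX_HOST is an assumed variable holding your index host URL
    async with pc.IndexAsyncio(host=os.environ["PINECONE_INDEX_HOST"]) as async_index:
        query_vector = np.random.rand(1536).tolist()
        response = await async_index.query(
            vector=query_vector,
            top_k=3,
            include_metadata=True
        )
        print(response)

# Run async example
asyncio.run(async_query_example())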
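
The top_k and include_values options from the list above can be wrapped in a small helper. This is a sketch under assumptions: search is a hypothetical function name, and index is any object exposing the query method used earlier. The FakeIndex stand-in exists only so the example runs without a Pinecone account.

```python
def search(index, vector, top_k=5, include_values=False):
    """Query an index-like object and return (id, score) pairs."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    response = index.query(
        vector=vector,
        top_k=top_k,
        include_metadata=True,
        include_values=include_values,
    )
    return [(m["id"], m["score"]) for m in response["matches"]]


class FakeIndex:
    """Stand-in for a Pinecone index, used only for this illustration."""

    def __init__(self):
        self.last_kwargs = None

    def query(self, **kwargs):
        # Record the arguments and return canned matches, truncated to top_k
        self.last_kwargs = kwargs
        canned = [
            {"id": "a", "score": 0.9},
            {"id": "b", "score": 0.8},
            {"id": "c", "score": 0.7},
        ]
        return {"matches": canned[: kwargs["top_k"]]}


fake = FakeIndex()
results = search(fake, [0.1, 0.2, 0.3], top_k=2, include_values=True)
print(results)
```

Passing the index as an argument keeps the helper easy to test and lets the same function work with whichever index object your application creates.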

Troubleshooting common issues

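  • If you get a 403 Forbidden error, verify that your Pinecone API key is correct and belongs to the project that owns the index (and, on older 2.x clients, that the environment is correct).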
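  • If index.query returns empty matches, check that your index has vectors upserted in the namespace you are querying. A query vector whose dimension differs from the index dimension is rejected with an error, so also confirm that the two match.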
  • For connection errors, ensure your network allows outbound HTTPS requests to Pinecone endpoints.
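
The index checks in this list can be automated before a query is sent. A minimal sketch, assuming you have already obtained the index dimension and total vector count (for example from index.describe_index_stats()); diagnose_query is a hypothetical helper name.

```python
def diagnose_query(query_vector, index_dimension, total_vector_count):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if total_vector_count == 0:
        problems.append("index is empty: upsert vectors before querying")
    if len(query_vector) != index_dimension:
        problems.append(
            f"dimension mismatch: query has {len(query_vector)} values, "
            f"index expects {index_dimension}"
        )
    return problems


# Illustrative values only: a 768-dim query against an empty 1536-dim index
print(diagnose_query([0.0] * 768, index_dimension=1536, total_vector_count=0))
```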

Key Takeaways

  • Initialize the Pinecone client with your API key before querying.
  • Use the query method with a vector and top_k to retrieve similar vectors.
  • Include metadata in query results to get contextual information for RAG.
  • Async querying is available for scalable, non-blocking applications.
  • Verify vector dimensions and index contents if queries fail or return no results.
Verified 2026-04 · gpt-4o, pinecone-client