How-to · Beginner · 3 min read

How to store and retrieve memories with Pinecone

Quick answer
Use Pinecone to store AI memories by embedding text with an embedding model (e.g., text-embedding-3-small) and upserting vectors into a Pinecone index. Retrieve memories by querying the index with a query embedding to find the most relevant stored vectors.

PREREQUISITES

  • Python 3.8+
  • Pinecone API key
  • OpenAI API key (for embeddings)
  • pip install "openai>=1.0" pinecone-client (quote the version specifier so the shell doesn't treat >= as a redirect)

Setup

Install the required Python packages and set environment variables for your Pinecone and OpenAI API keys.

  • Install packages: pip install openai pinecone-client
  • Set environment variables: export PINECONE_API_KEY=your_pinecone_key and export OPENAI_API_KEY=your_openai_key
bash
pip install openai pinecone-client
output
Collecting openai
Collecting pinecone-client
Successfully installed openai-1.x.x pinecone-client-x.x.x
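Before running the examples, it's worth confirming both keys are actually visible to Python. A minimal check (the `missing_keys` helper is illustrative, not part of either SDK):

```python
import os

# Report which required API keys are absent from the environment.
def missing_keys(env, required=("PINECONE_API_KEY", "OPENAI_API_KEY")):
    return [k for k in required if not env.get(k)]

missing = missing_keys(os.environ)
if missing:
    print("Set these before continuing:", ", ".join(missing))
else:
    print("All required keys are set.")
```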

Step by step

This example shows how to embed text memories using OpenAI embeddings, store them in a Pinecone index, and retrieve relevant memories by querying with a new text input.

python
import os
from openai import OpenAI
import pinecone

# Initialize clients
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
pc = pinecone.Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create or connect to Pinecone index
index_name = "memory-index"
if index_name not in pc.list_indexes().names():
    pc.create_index(
        index_name,
        dimension=1536,  # matches text-embedding-3-small
        metric="cosine",
        spec=pinecone.ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(index_name)

# Function to embed text with OpenAI
def embed_text(text: str) -> list:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Store memories
memories = [
    ("id1", "Met Alice at the conference."),
    ("id2", "Discussed AI ethics with Bob."),
    ("id3", "Lunch meeting about project X.")
]

vectors = [(mem_id, embed_text(text), {"text": text}) for mem_id, text in memories]
index.upsert(vectors)
print("Memories stored in Pinecone index.")

# Retrieve memories
query = "Who did I talk about AI ethics with?"
query_embedding = embed_text(query)
results = index.query(vector=query_embedding, top_k=2, include_metadata=True)

print("Retrieved memories:")
for match in results.matches:
    print(f"- {match.metadata['text']} (score: {match.score:.4f})")
output
Memories stored in Pinecone index.
Retrieved memories:
- Discussed AI ethics with Bob. (score: 0.9123)
- Met Alice at the conference. (score: 0.6789)
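The scores in the output are cosine similarities, because the index was created with metric="cosine". A minimal sketch of the computation on toy 2-D vectors (real embeddings have 1536 dimensions):

```python
import math

# Cosine similarity: dot product divided by the product of vector lengths.
# Closer meaning -> embeddings point in closer directions -> higher score.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine([1.0, 0.0], [1.0, 0.0]), 4))  # identical direction -> 1.0
print(round(cosine([1.0, 0.0], [0.0, 1.0]), 4))  # orthogonal -> 0.0
```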

Common variations

You can run embedding calls asynchronously with asyncio, swap in a different embedding model, or use another vector database with a similar API. For large memory sets, batching upserts and queries improves throughput.

python
import asyncio
import os

from openai import AsyncOpenAI
import pinecone

async def async_embed_text(client: AsyncOpenAI, text: str) -> list:
    response = await client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

async def main():
    openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])
    pc = pinecone.Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    index = pc.Index("memory-index")

    query = "Where did I meet Alice?"
    query_embedding = await async_embed_text(openai_client, query)
    results = index.query(vector=query_embedding, top_k=1, include_metadata=True)

    for match in results.matches:
        print(f"Async retrieved: {match.metadata['text']}")

asyncio.run(main())
output
Async retrieved: Met Alice at the conference.
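For the batching variation, upserting one vector at a time is slow; Pinecone accepts lists of vectors, and chunking into groups of around 100 is a common pattern. A sketch of the chunking logic (the `batched` helper is mine, and stub vectors stand in for real embeddings so it runs on its own; in real use, each batch goes to index.upsert):

```python
# Split a large vector list into fixed-size batches for upserting.
BATCH_SIZE = 100

def batched(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Stub vectors in the (id, values, metadata) shape used above.
vectors = [(f"id{i}", [0.0] * 1536, {"text": f"memory {i}"}) for i in range(250)]

batches = list(batched(vectors, BATCH_SIZE))
print(f"{len(vectors)} vectors -> {len(batches)} batches")
# In real use: for batch in batches: index.upsert(batch)
```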

Troubleshooting

  • If you see an "Index does not exist" error, confirm the index name and dimension match what you created.
  • If embedding calls fail, verify your OpenAI API key and the model name.
  • For slow operations, batch your upserts and keep top_k as small as your use case allows.
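A frequent cause of the first two errors is a dimension mismatch between the index and the embedding model. A small lookup table (the helper is mine; the dimensions are the defaults for OpenAI's current embedding models) makes the expectation explicit:

```python
# Default output dimensions for OpenAI embedding models.
MODEL_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def expected_dimension(model: str) -> int:
    """Return the index dimension required for a given embedding model."""
    if model not in MODEL_DIMS:
        raise ValueError(f"Unknown embedding model: {model}")
    return MODEL_DIMS[model]

print(expected_dimension("text-embedding-3-small"))  # 1536, as used above
```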

Key Takeaways

  • Use OpenAI embeddings to convert text memories into vectors for Pinecone storage.
  • Upsert vectors with metadata into a Pinecone index to store memories efficiently.
  • Query Pinecone with an embedding of your query text to retrieve relevant memories.
  • Batch operations and async calls improve performance for large memory sets.
  • Always verify index existence and API keys to avoid common errors.
Verified 2026-04 · text-embedding-3-small, gpt-4o