
How to use DSPy with Weaviate

Quick answer
Use DSPy to define structured LLM tasks, generate embeddings with the OpenAI API, and use the weaviate-client Python SDK to store and query those embeddings in a Weaviate vector database for efficient semantic search and retrieval.

Prerequisites

  • Python 3.8+
  • OpenAI API key (free tier works)
  • Weaviate instance running (cloud or local)
  • pip install dspy "weaviate-client>=3,<4" "openai>=1.0"

Setup

Install the required Python packages (this guide uses the v3 weaviate-client API) and set environment variables for your OpenAI API key and Weaviate endpoint. Ensure your Weaviate instance is running and reachable.

```bash
pip install dspy "weaviate-client>=3,<4" "openai>=1.0"
```
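With the packages installed, export your credentials and endpoint as environment variables. The values below are placeholders; a local Weaviate instance is assumed:

```shell
# Placeholder credentials -- replace with your own values
export OPENAI_API_KEY="sk-..."
# Defaults to a local instance; use your Weaviate Cloud URL if hosted
export WEAVIATE_URL="http://localhost:8080"
```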

Step by step

This example shows how to define a simple DSPy signature for question answering, generate an answer with DSPy, embed the question with the OpenAI embeddings API, and store/query the result in Weaviate.

```python
import os
from openai import OpenAI
import dspy
import weaviate

# Initialize the OpenAI client (used for embeddings)
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Configure DSPy to use OpenAI
lm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])
dspy.configure(lm=lm)

# Define a DSPy signature for question answering
class QA(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

# Create a DSPy predictor
qa = dspy.Predict(QA)

# Initialize the Weaviate client (v3 API)
client_wv = weaviate.Client(
    url=os.environ.get("WEAVIATE_URL", "http://localhost:8080"),
    # Header is only needed if Weaviate-side OpenAI modules are enabled
    additional_headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]}
)

# Define the Weaviate schema for QA documents;
# "vectorizer": "none" because we supply our own vectors
class_obj = {
    "class": "QAItem",
    "vectorizer": "none",
    "properties": [
        {"name": "question", "dataType": ["text"]},
        {"name": "answer", "dataType": ["text"]}
    ]
}

# Create the class only if it does not already exist
if not client_wv.schema.exists("QAItem"):
    client_wv.schema.create_class(class_obj)

# Example question
question_text = "What is DSPy?"

# Generate an answer using DSPy
result = qa(question=question_text)
print("Answer:", result.answer)

# Create an embedding for the question with the OpenAI embeddings API
embedding_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=question_text
)
embedding_vector = embedding_response.data[0].embedding

# Store the question, answer, and embedding in Weaviate
client_wv.data_object.create(
    data_object={"question": question_text, "answer": result.answer},
    class_name="QAItem",
    vector=embedding_vector
)

# Query Weaviate for similar questions
response = (
    client_wv.query.get("QAItem", ["question", "answer"])
    .with_near_vector({"vector": embedding_vector, "certainty": 0.7})
    .with_limit(3)
    .do()
)

print("\nTop similar Q&A from Weaviate:")
for item in response["data"]["Get"]["QAItem"]:
    print(f"Q: {item['question']}")
    print(f"A: {item['answer']}\n")
```

Expected output:

```text
Answer: DSPy is a declarative Python library for structured LLM programming.

Top similar Q&A from Weaviate:
Q: What is DSPy?
A: DSPy is a declarative Python library for structured LLM programming.
```
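A note on the `certainty` threshold used in the query: for cosine distance, Weaviate defines certainty as (1 + cosine similarity) / 2, so a threshold of 0.7 admits results whose cosine similarity is at least 0.4. A self-contained sketch of that relationship with toy vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def certainty(a, b):
    """Weaviate-style certainty for cosine distance: (1 + cos) / 2."""
    return (1 + cosine_similarity(a, b)) / 2

v1 = [1.0, 0.0]
v2 = [1.0, 0.0]   # identical direction -> certainty 1.0
v3 = [0.0, 1.0]   # orthogonal -> certainty 0.5

print(certainty(v1, v2))  # 1.0
print(certainty(v1, v3))  # 0.5
```

Raising the threshold toward 1.0 keeps only near-duplicates; lowering it admits looser matches.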

Common variations

  • Use async calls with weaviate-client for non-blocking queries.
  • Switch DSPy to Anthropic or other LLM providers by changing the lm configuration.
  • Use different embedding models like text-embedding-3-large for higher-quality vectors.
  • Compose DSPy modules for multi-step reasoning before storing results in Weaviate.
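For the second variation, switching providers is a change to the lm configuration only; the rest of the pipeline is untouched. A sketch, where the Anthropic model string is illustrative and an `ANTHROPIC_API_KEY` environment variable is assumed:

```python
import os
import dspy

# Swap the provider by changing the "provider/model" string;
# everything downstream (signatures, predictors) stays the same.
lm = dspy.LM(
    "anthropic/claude-3-5-sonnet-20240620",
    api_key=os.environ["ANTHROPIC_API_KEY"],
)
dspy.configure(lm=lm)
```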

Troubleshooting

  • If you get connection errors, verify your WEAVIATE_URL and network access.
  • Ensure your OpenAI API key is valid and has embedding permissions.
  • If schema creation fails, check if the class already exists or if you have write permissions.
  • For embedding mismatches, confirm the embedding vector dimension matches your Weaviate instance configuration.
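For the last point, OpenAI embedding dimensions are fixed per model (1536 for text-embedding-3-small and text-embedding-ada-002, 3072 for text-embedding-3-large), so a quick local check can catch mismatches before indexing. The helper below is a hypothetical sketch:

```python
# Known output dimensions for common OpenAI embedding models
EXPECTED_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def check_dims(model: str, vector: list) -> bool:
    """Return True if the vector length matches the model's known dimension."""
    return len(vector) == EXPECTED_DIMS.get(model, -1)

# Example with a dummy vector of the expected length
dummy = [0.0] * 1536
print(check_dims("text-embedding-3-small", dummy))  # True
print(check_dims("text-embedding-3-large", dummy))  # False
```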

Key Takeaways

  • Use DSPy to define structured LLM tasks and generate answers programmatically.
  • Leverage weaviate-client to store and query embeddings for semantic search.
  • Combine DSPy outputs with Weaviate vectors for efficient AI-powered retrieval.
  • Always configure environment variables for API keys and endpoints securely.
  • Validate schema and vector dimensions before indexing data in Weaviate.
Verified 2026-04 · openai/gpt-4o-mini, text-embedding-3-small