API Intermediate medium · 6 min

Similarity calculation

What you will learn

Use the Gemini embeddings API to compute semantic similarity between text passages by converting them to vector representations and measuring their distance.

Why this matters

Similarity scoring is foundational for retrieval-augmented generation (RAG), semantic search, deduplication, and clustering. Raw embedding vectors are meaningless without a distance metric, so understanding which metric to use and how Gemini's embeddings behave prevents costly trial-and-error in production.

Skip if: If you only need categorical matching (exact keywords or regex), use string comparison instead. If you're comparing images, use vision embeddings or a specialized image similarity service. If you need real-time similarity on billions of vectors, use a vector database (Pinecone, Weaviate) instead of computing similarity on-the-fly.

Explanation

What it does: The Gemini embeddings API converts text into high-dimensional dense vectors (currently 768 dimensions for the standard model). You then calculate similarity by measuring the distance between two vectors, typically with cosine similarity for semantic text matching.

How it works: Gemini's embedding model is trained on diverse web text and produces vectors where semantically similar passages cluster together. Cosine similarity measures the angle between vectors and ranges from -1 to 1 (1 = same direction, 0 = orthogonal/unrelated, -1 = opposite); it ignores vector length. Euclidean distance measures absolute separation (0 = identical, larger = further apart). The API bills by the number of tokens embedded, so embedding a text once and reusing the vector is cheaper than re-embedding it for every query.

When to use it: Use embeddings + similarity for fuzzy text matching, semantic search across documents, pairing user queries to knowledge bases, or detecting near-duplicate content. Unlike keyword matching, this compares meaning rather than surface-level tokens.
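To make the difference between the two metrics concrete, here is a minimal sketch on hand-made 3-dimensional vectors. The values are toy numbers chosen for illustration, not real Gemini embeddings, and the helper names are only suggestions.

```python
from math import sqrt

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two vectors; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    mag_a = sqrt(sum(a * a for a in vec_a))
    mag_b = sqrt(sum(b * b for b in vec_b))
    if mag_a == 0 or mag_b == 0:
        return 0.0  # undefined for zero vectors; treated as "unrelated" here
    return dot / (mag_a * mag_b)

def euclidean_distance(vec_a, vec_b):
    """Straight-line distance; 0 means identical, larger means further apart."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))

# Toy vectors (made up for illustration).
v = [1.0, 2.0, 3.0]
v_scaled = [2.0, 4.0, 6.0]   # same direction as v, twice the length
v_other = [3.0, 0.0, -1.0]   # orthogonal to v: 1*3 + 2*0 + 3*(-1) = 0

print("cosine(v, v_scaled):   ", cosine_similarity(v, v_scaled))
print("euclidean(v, v_scaled):", euclidean_distance(v, v_scaled))
print("cosine(v, v_other):    ", cosine_similarity(v, v_other))
```

The scaled vector scores a cosine similarity of about 1 even though its Euclidean distance from `v` is well above zero, which is why cosine similarity is the usual choice when only direction (meaning) matters and vector length does not.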

Request code

python

import google.generativeai as genai
import os
from math import sqrt

genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

def cosine_similarity(vec_a, vec_b):
    dot_product = sum(a * b for a, b in zip(vec_a, vec_b))
    mag_a = sqrt(sum(a * a for a in vec_a))
    mag_b = sqrt(sum(b * b for b in vec_b))
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dot_product / (mag_a * mag_b)

# No GenerativeModel object is needed: embeddings come from genai.embed_content().

text_1 = 'The quick brown fox jumps over the lazy dog'
text_2 = 'A fast auburn fox leaps across a sleepy canine'
text_3 = 'Machine learning algorithms require large datasets'

embedding_1 = genai.embed_content(
    model='models/embedding-001',
    content=text_1
)['embedding']

embedding_2 = genai.embed_content(
    model='models/embedding-001',
    content=text_2
)['embedding']

embedding_3 = genai.embed_content(
    model='models/embedding-001',
    content=text_3
)['embedding']

similarity_1_2 = cosine_similarity(embedding_1, embedding_2)
similarity_1_3 = cosine_similarity(embedding_1, embedding_3)

print(f'Similarity (text_1 vs text_2): {similarity_1_2:.4f}')
print(f'Similarity (text_1 vs text_3): {similarity_1_3:.4f}')

Authentication

Set the GOOGLE_API_KEY environment variable before running the code. Get your API key from Google AI Studio (https://aistudio.google.com/app/apikey). The sample reads the variable with os.environ and passes it to genai.configure(); if the variable is unset, os.environ['GOOGLE_API_KEY'] raises a KeyError before any request is sent.

Response shape

Field	Description
`embedding`	List of floats representing the 768-dimensional vector
`model`	String identifying the embedding model used (e.g., 'models/embedding-001')

Field guide

embedding

The raw vector itself: this is what you feed into your similarity calculation. Each float has roughly 4 decimal places of precision.

model

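Confirms which embedding model was used, which matters for reproducibility because different model versions may produce incompatible vectors. If the response you receive contains no such field, record the model name you passed in the request instead.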

Setup trap

The Gemini embeddings API expects the full model name, 'models/embedding-001', in the model parameter of genai.embed_content(). Passing just 'embedding-001', or creating a GenerativeModel object initialized with embedding-001, may fail or raise an error depending on the SDK version. Embedding calls do not go through a GenerativeModel object at all, so use model='models/embedding-001' directly in the genai.embed_content() call.

Cost

Embedding costs $0.02 per 1M tokens (as of April 2026). A typical document chunk (500 tokens) costs ~$0.00001. Batch-embedding 10,000 documents (5M tokens total) costs ~$0.10. This is cheap, but on-the-fly embedding every user query in a high-traffic app can exceed budget; cache embeddings or pre-compute indexes instead.
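The arithmetic above can be reproduced with a small estimator. This is a sketch that takes the $0.02 per 1M tokens figure quoted above as given; the function name and the traffic numbers in the last example are hypothetical, and the price should be checked against current pricing before relying on it.

```python
# Price taken from the text above; treat it as an assumption and verify it.
PRICE_PER_MILLION_TOKENS = 0.02  # USD

def embedding_cost(total_tokens, price_per_million=PRICE_PER_MILLION_TOKENS):
    """Estimated cost in USD for embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * price_per_million

chunk_cost = embedding_cost(500)             # one 500-token chunk
corpus_cost = embedding_cost(10_000 * 500)   # 10,000 chunks of 500 tokens

# Hypothetical traffic: 1M queries per day at 20 tokens each.
daily_query_cost = embedding_cost(1_000_000 * 20)

print(f"one chunk:     ${chunk_cost:.5f}")
print(f"whole corpus:  ${corpus_cost:.2f}")
print(f"daily queries: ${daily_query_cost:.2f}")
```

Because cost scales linearly with tokens, the corpus is a one-time charge while query traffic recurs every day, which is the case for caching frequent queries.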

Rate limits

The Gemini API allows 1,500 requests per minute on the free tier. If you're embedding large batches, send requests sequentially or batch multiple texts into a single request by passing a list of strings as `content` to genai.embed_content(), which returns one vector per input. Exceeding limits returns HTTP 429 (Too Many Requests).
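One way to cut the request count is to group texts on the client side before sending them. The sketch below uses two hypothetical helpers, `chunked` and `embed_in_batches`, and a stand-in embedder so it runs offline. In real use, `embed_batch` is assumed to wrap something like `genai.embed_content(model='models/embedding-001', content=batch)['embedding']`; check that your SDK version accepts a list for `content`. The batch size of 50 is arbitrary.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_in_batches(texts, embed_batch, batch_size=50):
    """Embed `texts` with one call to `embed_batch` per chunk.

    `embed_batch` is a hypothetical callable: it takes a list of strings
    and returns one vector per string, in the same order.
    """
    vectors = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

# Stand-in embedder so the sketch runs without network access.
# It returns a 1-dimensional "vector" holding the text length.
calls = []
def fake_embed_batch(batch):
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]

texts = [f"ticket {i}" for i in range(120)]
vectors = embed_in_batches(texts, fake_embed_batch, batch_size=50)

print(len(vectors), "vectors from", len(calls), "requests")
```

Here 120 texts go out in three requests instead of 120, which keeps a bulk job further from the per-minute request limit.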

Common gotcha

Embedding models produce different vector dimensions and spaces across versions. If you embed documents with embedding-001 today but switch to a newer model in 6 months, your old vectors become incomparable with the new ones, so you must re-embed everything. Always store the model version alongside your vectors in production.
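A minimal sketch of storing the model name next to each vector and refusing to compare across models follows. The record layout, the helper names, and the model name `models/hypothetical-v2` are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoredEmbedding:
    doc_id: str
    model: str      # model name used to produce the vector
    vector: tuple   # the embedding itself

def check_comparable(a, b):
    """Raise ValueError if two stored embeddings should not be compared."""
    if a.model != b.model:
        raise ValueError(f"model mismatch: {a.model} vs {b.model}")
    if len(a.vector) != len(b.vector):
        raise ValueError("dimension mismatch")

def stale_ids(records, current_model):
    """Return ids of records that must be re-embedded for `current_model`."""
    return [r.doc_id for r in records if r.model != current_model]

# Toy records; the second model name is hypothetical.
records = [
    StoredEmbedding("doc-1", "models/embedding-001", (0.1, 0.2)),
    StoredEmbedding("doc-2", "models/embedding-001", (0.3, 0.4)),
    StoredEmbedding("doc-3", "models/hypothetical-v2", (0.5, 0.6, 0.7)),
]

print(stale_ids(records, "models/hypothetical-v2"))
```

Running `stale_ids` after a model switch gives the exact list of documents to re-embed, and `check_comparable` catches accidental mixing before it turns into misleading similarity scores.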

Error recovery

google.api_core.exceptions.InvalidArgument

Usually means the `model` parameter is malformed. Use exactly `'models/embedding-001'`, not variations.

google.api_core.exceptions.PermissionDenied

API key is missing, invalid, or lacks embedding permissions. Regenerate the key in Google AI Studio.

google.api_core.exceptions.ResourceExhausted

Rate limit exceeded. Wait a few seconds and retry with exponential backoff.

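Unexpected 0.0 from cosine_similarity

The function shown above does not raise: it returns 0.0 when either vector has zero magnitude, and zip() silently truncates vectors of different lengths. Both usually point to bad input, such as empty text or vectors from different models. Pre-validate that text is non-empty before embedding, and check that the two vectors have the same length before comparing them.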
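The retry advice for rate limits can be sketched as a small wrapper. This is an illustrative helper, not part of the SDK: `retry_with_backoff`, `FakeRateLimit`, and `flaky_embed` are invented names. In real code you would pass `retryable=(google.api_core.exceptions.ResourceExhausted,)` and leave `sleep` at its default.

```python
import random
import time

def retry_with_backoff(call, retryable, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Call `call()`, retrying on `retryable` exceptions.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) plus
    random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)

class FakeRateLimit(Exception):
    """Stand-in for google.api_core.exceptions.ResourceExhausted."""

attempts = {"count": 0}

def flaky_embed():
    """Fails twice with a rate-limit error, then returns a toy vector."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit("429 Too Many Requests")
    return [0.1, 0.2, 0.3]

# Record the delays instead of sleeping so the demo finishes instantly.
delays = []
result = retry_with_backoff(flaky_embed, retryable=(FakeRateLimit,),
                            sleep=delays.append)
print(result, "after", attempts["count"], "attempts")
```

Only the exception types listed in `retryable` trigger a retry, so errors such as a malformed model name or a bad API key still fail immediately instead of being retried pointlessly.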

Experienced dev note

Map your similarity scores to a [0, 1] range and set a threshold for your use case: a similarity of 0.75 might mean 'duplicate' in one context but 'loosely related' in another. More importantly, embed once and reuse: pre-compute embeddings for your knowledge base (documents, FAQs, product catalog) and store them in a vector DB or even a simple JSON file, re-embedding only when you change models. At query time, embed only the user input and search the stored vectors. Depending on corpus size and query volume, this can cut costs by roughly 10–100x and bring latency down from seconds to milliseconds.
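The pattern can be sketched with a pre-computed in-memory index and a threshold search. Everything here is illustrative: the vectors are toy 3-dimensional values rather than real embeddings, `search` and `to_unit_interval` are hypothetical helpers, and mapping cosine similarity to [0, 1] with `(score + 1) / 2` is one possible normalization, not the only one.

```python
from math import sqrt

def cosine_similarity(vec_a, vec_b):
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    mag_a = sqrt(sum(a * a for a in vec_a))
    mag_b = sqrt(sum(b * b for b in vec_b))
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dot / (mag_a * mag_b)

def to_unit_interval(score):
    """Map a cosine similarity in [-1, 1] onto [0, 1]."""
    return (score + 1.0) / 2.0

def search(query_vec, index, threshold=0.75, top_k=3):
    """Return up to `top_k` (doc_id, score) pairs scoring at least `threshold`."""
    scored = [
        (doc_id, to_unit_interval(cosine_similarity(query_vec, vec)))
        for doc_id, vec in index.items()
    ]
    hits = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:top_k]

# Pre-computed once and stored; toy vectors stand in for real embeddings.
index = {
    "faq-refunds": [0.9, 0.1, 0.0],
    "faq-shipping": [0.1, 0.9, 0.0],
    "faq-login": [0.0, 0.1, 0.9],
}

# At query time only the user input would be embedded; this is a toy vector.
query_vec = [0.8, 0.2, 0.0]
results = search(query_vec, index, threshold=0.75)
print(results)
```

With the 0.75 threshold only the refunds entry is returned. Lowering the threshold admits looser matches, so the cut-off is best tuned on labeled examples from your own data.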

Check your understanding

You have 100,000 customer support tickets. You want to find duplicates. Why would embedding all tickets once and storing the vectors be cheaper than re-embedding on every query, even if queries are infrequent?

Answer hint

Cost is per token embedded, not per similarity calculation. Embedding 100,000 × average_tokens once is cheaper than embedding those same 100,000 texts multiple times across subsequent searches.

VERSION google-generativeai >= 0.8.0 supports the current genai.embed_content() API. Older versions (0.3–0.7) used genai.generate_embeddings() which is now deprecated. Always update to 0.8.x or later for embeddings work.
