What is cosine similarity in AI?
How it works
Cosine similarity calculates the cosine of the angle between two vectors, producing a value between -1 and 1. A value of 1 means the vectors point in the same direction (high similarity), 0 means they are orthogonal (no similarity), and -1 means they point in opposite directions. Imagine two arrows originating from the same point: the smaller the angle between them, the more similar their directions.
In AI, vectors often represent text or data features, so cosine similarity helps measure semantic closeness without being affected by vector length.
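This length-independence is easy to check numerically. A minimal sketch with toy vectors (numpy assumed; the helper name `cosine_similarity` is just for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the Euclidean norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v = np.array([1.0, 2.0, 3.0])

# Doubling a vector changes its length but not its direction,
# so the similarity is still 1 (up to floating-point rounding).
print(round(cosine_similarity(v, 2 * v), 6))   # 1.0
print(round(cosine_similarity(v, -v), 6))      # -1.0 (opposite direction)
print(round(cosine_similarity(np.array([1.0, 0.0]),
                              np.array([0.0, 1.0])), 6))  # 0.0 (orthogonal)
```

The three printed values correspond exactly to the three cases described above: same direction, opposite direction, and orthogonal.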
Concrete example
Given two vectors A and B, cosine similarity is computed as:
cosine_similarity = (A · B) / (||A|| * ||B||)
where · is the dot product and ||A|| is the Euclidean norm of vector A.
```python
import numpy as np

# Example vectors
A = np.array([1, 2, 3])
B = np.array([4, 5, 6])

# Compute cosine similarity
dot_product = np.dot(A, B)
norm_A = np.linalg.norm(A)
norm_B = np.linalg.norm(B)
cosine_similarity = dot_product / (norm_A * norm_B)
print(f"Cosine similarity: {cosine_similarity:.4f}")
# Output: Cosine similarity: 0.9746
```
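Computing similarities one pair at a time gets slow when comparing many embeddings. A common trick is to normalize each row once and compute all pairwise similarities with a single matrix product. A minimal numpy sketch (the "embedding" values are made up; real ones would come from an embedding model):

```python
import numpy as np

# Toy "embeddings": one row per item.
E = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [-1.0, -2.0, -3.0],
])

# Normalize each row to unit length; the cosine similarity matrix
# is then just the matrix product of the normalized rows.
norms = np.linalg.norm(E, axis=1, keepdims=True)
unit = E / norms
sim_matrix = unit @ unit.T

# sim_matrix[i, j] is the cosine similarity between rows i and j;
# the diagonal is 1 (every vector is identical to itself).
print(np.round(sim_matrix, 4))
```

Row 0 vs. row 1 reproduces the 0.9746 from the example above, and row 2 (row 0 negated) scores -1 against it.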
When to use it
Use cosine similarity when you need to measure the similarity between two vectors representing text, images, or other features, especially when vector magnitude is irrelevant. It is ideal for tasks like document retrieval, recommendation systems, and clustering in AI.
Avoid cosine similarity when magnitude carries meaning (for example, raw counts where a larger total should score differently) or when negative components make the sign of the score hard to interpret.
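The document-retrieval use case can be sketched with hypothetical bag-of-words counts (the vocabulary, documents, and counts below are invented for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical term-count vectors over the vocabulary ["cat", "dog", "fish"].
docs = {
    "doc1": np.array([3, 1, 0]),   # mostly about cats
    "doc2": np.array([0, 2, 5]),   # mostly about fish
    "doc3": np.array([6, 2, 0]),   # doc1's counts doubled: same topic, longer text
}
query = np.array([1, 0, 0])        # query: "cat"

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]),
                reverse=True)
print(ranked)
```

Note that doc3 is just doc1 with every count doubled, so the two tie exactly: cosine similarity ranks the longer document no higher, which is precisely the magnitude-invariance described above. doc2, which never mentions "cat", ranks last.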
Key terms
| Term | Definition |
|---|---|
| Cosine similarity | A measure of similarity between two vectors based on the cosine of the angle between them. |
| Dot product | An algebraic operation that multiplies corresponding entries of two vectors and sums the results. |
| Euclidean norm | The length or magnitude of a vector calculated as the square root of the sum of squared components. |
| Vector embedding | A numeric representation of data (like text) in a continuous vector space. |
Key takeaways
- Cosine similarity measures how aligned two vectors are, ignoring their magnitude.
- It is widely used in AI for comparing text embeddings and feature vectors.
- Calculate it using the dot product divided by the product of vector norms.
- Ideal for semantic similarity tasks like search and recommendation.
- Not suitable when vector magnitude carries meaning or the sign of components matters.