Skip to content

Cosine Similarity

Medium — good to knowAI & ML

ELI5 — The Vibe Check

Cosine similarity measures how similar two things are by comparing the angles of their vectors. If two vectors point in the same direction, they're similar (score near 1). If they're perpendicular, they're unrelated (score near 0). If they point opposite directions, they're opposites (score near -1). It's the math behind 'these two documents are about the same topic.'

Real Talk

Cosine similarity computes the cosine of the angle between two vectors, measuring their directional alignment regardless of magnitude. It's defined as the dot product of two vectors divided by the product of their magnitudes, yielding a value between -1 and 1. It's the standard similarity metric for comparing text embeddings in semantic search and RAG systems.

Show Me The Code

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example: compare two embeddings
sim = cosine_similarity(embed("dog"), embed("puppy"))  # ~0.92
sim = cosine_similarity(embed("dog"), embed("SQL"))    # ~0.15

When You'll Hear This

"The cosine similarity between these two documents is 0.94 — they're nearly identical." / "We use cosine similarity to find the most relevant chunks for RAG."

Made with passive-aggressive love by manoga.digital. Powered by Claude.