Clustering
ELI5 — The Vibe Check
Clustering is teaching an AI to find groups in data WITHOUT being told what the groups are. You dump in a million customer records and say 'find the natural groups.' The algorithm discovers that there are 5 types of customers you never knew about. It's unsupervised — nobody tells it what to look for.
Real Talk
Clustering is an unsupervised learning task that groups data points by similarity without predefined labels. Common algorithms include K-means, DBSCAN, hierarchical clustering, and Gaussian Mixture Models. It is used for customer segmentation, anomaly detection, and data exploration. Evaluation is less straightforward than supervised tasks.
Show Me The Code
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_ # which cluster each point belongs to
When You'll Hear This
"We used clustering to discover customer segments." / "Clustering found 3 natural groups in the user behavior data."
Related Terms
Classification
Classification is teaching an AI to sort things into categories. Is this email spam or not? Is this image a cat, dog, or bird?
Embedding
An embedding is turning words, sentences, or entire documents into lists of numbers (vectors) that capture their meaning.
Regression
Regression is like classification but instead of sorting things into categories, you're predicting a number. What will this house sell for?