K-Means Clustering

Identify natural groups in your data.

Definition

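K-means clustering partitions observations into k homogeneous groups, each represented by its centroid (the mean of its members), by minimizing the within-cluster sum of squared distances to those centroids, that is, the within-group variance. It is one of the most widely used clustering algorithms in unsupervised learning.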
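K-means is usually computed with Lloyd's algorithm. It alternates two steps until the assignments stop changing: assign each observation to its nearest centroid, then move each centroid to the mean of its members. Below is a minimal pure-Python sketch of that standard procedure. It is not StatsLab's implementation. The function names `kmeans` and `squared_distance`, the random initialization, and the `n_iter` and `seed` parameters are illustrative assumptions. Production implementations typically add smarter initialization (such as k-means++) and several restarts.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm (illustrative sketch): returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct observations picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` places the first three points in one cluster and the last three in the other. Each step can only lower the within-cluster sum of squares, which is why the loop converges. It may converge to a local rather than global minimum, hence the common practice of restarting from several initializations.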

When to use it

Segment a population into distinct profiles
Identify natural subgroups in data
Explore the structure of an unlabeled dataset

Requirements

Continuous variables
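Standardized data recommended, since k-means relies on Euclidean distances and variables measured on larger scales would otherwise dominate
Number of clusters k chosen by the user (an elbow plot is provided to help with this choice)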
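Standardization matters because k-means compares observations by Euclidean distance. A variable with a large spread, such as purchase amount, would otherwise outweigh one with a small spread, such as purchase frequency. A common choice is the z-score, sketched below in pure Python. Treat this as an assumption about the kind of standardization meant. The function name `standardize` is hypothetical. Using the population standard deviation (`statistics.pstdev`) rather than the sample one is an arbitrary choice here, not necessarily what StatsLab does.

```python
import statistics


def standardize(rows):
    """Z-score each column (hypothetical helper): (x - column mean) / column SD."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # Assumption: population standard deviation; statistics.stdev would give the sample SD.
    sds = [statistics.pstdev(c) for c in columns]
    return [
        # A constant column (SD of 0) carries no information and is mapped to 0.0.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]
```

After standardization every non-constant column has mean 0 and standard deviation 1. Two variables that differ only by a scale factor, such as an amount in cents versus euros, therefore contribute equally to the distances.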

What StatsLab computes

Assignment of each observation to a cluster
Cluster centroids
Within-cluster sum of squares (Within-SS)
Elbow plot to choose k
2D visualization of clusters (PCA)
Descriptive statistics per cluster
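Once each observation has a cluster, the centroids, cluster sizes and Within-SS follow directly from their definitions. Each centroid is the mean of its members. Within-SS sums the squared Euclidean distances from each observation to its own centroid. The sketch below illustrates these formulas. The function names `cluster_summary` and `within_ss` are hypothetical, and this is not StatsLab's code.

```python
from collections import defaultdict


def cluster_summary(points, labels):
    """Per-cluster size and centroid (mean of each variable), keyed by cluster label."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {
        lab: (len(members), [sum(col) / len(members) for col in zip(*members)])
        for lab, members in groups.items()
    }


def within_ss(points, labels):
    """Within-cluster sum of squares: squared distance of each point to its own centroid."""
    summary = cluster_summary(points, labels)
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, summary[lab][1]))
        for p, lab in zip(points, labels)
    )
```

An elbow plot repeats the clustering for k = 1, 2, 3, … and plots Within-SS against k. Within-SS generally shrinks as k grows, so the suggested k is where the curve bends and additional clusters bring only small reductions. For instance, with the points (0, 0), (0, 2), (10, 0) and (10, 2), one cluster gives a Within-SS of 104. Splitting them into the two obvious pairs gives 4.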

Worked example

Context: Segmenting 200 customers into groups based on purchasing behavior (frequency, amount, recency).

Result: 3 clusters were identified: Loyal (n=68), Occasional (n=89) and Inactive (n=43).

Interpretation: The elbow plot suggests k=3. Loyal customers spend 3× more than inactive ones. This segmentation can guide a differentiated marketing strategy.
