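K-means clustering partitions data into k groups whose members are as similar as possible, by minimizing the within-cluster sum of squares (the squared distances between observations and their cluster centroid). It is one of the most widely used clustering algorithms in unsupervised learning.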
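For readers who prefer a formula, the quantity being minimized can be written as follows. The notation is an assumption introduced here for illustration: C_j is the set of observations assigned to cluster j, x_i is an observation, and mu_j is the centroid (mean) of cluster j.

```latex
W(C_1, \dots, C_k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A smaller value of W means observations sit closer to their own centroid, which is what "homogeneous groups" refers to.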
When to use it
Segment a population into distinct profiles
Identify natural subgroups in data
Explore the structure of an unlabeled dataset
Requirements
Continuous variables
Standardized data recommended (k-means relies on distances, so variables measured on larger scales would otherwise dominate)
Number of clusters k chosen by the user (selection aid provided)
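To illustrate the standardization requirement, here is a minimal sketch that converts each variable to z-scores (mean 0, standard deviation 1). It is not StatsLab's own code: the function name `standardize` and the sample values are hypothetical, and only the Python standard library is used so the example runs as-is.

```python
from statistics import fmean, pstdev

def standardize(rows):
    """Convert each column to z-scores (mean 0, standard deviation 1)."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical observations: (purchases per month, average amount spent)
rows = [(1.0, 200.0), (2.0, 400.0), (3.0, 600.0)]
scaled = standardize(rows)
```

Without this step, the amount column (hundreds) would outweigh the frequency column (single digits) in every distance calculation, even though both may matter equally for the segmentation.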
What StatsLab computes
Assignment of each observation to a cluster
Cluster centroids
Within-cluster sum of squares (Within-SS)
Elbow plot to choose k
2D visualization of clusters (projection onto the first two principal components, PCA)
Descriptive statistics per cluster
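The first four outputs above (assignments, centroids, Within-SS, and the values behind an elbow plot) can be sketched with the standard k-means iteration, often called Lloyd's algorithm. This is an illustrative sketch, not StatsLab's implementation: the function names, the random initialization from k observations, and the sample points are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids, within_ss)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    within_ss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, within_ss

# Hypothetical, already comparable data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids, within_ss = kmeans(points, k=2)

# Values for an elbow plot: Within-SS for each candidate k.
elbow = {k: kmeans(points, k)[2] for k in range(1, 5)}
```

Plotting `elbow` against k and looking for the point where the decrease in Within-SS flattens out is the usual way to read an elbow plot. Because the result depends on the starting centroids, practical implementations typically rerun the algorithm from several initializations and keep the run with the lowest Within-SS.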
Worked example
Context: Segmenting 200 customers into groups based on purchasing behavior (frequency, amount, recency).