01 Core Principles (Plain English)

Imagine walking into a candy store where hundreds of candies are scattered across a table. The owner asks you to sort them into groups, but doesn't tell you how — what would you do?

Intuitively, you'd put candies that look alike together: similar colors, similar sizes... That's exactly what K-Means does.

The Algorithm Has Only Four Steps

1. Plant Random Flags

Randomly select K points from the data and plant flags — these are the K "captains" (centroids).

2. Each Point Finds Its Captain

Each data point looks at which captain is closest, and follows that one.

3. Captains Relocate

Each captain moves to the center of its team members (taking the average), positioning itself more "centrally".

4. Repeat Until Stable

Repeat steps 2 and 3. The captains move less and less, eventually settling — clustering is complete.

Why does it converge? After each "captain relocation", the total "travel distance" (sum of distances from each point to its captain) only decreases, never increases. When it can't decrease anymore, it stops.

Two Pitfalls to Watch Out For

You Must Choose K Yourself

You have to tell the algorithm how many groups to create. If K is too small, different categories get mixed together; if K is too large, the same category gets split apart. The "Elbow Method" is commonly used to find a good K.

Initial Positions Affect Results

Different random starting positions can lead to different final results, sometimes getting stuck in local optima. In practice, it's common to run multiple times and take the best result.

Building K-Means Step by Step

Step 1 Generate Data + Plant Random Flags

Generate three natural clusters, then randomly pick K points from the data as initial centroids. Different starting positions may lead to different final results.
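
A minimal sketch of this step, assuming NumPy; the cluster centers and spread below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Three natural clusters: Gaussian blobs around hand-picked centers.
true_centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 7.0]])
X = np.vstack([c + rng.normal(scale=0.8, size=(50, 2)) for c in true_centers])

# Plant random flags: pick K distinct data points as the initial centroids.
K = 3
centroids = X[rng.choice(len(X), size=K, replace=False)]
```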

Step 2 Assign: Each Point Finds Its Captain

Each point calculates its distance to all centroids and joins the nearest one. This is the core operation of K-Means.
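
The assignment step can be sketched with NumPy broadcasting (the data here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
centroids = X[:3].copy()  # pretend the first three points are the current captains

# Distance from every point to every centroid, shape (n_points, K).
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Assign: each point joins its nearest captain.
labels = dists.argmin(axis=1)
```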

Step 3 Update: Captains Relocate

Each centroid moves to the mean position of all its members. Each such move never increases the total within-cluster distance, guaranteeing eventual convergence.
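
The update step on a toy assignment (two teams of two points each, values made up):

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
K = 2

# Update: each captain relocates to the mean of its team members.
centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
# centroids is now [[0, 1], [10, 11]]
```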

Step 4 Iterate Until Convergence

Repeatedly execute "assign + update", stopping when centroids no longer move. Usually only a few rounds are needed to converge.
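
The iteration loop, sketched on two well-separated synthetic groups (data and the empty-cluster guard are my own additions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(40, 2))
               for c in [(0.0, 0.0), (6.0, 6.0)]])
K = 2
centroids = X[rng.choice(len(X), size=K, replace=False)]

for _ in range(100):
    # Assign + update, stopping once the captains stop moving.
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else centroids[k]  # an empty team keeps its captain
                              for k in range(K)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids
```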

Combine all four segments together, add visualization, and you have the complete demo code — see below.

02 Code
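
Putting the four steps together, a minimal NumPy implementation might look like the sketch below (function and variable names are my own; a visualization such as a matplotlib scatter of `labels` and `centroids` can be bolted on):

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Plain K-Means: random init from the data, then assign + update until stable."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=K, replace=False)]  # plant random flags
    for _ in range(max_iter):
        # Assignment step: each point follows its nearest captain.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each captain moves to the mean of its team
        # (an empty team keeps its old captain).
        new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                  else centroids[k] for k in range(K)])
        if np.allclose(new_centroids, centroids):  # stable: clustering is complete
            break
        centroids = new_centroids
    return centroids, labels

# Demo on three synthetic clusters.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 7.0]])
X = np.vstack([c + rng.normal(scale=0.8, size=(50, 2)) for c in centers])
centroids, labels = kmeans(X, K=3)
```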

03 Academic Explanation

Formal Definition

Given a dataset X = {x₁, x₂, ..., xₙ}, K-Means aims to find K clusters C = {C₁, C₂, ..., C_K} and their corresponding centroids μ₁, ..., μ_K that minimize the objective function:

$$J = \sum_{k=1}^{K} \sum_{x \in C_k} \|x - \mu_k\|^2$$

Where ‖x − μ_k‖² is the squared Euclidean distance, and J is also called the Within-Cluster Sum of Squares (WCSS / SSE).

Algorithm Pseudocode

Initialize: randomly select K centroids μ₁, ..., μ_K
Repeat:
    Assignment step: for each xᵢ, set cᵢ = argmin_k ‖xᵢ − μ_k‖²
    Update step: set μ_k = (1/|C_k|) Σ_{i: cᵢ = k} xᵢ
Until centroids no longer change

Convergence

After each iteration, the objective function J is monotonically non-increasing, and the state space is finite (there are finitely many assignment schemes), so the algorithm is guaranteed to converge — but only to a local optimum, not necessarily the global optimum.

In practice, K-Means++ initialization is commonly used: the next centroid is selected with probability proportional to d(x, nearest selected centroid)², significantly reducing the chance of getting stuck in poor local optima.
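
A sketch of K-Means++ seeding (function name and data are illustrative):

```python
import numpy as np

def kmeanspp_init(X, K, seed=0):
    """K-Means++ seeding: new centroids drawn with probability proportional to d^2."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]  # first centroid: uniform at random
    for _ in range(K - 1):
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        probs = d2 / d2.sum()  # far-away points are more likely to be picked
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
init = kmeanspp_init(X, K=4)
```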

Distance Metric

$$d(x,\mu) = \sqrt{\sum_{j=1}^{d} (x_j - \mu_j)^2}$$

Standard K-Means uses Euclidean distance. If the data dimensions have very different scales, standardization (z-score) should be applied first, otherwise dimensions with larger scales will dominate the distance calculation.
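
For example (the feature values below are made up; height in cm vs. income in dollars):

```python
import numpy as np

X = np.array([[170.0, 60000.0],
              [160.0, 30000.0],
              [180.0, 90000.0]])

# z-score each column: zero mean, unit variance, so neither feature dominates distances.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```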

Choosing K

Elbow Method

Plot SSE against K. A good choice for K is the "elbow": the point where the curve bends and further increases in K yield only small SSE reductions.
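
A sketch of computing the elbow curve, assuming scikit-learn is available (the synthetic data is illustrative; `inertia_` is scikit-learn's name for the WCSS):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 2))
               for c in [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]])

# SSE (inertia) for each candidate K; plot K vs. SSE and look for the bend.
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 8)]
```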

Silhouette Score

Measures intra-cluster cohesion and inter-cluster separation. Ranges from [−1, 1] — the closer to 1, the better.
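
Picking K by silhouette score, again assuming scikit-learn (data is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 2))
               for c in [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1]; higher is better

best_k = max(scores, key=scores.get)
```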

Business Constraints

In real-world scenarios, K is often determined by business requirements (e.g., segmenting users into 3 tiers).

Complexity

  • Time Complexity: O(n · K · d · T), where n is the number of samples, d is the number of dimensions, and T is the number of iterations
  • Space Complexity: O(n · d + K · d)
  • For large-scale data, Mini-Batch K-Means can be used, updating centroids with random subsets each time for significant speedup.
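
A sketch of the mini-batch variant via scikit-learn's `MiniBatchKMeans` (data and parameter values are illustrative):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 10)).astype(np.float32)

# Centroids are updated from random mini-batches rather than the full dataset.
mbk = MiniBatchKMeans(n_clusters=8, batch_size=1024, n_init=3,
                      random_state=0).fit(X)
```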

Limitations

  • Assumes convex-shaped clusters — performs poorly on non-convex distributions like rings or crescents
  • Sensitive to outliers (outliers can strongly pull centroids off course)
  • Requires specifying K in advance, and results depend on initialization