KNN: K-Nearest Neighbors Classification
No training needed, no formulas required—just look at who the neighbors are and go with the majority vote. That's KNN.
01 Core Principles (Plain English)
You just moved to a new city and don't know which nearby restaurant is good. How do you decide? — Ask your neighbors where they usually eat, and go wherever most of them go.
That's exactly what KNN does: Don't recognize this new point? Just look at the categories of its K nearest neighbors—whichever category has the most votes wins.
The Algorithm Has Only Three Steps
Calculate the distance from the new point to every training point (using straight-line distance, i.e., Euclidean distance).
Sort the distances from smallest to largest and pick the K closest points as "neighbors."
Whichever category is most common among the K neighbors, that's the predicted class for the new point.
KNN has no "training" phase—it simply stores all training data and only starts computing at prediction time. So training is fast, but prediction is slow (it has to compute all distances every time).
How to Choose K?
K Too Small (e.g., K=1)
Only looks at the single nearest neighbor—a single noisy point can swing the result. Prone to overfitting with jagged decision boundaries.
K Too Large
Too many neighbors means distant points also get to vote. Boundaries become overly smooth, potentially underfitting, and computation increases.
In practice, odd K values (3, 5, 7) are common—an odd K avoids voting ties in binary classification—and cross-validation is the standard way to find the best K.
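Choosing K by cross-validation can be sketched with a leave-one-out loop: for each candidate K, predict every training point from all the others and keep the K with the highest accuracy. This is a minimal sketch, not the article's demo code; `knn_predict` here is a plain majority-vote KNN defined inline for self-containment.

```python
import numpy as np

def knn_predict(X, y, x_new, k):
    """Plain majority-vote KNN using Euclidean distance."""
    nearest = np.argsort(np.linalg.norm(X - x_new, axis=1))[:k]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def best_k_loocv(X, y, candidates=(1, 3, 5, 7)):
    """Return the candidate K with the highest leave-one-out accuracy."""
    def loo_accuracy(k):
        # Hold out each point in turn and predict it from the rest
        hits = sum(
            knn_predict(np.delete(X, i, axis=0), np.delete(y, i), X[i], k) == y[i]
            for i in range(len(X))
        )
        return hits / len(X)
    return max(candidates, key=loo_accuracy)
```

Leave-one-out is exhaustive but expensive (n predictions per candidate K); ordinary k-fold cross-validation is the usual compromise on larger datasets.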
Build KNN Step by Step
Generate two spatially separated clusters and shuffle them so both classes interleave, simulating the messiness of real-world data.
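The data-generation step might look like this (cluster centers, spread, and sizes are arbitrary choices for illustration, not the article's exact values):

```python
import numpy as np

rng = np.random.default_rng(42)
# Two Gaussian clusters, one per class
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.8, size=(50, 2)),  # class 0
    rng.normal(loc=[3, 3], scale=0.8, size=(50, 2)),  # class 1
])
y = np.array([0] * 50 + [1] * 50)

# Shuffle so the two classes interleave, like messy real-world data
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
```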
KNN's core operation is just one thing: computing distances. The straight-line distance between two points is the Euclidean distance.
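In code, Euclidean distance is essentially one line (a sketch):

```python
import numpy as np

def euclidean(a, b):
    """Straight-line (L2) distance between two points."""
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

print(euclidean([0, 0], [3, 4]))  # → 5.0
```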
Sort all training points by distance, take the top K, count votes for each class, and the majority class is the prediction.
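The sort-take-vote step can be sketched as follows (function names are my own; the distance computation is inlined so the snippet stands alone):

```python
import numpy as np
from collections import Counter

def predict(X_train, y_train, x_new, k=5):
    """Majority vote among the k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # distance to every point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)     # count class labels
    return votes.most_common(1)[0][0]                # majority class wins
```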
KNN has no training process—each time a batch of points is added, just call predict to redraw the decision boundary and observe how it stabilizes as the data grows.
Combine all four segments with visualization, and you get the complete demo code—see below.
02 Code
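The original demo code is not reproduced here. Below is a minimal self-contained sketch combining the steps from the walkthrough—data generation, distance, majority vote—reporting test accuracy instead of drawing the decision boundary; all names and parameter values are my own.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

def make_clusters(n_per_class=50, seed=42):
    """Two shuffled Gaussian clusters, one per class."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal([0, 0], 0.8, (n_per_class, 2)),
        rng.normal([3, 3], 0.8, (n_per_class, 2)),
    ])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    perm = rng.permutation(len(X))
    return X[perm], y[perm]

if __name__ == "__main__":
    X, y = make_clusters()
    split = 80  # 80 training points, 20 held out
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]
    preds = np.array([knn_predict(X_train, y_train, x, k=5) for x in X_test])
    print(f"test accuracy: {(preds == y_test).mean():.2f}")
```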
03 Academic Explanation
Formal Definition
Given training set D = {(x₁,y₁), ..., (xₙ,yₙ)}, for a new sample x, KNN predicts:

ŷ = argmax_c Σ_{(xᵢ,yᵢ) ∈ N_K(x)} 𝟙(yᵢ = c)

where N_K(x) is the set of K nearest neighbors of x, and 𝟙 is the indicator function.
Distance Metrics
Standard KNN uses Euclidean distance (L₂ norm):

d(x, z) = √( Σᵢ (xᵢ − zᵢ)² )
Manhattan distance (L₁), Minkowski distance (Lₚ), and others can also be used. If features have different scales, standardization is necessary—otherwise, features with larger values will dominate the distance.
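Standardization before computing distances can be sketched as a z-score transform (the feature values below are illustrative, meant to mimic an age column next to an income column):

```python
import numpy as np

def standardize(X):
    """Z-score each feature column so no single scale dominates the distance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Income in dollars dwarfs age in years until both are standardized
X = np.array([[25, 40_000.0], [30, 80_000.0], [45, 60_000.0]])
Xs = standardize(X)
```

In practice the mean and standard deviation must be computed on the training set only and reused to transform new points, or the distances at prediction time will be inconsistent.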
Complexity
- Training: O(1)—just store the data, no computation needed
- Prediction: O(n·d)—each prediction requires traversing all training points, where n is the number of samples and d is the dimensionality
- Space: O(n·d)—must store the entire training set
For large-scale data, KD-Trees or Ball Trees can accelerate nearest neighbor search, reducing average query time to O(log n) in low dimensions (their advantage fades toward brute-force as dimensionality grows).
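With SciPy, a KD-Tree query replaces the brute-force distance scan. A sketch, assuming SciPy is available (the data here is random and only illustrates the API):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))
y_train = rng.integers(0, 2, size=1000)

tree = cKDTree(X_train)                    # build once: O(n log n)
dists, idx = tree.query([0.0, 0.0], k=5)   # query: ~O(log n) in low dimensions

# Majority vote among the returned neighbor indices
labels, counts = np.unique(y_train[idx], return_counts=True)
pred = labels[np.argmax(counts)]
```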
Decision Boundary
KNN's decision boundary is built from Voronoi cells: when K=1, the boundary follows the Voronoi edges between training points of different classes; as K increases, the boundary becomes smoother. Theoretically, as n→∞, the error rate of 1-NN is at most twice the Bayes optimal error rate.
Limitations
- Curse of Dimensionality: In high-dimensional spaces, distances tend to become equal, making the nearest neighbor concept ineffective
- Slow Prediction: Each prediction requires full computation over all data, unsuitable for real-time scenarios
- Sensitive to Noise: With small K, a single outlier can change the prediction result
- Class Imbalance: The majority class dominates the neighbors; weighted voting (by inverse distance) can help mitigate this
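Inverse-distance weighted voting can be sketched as follows (the small epsilon guards against division by zero when a neighbor coincides with the query point; names are my own):

```python
import numpy as np

def knn_predict_weighted(X_train, y_train, x_new, k=5, eps=1e-9):
    """Each neighbor votes with weight 1/distance, so closer points count more."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = {}
    for i in nearest:
        scores[y_train[i]] = scores.get(y_train[i], 0.0) + 1.0 / (dists[i] + eps)
    return max(scores, key=scores.get)
```

With one very close class-0 point and two farther class-1 points among the neighbors, plain majority vote picks class 1 but the weighted vote picks class 0—exactly the mitigation described above.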