KNN: K-Nearest Neighbors Classification
No training needed, no formulas required—just look at who the neighbors are and go with the majority vote. That's KNN.
01 Core Principles (Plain English)
You just moved to a new city and don't know which nearby restaurant is good. How do you decide? — Ask your neighbors where they usually eat, and go wherever most of them go.
That's exactly what KNN does: Don't recognize this new point? Just look at the categories of its K nearest neighbors—whichever category has the most votes wins.
The Algorithm Has Only Three Steps
Calculate the distance from the new point to every training point (using straight-line distance, i.e., Euclidean distance).
Sort the distances from smallest to largest and pick the K closest points as "neighbors."
Whichever category is most common among the K neighbors, that's the predicted class for the new point.
KNN has no "training" phase—it simply stores all training data and only starts computing at prediction time. So training is fast, but prediction is slow (it has to compute all distances every time).
How to Choose K?
K Too Small (e.g., K=1)
Only looks at the single nearest neighbor—a single noisy point can swing the result. Prone to overfitting with jagged decision boundaries.
K Too Large
Too many neighbors means distant points also get to vote. Boundaries become overly smooth, potentially underfitting, and computation increases.
In practice, odd K values (3, 5, 7) are common—an odd K avoids voting ties in binary classification—and cross-validation is the standard way to find the best K.
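Choosing K by cross-validation can be sketched with a leave-one-out loop: for each candidate K, predict every training point from all the others and keep the K with the highest accuracy. This is a minimal sketch, not the article's demo code; `knn_predict` here is a plain majority-vote KNN defined inline for self-containment.

```python
import numpy as np

def knn_predict(X, y, x_new, k):
    """Plain majority-vote KNN using Euclidean distance."""
    nearest = np.argsort(np.linalg.norm(X - x_new, axis=1))[:k]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def best_k_loocv(X, y, candidates=(1, 3, 5, 7)):
    """Return the candidate K with the highest leave-one-out accuracy."""
    def loo_accuracy(k):
        # Hold out each point in turn and predict it from the rest
        hits = sum(
            knn_predict(np.delete(X, i, axis=0), np.delete(y, i), X[i], k) == y[i]
            for i in range(len(X))
        )
        return hits / len(X)
    return max(candidates, key=loo_accuracy)
```

Leave-one-out is exhaustive but expensive (n predictions per candidate K); ordinary k-fold cross-validation is the usual compromise on larger datasets.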
Build KNN Step by Step
Generate two spatially separated clusters and shuffle them so both classes interleave, simulating the messiness of real-world data.
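The data-generation step might look like this (cluster centers, spread, and sizes are arbitrary choices for illustration, not the article's exact values):

```python
import numpy as np

rng = np.random.default_rng(42)
# Two Gaussian clusters, one per class
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.8, size=(50, 2)),  # class 0
    rng.normal(loc=[3, 3], scale=0.8, size=(50, 2)),  # class 1
])
y = np.array([0] * 50 + [1] * 50)

# Shuffle so the two classes interleave, like messy real-world data
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]
```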
KNN's core operation is just one thing: computing distances. The straight-line distance between two points is the Euclidean distance.
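In code, Euclidean distance is essentially one line (a sketch):

```python
import numpy as np

def euclidean(a, b):
    """Straight-line (L2) distance between two points."""
    return np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

print(euclidean([0, 0], [3, 4]))  # → 5.0
```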
Sort all training points by distance, take the top K, count votes for each class, and the majority class is the prediction.
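The sort-take-vote step can be sketched as follows (function names are my own; the distance computation is inlined so the snippet stands alone):

```python
import numpy as np
from collections import Counter

def predict(X_train, y_train, x_new, k=5):
    """Majority vote among the k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # distance to every point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)     # count class labels
    return votes.most_common(1)[0][0]                # majority class wins
```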
KNN has no training process—each time a batch of points is added, just call predict to redraw the decision boundary and observe how it stabilizes as the data grows.
Combine all four segments with visualization, and you get the complete demo code—see below.
02 Code
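The original demo code is not reproduced here. Below is a minimal self-contained sketch combining the steps from the walkthrough—data generation, distance, majority vote—reporting test accuracy instead of drawing the decision boundary; all names and parameter values are my own.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

def make_clusters(n_per_class=50, seed=42):
    """Two shuffled Gaussian clusters, one per class."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal([0, 0], 0.8, (n_per_class, 2)),
        rng.normal([3, 3], 0.8, (n_per_class, 2)),
    ])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    perm = rng.permutation(len(X))
    return X[perm], y[perm]

if __name__ == "__main__":
    X, y = make_clusters()
    split = 80  # 80 training points, 20 held out
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]
    preds = np.array([knn_predict(X_train, y_train, x, k=5) for x in X_test])
    print(f"test accuracy: {(preds == y_test).mean():.2f}")
```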
03 Academic Explanation
Formal Definition
Given training set D = {(x₁,y₁), ..., (xₙ,yₙ)}, for a new sample x, KNN predicts:

ŷ = argmax_c Σ_{(xᵢ,yᵢ) ∈ N_K(x)} 𝟙(yᵢ = c)

where N_K(x) is the set of K nearest neighbors of x, and 𝟙 is the indicator function.
Distance Metrics
Standard KNN uses Euclidean distance (L₂ norm):

d(x, z) = √( Σᵢ (xᵢ − zᵢ)² )
Manhattan distance (L₁), Minkowski distance (Lₚ), and others can also be used. If features have different scales, standardization is necessary—otherwise, features with larger values will dominate the distance.
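Standardization before computing distances can be sketched as a z-score transform (the feature values below are illustrative, meant to mimic an age column next to an income column):

```python
import numpy as np

def standardize(X):
    """Z-score each feature column so no single scale dominates the distance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Income in dollars dwarfs age in years until both are standardized
X = np.array([[25, 40_000.0], [30, 80_000.0], [45, 60_000.0]])
Xs = standardize(X)
```

In practice the mean and standard deviation must be computed on the training set only and reused to transform new points, or the distances at prediction time will be inconsistent.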
Complexity
- Training: O(1)—just store the data, no computation needed
- Prediction: O(n·d)—each prediction requires traversing all training points, where n is the number of samples and d is the dimensionality
- Space: O(n·d)—must store the entire training set
For large-scale data, KD-Trees or Ball Trees can accelerate nearest neighbor search, reducing average query time to O(log n) in low dimensions (their advantage fades toward brute-force as dimensionality grows).
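With SciPy, a KD-Tree query replaces the brute-force distance scan. A sketch, assuming SciPy is available (the data here is random and only illustrates the API):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))
y_train = rng.integers(0, 2, size=1000)

tree = cKDTree(X_train)                    # build once: O(n log n)
dists, idx = tree.query([0.0, 0.0], k=5)   # query: ~O(log n) in low dimensions

# Majority vote among the returned neighbor indices
labels, counts = np.unique(y_train[idx], return_counts=True)
pred = labels[np.argmax(counts)]
```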
Decision Boundary
KNN's decision boundary is built from Voronoi cells: when K=1, the boundary follows the Voronoi edges between training points of different classes; as K increases, the boundary becomes smoother. Theoretically, as n→∞, the error rate of 1-NN is at most twice the Bayes optimal error rate.
Limitations
- Curse of Dimensionality: In high-dimensional spaces, distances tend to become equal, making the nearest neighbor concept ineffective
- Slow Prediction: Each prediction requires full computation over all data, unsuitable for real-time scenarios
- Sensitive to Noise: With small K, a single outlier can change the prediction result
- Class Imbalance: The majority class dominates the neighbors; weighted voting (by inverse distance) can help mitigate this
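Inverse-distance weighted voting can be sketched as follows (the small epsilon guards against division by zero when a neighbor coincides with the query point; names are my own):

```python
import numpy as np

def knn_predict_weighted(X_train, y_train, x_new, k=5, eps=1e-9):
    """Each neighbor votes with weight 1/distance, so closer points count more."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = {}
    for i in nearest:
        scores[y_train[i]] = scores.get(y_train[i], 0.0) + 1.0 / (dists[i] + eps)
    return max(scores, key=scores.get)
```

With one very close class-0 point and two farther class-1 points among the neighbors, plain majority vote picks class 1 but the weighted vote picks class 0—exactly the mitigation described above.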