Overview
Direct Answer
K-Nearest Neighbours (KNN) is a non-parametric, instance-based learning algorithm that classifies a data point by identifying the k closest training examples in feature space and assigning the majority class label among those neighbours. Unlike parametric models, it makes no assumptions about the underlying data distribution.
How It Works
The algorithm calculates distances (typically Euclidean or Manhattan) between a query point and all training samples, then selects the k nearest instances. Classification is determined by majority voting among these k neighbours; regression variants average their target values. The choice of distance metric and of k directly influences model behaviour and accuracy: a small k follows the training data closely and is sensitive to noise, whereas a large k smooths the decision boundary and can blur genuine class distinctions.
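The steps above can be sketched in a few lines of plain Python. This is a minimal brute-force illustration, not a production implementation: the function names (`k_nearest`, `knn_classify`, `knn_regress`, `manhattan`) and the toy data are assumptions made for the example, and ties in the vote are resolved in favour of the label seen first among the nearest neighbours.

```python
import math
from collections import Counter


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_nearest(train, query, k, metric=math.dist):
    """Return the k (features, target) pairs closest to query.

    Brute force: every training point is measured against the query.
    math.dist is the Euclidean distance (Python 3.8+).
    """
    return sorted(train, key=lambda pair: metric(pair[0], query))[:k]


def knn_classify(train, query, k=3, metric=math.dist):
    """Majority vote over the labels of the k nearest neighbours."""
    neighbours = k_nearest(train, query, k, metric)
    votes = Counter(label for _, label in neighbours)
    # most_common keeps first-seen order for equal counts, so a tie
    # goes to the label that appears earliest among the neighbours.
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k=3, metric=math.dist):
    """Regression variant: mean target value of the k nearest neighbours."""
    neighbours = k_nearest(train, query, k, metric)
    return sum(target for _, target in neighbours) / len(neighbours)


# Toy data, made up for illustration: two well-separated clusters.
labelled = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"),
]
print(knn_classify(labelled, (1.1, 1.0), k=3))                    # a
print(knn_classify(labelled, (5.1, 5.0), k=1, metric=manhattan))  # b
```

Passing `metric=manhattan` swaps the distance function without touching the voting logic, which is one way to compare how the metric choice changes predictions.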
Why It Matters
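KNN remains valuable for rapid prototyping and for problems with non-linear decision boundaries where linear assumptions fail. Its interpretability (each decision traces directly to the nearest training examples) supports explainability requirements in regulated sectors. Because it needs no training step and has few hyperparameters, it is also a common baseline for comparison with more complex models. Its performance depends heavily on feature scaling and neighbourhood size, so features should be brought to comparable ranges and k should be tuned.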
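The sensitivity to feature scaling can be illustrated with a small sketch. The records below are hypothetical (age in years, income in currency units), and `min_max_scale` and `nearest_index` are helper names invented for this example. Min-max scaling is only one of several possible rescaling choices.

```python
import math


def min_max_scale(rows):
    """Rescale each column to [0, 1] using that column's min and max."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # Guard against constant columns, which would otherwise divide by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [
        tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]


def nearest_index(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index], excluding itself."""
    query = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(rows[i], query))


# Hypothetical records: (age, income). Row 0 is the query.
people = [
    (25, 50_000),  # query
    (60, 50_500),  # very different age, almost identical income
    (27, 54_000),  # similar age, somewhat different income
    (40, 90_000),  # different on both features
]

# Unscaled, income dominates the distance because its numbers are larger.
print(nearest_index(people, 0))                 # 1
# After scaling, both features contribute comparably.
print(nearest_index(min_max_scale(people), 0))  # 2
```

With raw values the 60-year-old is judged most similar purely because incomes are measured in larger units; after rescaling, the 27-year-old becomes the nearest neighbour.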
Common Applications
The method is widely deployed in recommendation systems, medical diagnosis support (identifying similar patient cases), credit scoring, and image recognition. Collaborative filtering systems use distance-based neighbour selection to suggest content, whilst spatial analysis applications leverage its natural handling of geometric relationships.
Key Considerations
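With a brute-force search, prediction cost scales linearly with training set size, since the distance from the query to every stored sample must be calculated at prediction time. The full training set must also be kept in memory. This makes KNN impractical for massive datasets without indexing structures such as KD-trees or ball trees, which speed up neighbour search mainly in low to moderate dimensions. The curse of dimensionality severely degrades performance in high-dimensional spaces: distances between points become nearly uniform, so the "nearest" neighbours are barely closer than the farthest ones and the distance metric carries little information.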
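The loss of distance contrast in high dimensions can be observed with a small experiment. The sketch below assumes points drawn uniformly at random from the unit hypercube, which is an illustrative choice rather than a model of real data, and `relative_contrast` is a name invented here. It reports (farthest - nearest) / nearest for one query point; values near zero mean all points are almost equally far away.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one random query.

    Points are sampled uniformly from the unit hypercube [0, 1]^dim.
    A fixed seed keeps the experiment repeatable.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# Contrast shrinks as dimensionality grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In two dimensions the nearest point is typically many times closer than the farthest, while in a thousand dimensions the two distances differ by only a small fraction, which is why neighbour rankings become unreliable there.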