Overview
Direct Answer
Regularisation refers to a set of techniques that penalise model complexity during training, typically by constraining weight magnitudes or the number of active features, in order to reduce overfitting. Adding a regularisation term to the loss function encourages the model to learn simpler representations that tend to generalise better to unseen data.
How It Works
Regularisation modifies the objective function by adding a penalty on the model parameters, scaled by a strength hyperparameter: typically L1 (sum of absolute weights) or L2 (sum of squared weights). During optimisation, the algorithm balances minimising training error against minimising this penalty. This shrinks less important weights towards zero (L1 can set them exactly to zero) and limits the model's capacity to memorise noise.
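The penalised objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the strength parameter `lam`, and the choice of a linear model without intercept trained by gradient descent are all assumptions made for the example.

```python
def predict(weights, row):
    """Linear prediction without an intercept (an assumption of this sketch)."""
    return sum(w * x for w, x in zip(weights, row))


def penalised_loss(weights, X, y, lam, kind="l2"):
    """Mean squared error plus an L1 or L2 penalty scaled by lam."""
    n = len(y)
    mse = sum((predict(weights, row) - t) ** 2 for row, t in zip(X, y)) / n
    if kind == "l1":
        penalty = sum(abs(w) for w in weights)   # sum of absolute weights
    elif kind == "l2":
        penalty = sum(w * w for w in weights)    # sum of squared weights
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return mse + lam * penalty


def fit_l2(X, y, lam, lr=0.05, steps=2000):
    """Gradient descent on the L2-penalised mean squared error."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for row, t in zip(X, y):
            err = predict(w, row) - t
            for j in range(d):
                grad[j] += 2.0 * err * row[j] / n   # gradient of the data term
        for j in range(d):
            grad[j] += 2.0 * lam * w[j]             # gradient of the L2 penalty
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data that the weights [2, 1] fit exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 1.0, 3.0]
w_plain = fit_l2(X, y, lam=0.0)   # no penalty
w_reg = fit_l2(X, y, lam=1.0)     # penalised: weights pulled towards zero
```

On this toy data the unpenalised fit recovers the exact weights, while the penalised fit returns weights with a smaller norm, trading some training error for a simpler model.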
Why It Matters
Overfitted models perform poorly on production data despite strong training metrics, which reduces business value and increases deployment risk. Regularisation can improve robustness and predictive reliability on data the model was not trained on, which in turn can lower the cost of retraining and failure mitigation. It does not, by itself, correct for a shift between training and operational data distributions.
Common Applications
Regularisation is standard in credit risk assessment, customer churn prediction, and medical image classification, where high accuracy on held-out test sets is critical. L2 regularisation is widely used in regression and neural network training; L1 regularisation is often preferred for feature selection in high-dimensional datasets such as those found in genomics and financial forecasting.
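To illustrate why L1 regularisation lends itself to feature selection, the sketch below fits a lasso-style model by coordinate descent with soft-thresholding. The objective assumed here is half the mean squared error plus `lam` times the sum of absolute weights; the function names and the toy data are hypothetical and chosen only so the result is easy to check by hand.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimise (1/2n) * sum of squared errors + lam * sum(|w_j|)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0   # correlation of feature j with the partial residual
            z = 0.0     # mean squared value of feature j
            for row, t in zip(X, y):
                pred_without_j = sum(w[k] * row[k] for k in range(d) if k != j)
                rho += row[j] * (t - pred_without_j)
                z += row[j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Three mutually orthogonal features; the target depends strongly on the first,
# weakly on the second, and not at all on the third.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, -1.0],
     [-1.0, 1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [3.1, 2.9, -2.9, -3.1]   # equals 3 * feature0 + 0.1 * feature1

weights = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j, w in enumerate(weights) if w != 0.0]
```

With this penalty strength, the weak and irrelevant features receive weights of exactly zero, so only the first feature is selected. An L2 penalty on the same data would shrink the weak feature's weight but leave it non-zero.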
Key Considerations
Selecting appropriate regularisation strength requires careful tuning via cross-validation; excessively strong penalties bias models towards underfitting and reduced discriminative power. The choice between L1 and L2 depends on whether feature sparsity or smooth weight decay is desired.
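Tuning the regularisation strength by cross-validation might look like the sketch below. It assumes a single-feature ridge model without intercept, so that the closed-form fit stays on one line; the helper names, the interleaved fold assignment, and the candidate grid are illustrative assumptions rather than a recommended setup.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimiser of mean((w*x - y)^2) + lam * w^2 for one feature.

    Raises ZeroDivisionError if every x is zero and lam is zero.
    """
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)


def cv_error(xs, ys, lam, k=3):
    """Mean squared error on held-out points, averaged over k interleaved folds."""
    total, count = 0.0, 0
    pairs = list(zip(xs, ys))
    for fold in range(k):
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        held_out = [p for i, p in enumerate(pairs) if i % k == fold]
        w = fit_ridge_1d([x for x, _ in train], [y for _, y in train], lam)
        total += sum((w * x - y) ** 2 for x, y in held_out)
        count += len(held_out)
    return total / count


def select_lambda(xs, ys, candidates, k=3):
    """Return the candidate strength with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: cv_error(xs, ys, lam, k))


# Noise-free toy data (y = 2x): any penalty only adds bias here.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
best = select_lambda(xs, ys, candidates=[0.0, 0.1, 1.0, 10.0])
```

Because this toy data contains no noise, the procedure selects no penalty at all, and the held-out error grows with stronger penalties, which is the underfitting effect described above. On noisy data with many features, a non-zero strength would often win instead.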
More in Machine Learning
K-Nearest Neighbours
Supervised Learning: A simple algorithm that classifies data points based on the majority class of their k closest neighbours in feature space.
Logistic Regression
Supervised Learning: A classification algorithm that models the probability of a binary outcome using a logistic function.
XGBoost
Supervised Learning: An optimised distributed gradient boosting library designed for speed and performance in machine learning competitions and production.
Deep Reinforcement Learning
Reinforcement Learning: Combining deep neural networks with reinforcement learning to enable agents to learn complex decision-making from raw sensory input.
Batch Learning
MLOps & Production: Training a machine learning model on the entire dataset at once before deployment, as opposed to incremental updates.
Active Learning
MLOps & Production: A machine learning approach where the algorithm interactively queries a user or oracle to label new data points.
Anomaly Detection
Anomaly & Pattern Detection: Identifying data points, events, or observations that deviate significantly from the expected pattern in a dataset.
Hierarchical Clustering
Unsupervised Learning: A clustering method that builds a tree-like hierarchy of clusters through successive merging or splitting of groups.