Regularisation — Technology Wiki

Overview

Direct Answer

Regularisation refers to a set of techniques that constrain model complexity during training to reduce overfitting, typically by penalising large weight magnitudes or by limiting the number of features a model effectively uses. The most common form adds a regularisation term to the loss function, so the model learns simpler representations that tend to generalise better to unseen data.
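Written out, a penalty-based regularised objective adds a weighted penalty Ω(w) to the average training loss ℓ over n examples. The notation below is one common convention rather than one fixed by this entry; some libraries scale the L2 term by ½ or fold the 1/n factor into λ differently.

```latex
J(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, f(\mathbf{x}_i; \mathbf{w})\big) + \lambda\,\Omega(\mathbf{w}), \qquad \lambda \ge 0

\Omega_{L1}(\mathbf{w}) = \sum_{j=1}^{d} |w_j| = \lVert \mathbf{w} \rVert_1, \qquad
\Omega_{L2}(\mathbf{w}) = \sum_{j=1}^{d} w_j^2 = \lVert \mathbf{w} \rVert_2^2
```

Setting λ = 0 recovers the unregularised objective; larger values of λ put more weight on keeping the parameters small.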

How It Works

Penalty-based regularisation modifies the objective function by adding a term that grows with the size of the model parameters, scaled by a strength hyperparameter (often written λ). The two most common penalties are L1 (the sum of absolute weights) and L2 (the sum of squared weights). During optimisation, the algorithm trades off minimising training error against minimising this penalty: L2 shrinks all weights smoothly towards zero, while L1 can drive less important weights to exactly zero. Both limit the model's capacity to memorise noise in the training data.
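A minimal sketch of L2-penalised training, assuming a linear model without an intercept, mean squared error as the training loss, and plain full-batch gradient descent on the objective (1/n)·Σ(prediction − y)² + λ·Σw_j². The function names (`predict`, `mse`, `objective`, `fit_l2`), the learning rate and the toy data are illustrative choices, not part of any library. Because the gradient of λ·Σw_j² is 2λw_j, every step pulls each weight towards zero in proportion to its size.

```python
def predict(w, x):
    """Linear model without an intercept: dot product of weights and features."""
    return sum(wj * xj for wj, xj in zip(w, x))


def mse(w, X, y):
    """Mean squared error of the linear model on (X, y)."""
    return sum((predict(w, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)


def objective(w, X, y, lam):
    """Training error plus the L2 penalty lam * sum(w_j^2)."""
    return mse(w, X, y) + lam * sum(wj * wj for wj in w)


def fit_l2(X, y, lam, lr=0.05, steps=2000):
    """Full-batch gradient descent on objective(); lam=0 gives ordinary least squares."""
    n, d = len(y), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [predict(w, xi) - yi for xi, yi in zip(X, y)]
        # Data-term gradient is (2/n) * X^T r; penalty gradient is 2 * lam * w.
        grad = [
            2.0 / n * sum(r * xi[j] for r, xi in zip(residuals, X)) + 2.0 * lam * w[j]
            for j in range(d)
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical toy data: four examples, two features.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    y = [1.0, 2.0, 3.0, 5.0]
    for lam in (0.0, 0.5):
        w = fit_l2(X, y, lam)
        print(lam, [round(wj, 3) for wj in w], round(sum(wj * wj for wj in w), 3))
```

On this toy data the unpenalised fit converges to weights of roughly [1.333, 2.0], and λ = 0.5 shrinks them to roughly [1.29, 1.226], so the squared weight norm falls from about 5.78 to about 3.17 at the cost of a slightly higher training error.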

Why It Matters

Overfitted models can post strong training metrics yet perform poorly on production data, which erodes their business value and increases deployment risk. Regularisation typically narrows the gap between training and held-out performance, making predictions more reliable on data the model has not seen and lowering the cost of retraining and failure mitigation.

Common Applications

Regularisation is standard practice wherever performance on held-out data is critical, for example in credit risk assessment, customer churn prediction, and medical image classification. L2 regularisation is used widely in regression and in neural network training, where it is often applied as weight decay; L1 regularisation is preferred when feature selection is needed in high-dimensional datasets, such as in genomics and financial forecasting.
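To illustrate why L1 suits feature selection, the sketch below fits the same toy data with both penalties using coordinate descent, a standard solver for lasso-type problems but only one of several. It assumes the objective (1/n)·Σ(y − x·w)² + λ·Ω(w) with no intercept and no all-zero feature columns; the helper names and the data are hypothetical. The L1 update uses soft-thresholding, which returns exactly zero whenever a feature's correlation with the partial residual is small, whereas the L2 update only divides by (a + λ) and so produces an exact zero only if that correlation is itself zero.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def coordinate_descent(X, y, lam, penalty, sweeps=200):
    """Minimise (1/n)*||y - Xw||^2 + lam*Omega(w) one weight at a time (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Partial residual: prediction error with feature j's contribution removed.
            r = [yi - sum(w[k] * xi[k] for k in range(d) if k != j) for xi, yi in zip(X, y)]
            a = sum(xi[j] ** 2 for xi in X) / n
            b = sum(xi[j] * ri for xi, ri in zip(X, r)) / n
            if penalty == "l1":
                # Exact minimiser of a*w^2 - 2*b*w + lam*|w|.
                w[j] = soft_threshold(b, lam / 2) / a
            elif penalty == "l2":
                # Exact minimiser of a*w^2 - 2*b*w + lam*w^2.
                w[j] = b / (a + lam)
            else:
                raise ValueError(f"unknown penalty: {penalty}")
    return w


if __name__ == "__main__":
    # Hypothetical data: the target depends only on the first feature (y = 2 * x0).
    X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
    y = [2.0, 4.0, 6.0, 8.0]
    print("L1:", coordinate_descent(X, y, 1.0, "l1"))
    print("L2:", coordinate_descent(X, y, 1.0, "l2"))
```

With λ = 1 on this data, the L1 fit sets the second weight to exactly 0.0 (the first becomes about 1.933), while the L2 fit keeps a small non-zero second weight of about −0.060.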

Key Considerations

Choosing the regularisation strength requires careful tuning, usually via cross-validation: too weak a penalty leaves the model free to overfit, while an excessively strong penalty biases it towards underfitting and reduces its discriminative power. The choice between L1 and L2 depends on whether sparse weights (implicit feature selection) or smooth shrinkage of all weights is desired; Elastic Net combines both penalties.
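A minimal k-fold cross-validation loop for choosing λ, assuming a one-feature ridge model without an intercept so that the fit has the closed form w = Σxy / (Σx² + nλ) under the objective (1/n)·Σ(y − wx)² + λw². Folds are assigned by index modulo k rather than by random shuffling, purely to keep the sketch deterministic (in practice the data are usually shuffled first), and it assumes at least k data points. The function names, the λ grid and the example data are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimiser of (1/n)*sum((y - w*x)**2) + lam*w**2 (one feature, no intercept)."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + n * lam)


def cv_error(xs, ys, lam, k=5):
    """Mean validation MSE over k folds; point i is assigned to fold i % k (no shuffling)."""
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        val = [i for i in range(len(xs)) if i % k == fold]
        # Fit on the training folds only, then score on the held-out fold.
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in val) / len(val))
    return sum(fold_errors) / k


def select_lambda(xs, ys, grid, k=5):
    """Return the grid value with the lowest cross-validated error, plus all scores."""
    scores = {lam: cv_error(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 11)]
    # Made-up data: roughly y = 2x plus noise.
    ys = [2.1, 3.8, 6.3, 7.9, 10.2, 11.7, 14.4, 15.8, 18.1, 20.3]
    best, scores = select_lambda(xs, ys, [0.0, 0.1, 1.0, 10.0])
    print("best lambda:", best)
    for lam, err in scores.items():
        print(f"lambda={lam:<5} cv_mse={err:.4f}")
```

On noiseless data such as y = 2x the loop picks λ = 0, because any penalty only pulls the slope away from its true value; with noisy data, whether a small positive λ wins is exactly what the cross-validated scores are there to decide.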

Cross-References

Machine Learning

Overfitting

Referenced By

Four other entries in the wiki reference Regularisation in their definitions, showing how the concept connects across Machine Learning and adjacent domains:

Dropout (Deep Learning)

Elastic Net (Machine Learning)

Tabular Deep Learning (Machine Learning)

Weight Decay (Deep Learning)

Related in Training Techniques

Ridge Regression

A regularised regression technique that adds an L2 penalty term to prevent overfitting by constraining coefficient magnitudes.

Elastic Net

A regularisation technique combining L1 and L2 penalties, balancing feature selection and coefficient shrinkage.

Cross-Validation

A resampling technique that partitions data into subsets, training on some and validating on others to assess model generalisation.

Overfitting

When a model learns the training data too well, including noise, resulting in poor performance on unseen data.

Underfitting

When a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test data.

Bias-Variance Tradeoff

The tension between bias (error from overly simple modelling assumptions) and variance (sensitivity to fluctuations in the training data); reducing one typically increases the other.

Gradient Descent

An optimisation algorithm that iteratively adjusts parameters in the direction of steepest descent of the loss function.

Stochastic Gradient Descent

A variant of gradient descent that updates parameters using a randomly selected subset of training data each iteration.

Adam Optimiser

An adaptive learning rate optimisation algorithm combining momentum and RMSProp for efficient deep learning training.

Learning Rate

A hyperparameter that controls how much model parameters are adjusted with respect to the loss gradient during training.

Loss Function

A mathematical function that measures the difference between predicted outputs and actual target values during model training.

Backpropagation

The algorithm for computing gradients of the loss function with respect to network weights, enabling neural network training.
