Overfitting — Technology Wiki

Overview

Overfitting occurs when a model fits the training data too closely, learning its noise as well as its underlying patterns, and therefore performs poorly on unseen data.
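A minimal Python sketch of the effect, assuming hypothetical synthetic data (y = 2x plus Gaussian noise) and a nearest-neighbour "memoriser" chosen purely for illustration. The names `memoriser`, `noisy_sample`, and `mse` are not from any library.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def noisy_sample(xs):
    # Hypothetical data: true relationship y = 2x, plus unit Gaussian noise.
    return [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in xs]

train = noisy_sample([float(i) for i in range(20)])
test = noisy_sample([i + 0.5 for i in range(19)])

def memoriser(x):
    # Returns the stored target of the nearest training point,
    # so it reproduces the training noise exactly.
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y

def mse(pairs, predict):
    return sum((predict(x) - y) ** 2 for x, y in pairs) / len(pairs)

train_error = mse(train, memoriser)  # zero: every training point is memorised
test_error = mse(test, memoriser)    # larger: the memorised noise does not transfer
print(train_error, test_error)
```

The gap between the two errors, rather than either number alone, is the usual symptom of overfitting.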

Related in Training Techniques

Ridge Regression

A regularised regression technique that adds an L2 penalty term to prevent overfitting by constraining coefficient magnitudes.
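As a rough illustration, the single-feature, no-intercept case has a simple closed form: minimising the sum of (y - w*x)^2 plus penalty * w^2 gives w = sum(x*y) / (sum(x*x) + penalty). The function name `ridge_slope` and the data below are hypothetical.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form ridge coefficient for one feature and no intercept."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + penalty)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unpenalised = ridge_slope(xs, ys, penalty=0.0)  # ordinary least squares slope
shrunk = ridge_slope(xs, ys, penalty=14.0)      # same data, coefficient pulled towards zero
print(unpenalised, shrunk)
```

A larger penalty always yields a smaller coefficient magnitude here, which is the constraint the definition refers to.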

Elastic Net

A regularisation technique combining L1 and L2 penalties, balancing feature selection and coefficient shrinkage.
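The combined penalty can be sketched as below, assuming one common parametrisation in which a mixing ratio interpolates between the L1 term and half the squared L2 term. The names `strength` and `l1_ratio` are illustrative choices, and other sources scale the terms differently.

```python
def elastic_net_penalty(weights, strength, l1_ratio):
    """Penalty = strength * (l1_ratio * L1 + 0.5 * (1 - l1_ratio) * squared L2)."""
    l1 = sum(abs(w) for w in weights)
    l2_squared = sum(w * w for w in weights)
    return strength * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)

weights = [1.0, -2.0]
pure_l1 = elastic_net_penalty(weights, strength=1.0, l1_ratio=1.0)  # lasso-style penalty
pure_l2 = elastic_net_penalty(weights, strength=1.0, l1_ratio=0.0)  # ridge-style penalty
mixed = elastic_net_penalty(weights, strength=1.0, l1_ratio=0.5)
print(pure_l1, pure_l2, mixed)
```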

Cross-Validation

A resampling technique that partitions data into subsets, training on some and validating on others to assess model generalisation.
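A minimal sketch of the partitioning step for k-fold cross-validation, assuming contiguous, unshuffled folds. The helper `k_fold_indices` is hypothetical, and real workflows often shuffle the indices first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(train_idx, val_idx)
```

Each sample is used for validation exactly once; averaging the k validation scores gives the generalisation estimate.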

Underfitting

When a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test data.

Bias-Variance Tradeoff

The tension between bias (error from overly simple assumptions) and variance (sensitivity to fluctuations in the training data): reducing one typically increases the other.

Regularisation

Techniques that add constraints or penalties to a model to prevent overfitting and improve generalisation to new data.

Gradient Descent

An optimisation algorithm that iteratively adjusts parameters in the direction of steepest descent of the loss function.
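A minimal sketch on a hypothetical one-parameter loss, f(w) = (w - 3)^2, whose gradient is 2(w - 3). The starting point, step count, and learning rate are arbitrary illustrative choices.

```python
def gradient(w):
    # Derivative of the loss (w - 3)**2 with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * gradient(w)  # step against the gradient

print(w)  # close to the minimiser, 3
```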

Stochastic Gradient Descent

A variant of gradient descent that updates parameters using a randomly selected subset of training data each iteration.
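A minimal sketch, assuming hypothetical noiseless data with true slope 3 and a mini-batch of two examples per update. Only the randomly drawn batch contributes to each gradient estimate.

```python
import random

rng = random.Random(0)  # fixed seed for repeatability
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]  # hypothetical (x, y) pairs

w = 0.0
learning_rate = 0.01
batch_size = 2
for _ in range(500):
    batch = rng.sample(data, batch_size)  # random subset of the training data
    # Mean gradient of the squared error (w*x - y)**2 over the batch only.
    grad = sum(2.0 * x * (w * x - y) for x, y in batch) / batch_size
    w -= learning_rate * grad

print(w)  # close to the true slope, 3
```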

Adam Optimiser

An adaptive learning rate optimisation algorithm combining momentum and RMSProp for efficient deep learning training.
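A scalar sketch of the update rule, assuming the commonly quoted default decay rates (0.9 and 0.999) and a hypothetical loss f(w) = (w - 3)^2. The function name `adam_minimise` is illustrative, not a library API.

```python
import math

def adam_minimise(grad_fn, w, steps=1000, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = 0.0  # running mean of gradients (the momentum part)
    v = 0.0  # running mean of squared gradients (the RMSProp part)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_final = adam_minimise(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)
```

Dividing by the root of `v_hat` gives each parameter its own effective step size, which is what "adaptive learning rate" refers to.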

Learning Rate

A hyperparameter that controls how much model parameters are adjusted with respect to the loss gradient during training.
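The effect can be sketched on a hypothetical loss f(w) = w^2 (gradient 2w): a small rate shrinks w towards the minimum at 0, while a rate that is too large makes every step overshoot by more than it corrects. The two rates below are illustrative.

```python
def descend(learning_rate, steps=20, w=1.0):
    for _ in range(steps):
        w -= learning_rate * 2.0 * w  # gradient of w**2 is 2*w
    return w

converged = descend(0.1)  # each step multiplies w by 0.8
diverged = descend(1.1)   # each step multiplies w by -1.2, so |w| grows
print(converged, diverged)
```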

Loss Function

A mathematical function that measures the difference between predicted outputs and actual target values during model training.
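Two widely used examples, sketched in plain Python: mean squared error for regression and binary cross-entropy for probabilistic classification. The input values are hypothetical.

```python
import math

def mean_squared_error(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def binary_cross_entropy(targets, probabilities):
    # Assumes every probability lies strictly between 0 and 1.
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(targets, probabilities)
    ) / len(targets)

mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
bce = binary_cross_entropy([1, 0], [0.5, 0.5])
print(mse, bce)
```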

Backpropagation

The algorithm for computing gradients of the loss function with respect to network weights by applying the chain rule backwards through the layers, enabling neural network training.
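A minimal sketch on a hypothetical two-weight network, y = w2 * tanh(w1 * x) with squared-error loss, where the chain rule is applied by hand from the loss back to each weight. A finite-difference estimate is included only as a sanity check.

```python
import math

def forward(w1, w2, x, target):
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # network output
    return h, y, (y - target) ** 2

def backward(w1, w2, x, target):
    h, y, _ = forward(w1, w2, x, target)
    dloss_dy = 2.0 * (y - target)        # loss -> output
    dloss_dw2 = dloss_dy * h             # output -> w2
    dloss_dh = dloss_dy * w2             # output -> hidden activation
    dloss_dz = dloss_dh * (1.0 - h * h)  # through tanh: derivative is 1 - tanh^2
    dloss_dw1 = dloss_dz * x             # pre-activation -> w1
    return dloss_dw1, dloss_dw2

w1, w2, x, target = 0.5, -1.5, 2.0, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, target)

# Central finite differences as an independent check of the analytic gradients.
step = 1e-6
numeric_w1 = (forward(w1 + step, w2, x, target)[2] - forward(w1 - step, w2, x, target)[2]) / (2 * step)
numeric_w2 = (forward(w1, w2 + step, x, target)[2] - forward(w1, w2 - step, x, target)[2]) / (2 * step)
print(grad_w1, numeric_w1, grad_w2, numeric_w2)
```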

More in Machine Learning

Content-Based Filtering

Unsupervised Learning

A recommendation approach that suggests items similar to those a user has previously liked, based on item attributes.
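A minimal sketch, assuming items are described by hypothetical binary attribute vectors and that similarity is measured with cosine similarity, which is one common choice among several.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical catalogue: each vector marks which attributes an item has.
item_attributes = {
    "film_a": [1, 1, 0],
    "film_b": [1, 1, 1],
    "film_c": [0, 0, 1],
}

liked = "film_a"
candidates = [name for name in item_attributes if name != liked]
recommended = max(
    candidates,
    key=lambda name: cosine_similarity(item_attributes[liked], item_attributes[name]),
)
print(recommended)  # the candidate sharing the most attributes with the liked item
```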

Model Serialisation

MLOps & Production

The process of converting a trained model into a format that can be stored, transferred, and later reconstructed for inference.
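A minimal round-trip sketch using Python's standard `pickle` module, with a plain dictionary standing in for a trained model (a hypothetical placeholder). Pickle is only one possible format, and unpickling data from an untrusted source is unsafe.

```python
import pickle

model = {"weights": [0.5, -1.2], "bias": 0.1}  # stand-in for trained parameters

blob = pickle.dumps(model)     # serialise to bytes that can be stored or transferred
restored = pickle.loads(blob)  # reconstruct an equivalent object for inference

print(restored == model)
```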

Feature Store

MLOps & Production

A centralised repository for storing, managing, and serving machine learning features, ensuring consistency between training and inference environments across an organisation.

Class Imbalance

Feature Engineering & Selection

A situation where the distribution of classes in a dataset is significantly skewed, with some classes vastly outnumbering others.

Model Calibration

MLOps & Production

The process of adjusting a model's predicted probabilities so they accurately reflect the true likelihood of outcomes, essential for risk-sensitive decision-making.

K-Means Clustering

Unsupervised Learning

A partitioning algorithm that divides data into k clusters by minimising the sum of squared distances between points and their assigned cluster centroids.
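A one-dimensional sketch of the usual alternating procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points). The naive initialisation from the first k points and the data are illustrative assumptions.

```python
def k_means_1d(points, k, n_iters=10):
    centroids = list(points[:k])  # naive initialisation: the first k points
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: nearest centroid by squared distance.
        labels = [
            min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave the centroid unchanged if its cluster is empty
                centroids[j] = sum(members) / len(members)
    return centroids, labels

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, labels = k_means_1d(points, k=2)
print(centroids, labels)
```

The result depends on the initial centroids in general, so practical implementations usually try several initialisations.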

Multi-Task Learning

MLOps & Production

A machine learning approach where a model is simultaneously trained on multiple related tasks to improve generalisation.

Experiment Tracking

MLOps & Production

The systematic recording of machine learning experiment parameters, metrics, artifacts, and code versions to enable reproducibility and comparison across training runs.