Learning Rate — Technology Wiki

Overview

Direct Answer

Learning rate is a hyperparameter that determines the magnitude of parameter updates during gradient descent optimisation. It directly scales the gradient signal, controlling how aggressively the model adjusts weights at each training iteration.

How It Works

During backpropagation, the model computes the gradient of the loss with respect to each parameter. In plain gradient descent, the optimiser multiplies this gradient by the learning rate and subtracts the result from the parameter. A rate of 0.01 means each parameter moves by 1% of its gradient value; a rate of 0.001, by 0.1%. This scaling factor controls both how quickly training converges and how stable it is.
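As a minimal sketch of the update described above, assuming plain gradient descent with no momentum or adaptive scaling, the step can be written in a few lines of Python. The function name `sgd_step` and the example values are hypothetical and chosen only for illustration.

```python
def sgd_step(params, grads, learning_rate):
    """Return new parameters after one plain gradient descent step.

    Each parameter moves against its gradient, scaled by the learning rate.
    """
    return [p - learning_rate * g for p, g in zip(params, grads)]


# Hypothetical parameter and gradient values for illustration.
params = [1.0, -2.0]
grads = [0.5, -4.0]

# With a rate of 0.01, each parameter shifts by 1% of its gradient value.
updated = sgd_step(params, grads, learning_rate=0.01)
print(updated)
```

Here the first parameter moves by 0.005 (1% of 0.5) and the second by 0.04 (1% of 4.0), each in the direction opposite to its gradient.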

Why It Matters

Selecting an appropriate value directly affects training efficiency, final model accuracy, and computational cost. Rates that are too high cause divergence or oscillation around the optimal weights; rates that are too low extend training time unnecessarily, increasing infrastructure expenses and time-to-deployment.
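The effect of the rate on stability can be illustrated with a toy problem. The sketch below assumes a one-dimensional loss f(w) = w**2, whose gradient is 2w; the function name and the three rates are hypothetical choices, not recommended values for real models.

```python
def run_gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimise f(w) = w**2 starting from w0 and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                  # derivative of w**2
        w = w - learning_rate * grad    # plain gradient descent update
    return w


too_high = run_gradient_descent(1.1)     # each step overshoots and grows
reasonable = run_gradient_descent(0.4)   # shrinks quickly towards 0
too_low = run_gradient_descent(0.001)    # barely moves in 20 steps

print(too_high, reasonable, too_low)
```

On this toy loss, each step multiplies w by (1 - 2 * learning_rate). A rate of 1.1 gives a factor of -1.2, so w flips sign and grows; 0.4 gives 0.2, so w collapses towards the minimum; 0.001 gives 0.998, so progress is very slow.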

Common Applications

The learning rate is tuned in neural network training across computer vision, natural language processing, and time series forecasting. Practitioners routinely adjust this parameter when training image classifiers, language models, and recommendation systems to balance convergence speed against solution quality.

Key Considerations

Optimal values vary significantly across datasets, model architectures, and optimiser algorithms (SGD and Adam, for example, typically need different ranges). Many practitioners use learning rate schedules or adaptive methods that adjust the rate dynamically during training rather than keeping a static value.
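Two common schedule shapes, step decay and cosine decay, can be sketched in plain Python. The function names, default arguments, and example rates below are hypothetical, intended as an illustration rather than any particular library's API.

```python
import math


def step_decay(initial_rate, step, drop_factor=0.5, steps_per_drop=10):
    """Multiply the rate by drop_factor once every steps_per_drop steps."""
    return initial_rate * drop_factor ** (step // steps_per_drop)


def cosine_decay(initial_rate, step, total_steps, min_rate=0.0):
    """Decay smoothly from initial_rate to min_rate along a half cosine."""
    progress = min(step, total_steps) / total_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_rate + (initial_rate - min_rate) * cosine


# Print both schedules at a few points of a hypothetical 100-step run.
for step in (0, 10, 50, 100):
    print(step, step_decay(0.1, step), cosine_decay(0.1, step, total_steps=100))
```

A training loop would call one of these functions at each step and pass the result to the optimiser in place of a fixed rate.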

Referenced By

1 term mentions Learning Rate

Other entries in the wiki whose definition references Learning Rate — useful for understanding how this concept connects across Machine Learning and adjacent domains.

Adam Optimiser · Machine Learning

Related in Training Techniques

Ridge Regression

A regularised regression technique that adds an L2 penalty term to prevent overfitting by constraining coefficient magnitudes.

Elastic Net

A regularisation technique combining L1 and L2 penalties, balancing feature selection and coefficient shrinkage.

Cross-Validation

A resampling technique that partitions data into subsets, training on some and validating on others to assess model generalisation.

Overfitting

When a model learns the training data too well, including noise, resulting in poor performance on unseen data.

Underfitting

When a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test data.

Bias-Variance Tradeoff

The balance between a model's ability to minimise bias (error from assumptions) and variance (sensitivity to training data fluctuations).

Regularisation

Techniques that add constraints or penalties to a model to prevent overfitting and improve generalisation to new data.

Gradient Descent

An optimisation algorithm that iteratively adjusts parameters in the direction of steepest descent of the loss function.

Stochastic Gradient Descent

A variant of gradient descent that updates parameters using a randomly selected subset of training data each iteration.

Adam Optimiser

An adaptive learning rate optimisation algorithm combining momentum and RMSProp for efficient deep learning training.

Loss Function

A mathematical function that measures the difference between predicted outputs and actual target values during model training.

Backpropagation

The algorithm for computing gradients of the loss function with respect to network weights, enabling neural network training.

More in Machine Learning

Unsupervised Learning

MLOps & Production

A machine learning approach where models discover patterns and structures in data without labelled examples.

Lasso Regression

Feature Engineering & Selection

A regularised regression technique that adds an L1 penalty, enabling feature selection by driving some coefficients to zero.

Collaborative Filtering

Unsupervised Learning

A recommendation technique that makes predictions based on the collective preferences and behaviour of many users.

UMAP

Unsupervised Learning

Uniform Manifold Approximation and Projection — a dimensionality reduction technique for visualisation and general non-linear reduction.

Bandit Algorithm

Advanced Methods

An online learning algorithm that balances exploration of new options with exploitation of known good options to maximise reward.

Model Serialisation

MLOps & Production

The process of converting a trained model into a format that can be stored, transferred, and later reconstructed for inference.

Ensemble Methods

MLOps & Production

Machine learning techniques that combine multiple models to produce better predictive performance than any single model, including bagging, boosting, and stacking approaches.

Clustering

Unsupervised Learning

An unsupervised learning technique that groups similar data points together based on inherent patterns, without predefined labels.