Overview
Direct Answer
Gradient boosting is an ensemble machine learning method that builds models sequentially: each new model is trained to correct the residual errors left by the combined predictions of all previous models. It can be viewed as gradient descent in function space, where each added model takes a step that reduces a chosen loss function.
How It Works
The algorithm initialises with a simple base prediction, often a constant such as the mean of the targets, then iteratively fits new weak learners (typically shallow decision trees) to the negative gradient of the loss function with respect to the current ensemble's predictions on the training data. For squared-error loss this negative gradient is simply the residual. Each new model is scaled and added to the ensemble, so subsequent models focus on the instances where the existing ensemble performs poorly. The learning rate (shrinkage) controls the contribution of each addition: smaller values generally need more iterations to converge but tend to generalise better.
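The steps above can be sketched in a few dozen lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes squared-error loss, a single numeric feature, and depth-1 trees ("stumps") as weak learners, and every function name and the toy data are hypothetical.

```python
# Minimal gradient boosting sketch for squared-error regression on one feature.
# Assumptions: one numeric feature, depth-1 trees (stumps) as weak learners.
# All names here are illustrative, not taken from any library.

def fit_stump(xs, targets):
    """Find the single threshold split that minimises squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left value, right value)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    # Initial model: a constant (the mean minimises squared error).
    init = sum(ys) / len(ys)
    preds = [init] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - F)^2, the negative gradient with respect
        # to the prediction F is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add the new learner, scaled by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return init, stumps, learning_rate


def predict(model, x):
    init, stumps, learning_rate = model
    return init + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data: a deterministic quadratic target.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

model_0 = fit_gradient_boosting(xs, ys, n_rounds=0)    # constant only
model_1 = fit_gradient_boosting(xs, ys, n_rounds=1)
model_50 = fit_gradient_boosting(xs, ys, n_rounds=50)

for name, model in [("0 rounds", model_0), ("1 round", model_1), ("50 rounds", model_50)]:
    print(name, "training MSE:", mse(model, xs, ys))
```

On this toy data the training error falls as rounds are added, because each stump is fitted to whatever the current ensemble still gets wrong. Production libraries use deeper trees, other loss functions, and many optimisations that this sketch omits.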
Why It Matters
Gradient boosting often achieves state-of-the-art predictive accuracy on classification and regression tasks, and frequently outperforms other ensemble methods on tabular data. Organisations deploy it in high-stakes applications that require robust generalisation, including credit risk assessment, fraud detection, and customer churn prediction, where incremental accuracy improvements translate directly into measurable business value.
Common Applications
Applications span financial services for loan default prediction, e-commerce for demand forecasting, healthcare for patient outcome modelling, and insurance for claims assessment. XGBoost, LightGBM, and CatBoost are widely adopted open-source implementations, used both in machine learning competitions and in production systems across industries.
Key Considerations
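The sequential training process is computationally expensive and harder to parallelise than bagging-style ensembles, because each new model depends on the ones before it. Practitioners must carefully tune hyperparameters including learning rate, tree depth, and iteration count to avoid overfitting, whilst keeping the model as interpretable as the application requires: shallower trees and fewer iterations are generally easier to inspect.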
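One common way to choose the iteration count is to track the error on a held-out validation set and keep the round where it is lowest. The sketch below illustrates that idea under the same simplifying assumptions as a textbook example (squared-error loss, one feature, depth-1 trees); the function names, the synthetic noisy data, and the two learning rates compared are all hypothetical choices made for illustration.

```python
import random

# Sketch: pick the number of boosting rounds from a validation curve.
# Assumptions: squared-error loss, one feature, depth-1 trees (stumps).
# All names and the synthetic data are illustrative.

def fit_stump(xs, targets):
    """Best single-threshold split (a depth-1 tree) under squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]  # (threshold, left value, right value)


def mean_sq(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def validation_curve(train, valid, max_rounds, learning_rate):
    """Validation MSE after each round; index 0 is the constant model."""
    tx, ty = train
    vx, vy = valid
    init = sum(ty) / len(ty)
    train_pred = [init] * len(tx)
    valid_pred = [init] * len(vx)
    curve = [mean_sq(vy, valid_pred)]
    for _ in range(max_rounds):
        residuals = [y - p for y, p in zip(ty, train_pred)]
        t, lm, rm = fit_stump(tx, residuals)

        def step(x):
            # Contribution of the new stump, scaled by the learning rate.
            return learning_rate * (lm if x <= t else rm)

        train_pred = [p + step(x) for p, x in zip(train_pred, tx)]
        valid_pred = [p + step(x) for p, x in zip(valid_pred, vx)]
        curve.append(mean_sq(vy, valid_pred))
    return curve


# Synthetic noisy data, split into training and validation parts.
rng = random.Random(0)
xs = [rng.uniform(0, 4) for _ in range(80)]
ys = [x * x + rng.gauss(0, 1.0) for x in xs]
train = (xs[:60], ys[:60])
valid = (xs[60:], ys[60:])

MAX_ROUNDS = 100
results = {}
for lr in (1.0, 0.1):
    curve = validation_curve(train, valid, MAX_ROUNDS, lr)
    best_round = min(range(len(curve)), key=curve.__getitem__)
    results[lr] = (best_round, curve)
    print(f"learning_rate={lr}: best round {best_round}, "
          f"validation MSE {curve[best_round]:.3f}, "
          f"final-round MSE {curve[-1]:.3f}")
```

Comparing the best-round error with the final-round error shows how much is lost by simply running to the maximum iteration count. Libraries such as XGBoost, LightGBM, and CatBoost offer their own early-stopping options, which are not shown here.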
More in Machine Learning
Bagging (Advanced Methods)
Bootstrap Aggregating: an ensemble method that trains multiple models on random subsets of data and averages their predictions.
Overfitting (Training Techniques)
When a model learns the training data too well, including noise, resulting in poor performance on unseen data.
Ridge Regression (Training Techniques)
A regularised regression technique that adds an L2 penalty term to prevent overfitting by constraining coefficient magnitudes.
Unsupervised Learning (MLOps & Production)
A machine learning approach where models discover patterns and structures in data without labelled examples.
Meta-Learning (Advanced Methods)
Learning to learn: algorithms that improve their learning process by leveraging experience from multiple learning episodes.
DBSCAN (Unsupervised Learning)
Density-Based Spatial Clustering of Applications with Noise: a clustering algorithm that finds arbitrarily shaped clusters based on density.
Semi-Supervised Learning (Advanced Methods)
A learning approach that combines a small amount of labelled data with a large amount of unlabelled data during training.
Feature Selection (MLOps & Production)
The process of identifying and selecting the most relevant input variables for a machine learning model.