Bandit Algorithm — Technology Wiki

Overview

Direct Answer

A bandit algorithm is an online learning framework that sequentially selects actions to maximise cumulative reward by balancing exploration of unproven options against exploitation of known high-performing choices. It models decision-making under uncertainty where the learner receives feedback only on actions taken, not on counterfactuals.

How It Works

The algorithm maintains estimates of reward distributions for each action (arm) based on historical observations. At each decision step, it uses a selection strategy—such as epsilon-greedy, upper confidence bound (UCB), or Thompson sampling—to choose between exploring arms with uncertain payoffs and exploiting arms with high empirical performance. Reward feedback updates the estimates, refining future decisions.
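The loop described above (estimate, select, observe, update) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the class and function names, the simulated Bernoulli arms, and the success rates are all assumptions made for the example. It shows two of the named strategies, epsilon-greedy and UCB1, sharing the same incremental mean update.

```python
import math
import random


class ArmStats:
    """Running reward estimate for each arm (action)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # times each arm was pulled
        self.means = [0.0] * n_arms     # empirical mean reward per arm

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def select_epsilon_greedy(stats, epsilon, rng):
    """With probability epsilon explore uniformly; otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(stats.means))
    return max(range(len(stats.means)), key=lambda a: stats.means[a])


def select_ucb1(stats):
    """Pick the arm with the highest mean plus confidence bonus (UCB1)."""
    for arm, count in enumerate(stats.counts):
        if count == 0:
            return arm  # pull every arm once before using the bonus
    total = sum(stats.counts)
    return max(
        range(len(stats.means)),
        key=lambda a: stats.means[a]
        + math.sqrt(2.0 * math.log(total) / stats.counts[a]),
    )


def run(true_rates, steps, select, seed=0):
    """Simulate Bernoulli arms; true_rates are hypothetical success rates."""
    rng = random.Random(seed)
    stats = ArmStats(len(true_rates))
    for _ in range(steps):
        arm = select(stats, rng)
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        stats.update(arm, reward)  # feedback only for the arm that was pulled
    return stats


rates = [0.2, 0.5, 0.8]  # made-up values for illustration
eg = run(rates, 5000, lambda s, r: select_epsilon_greedy(s, 0.1, r))
ucb = run(rates, 5000, lambda s, r: select_ucb1(s))
print("epsilon-greedy pulls:", eg.counts)
print("UCB1 pulls:", ucb.counts)
```

In both runs most pulls should go to the third arm, with the remainder spent exploring the other two. Epsilon-greedy keeps exploring at a fixed rate, whereas the UCB1 bonus shrinks as an arm accumulates observations.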

Why It Matters

Organisations deploy bandit approaches to optimise resource allocation under uncertainty without exhaustive pre-experimentation. In practice they can improve conversion rates, customer engagement, and cost efficiency by reducing regret (the cumulative reward lost relative to always choosing the best option), including in dynamic environments where conditions evolve over time.
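Regret is usually measured against the best single arm in hindsight. The sketch below computes expected cumulative regret for a sequence of choices, assuming the true mean reward of each arm is known (which holds only in simulation). The function name, the mean rewards, and the choice log are hypothetical.

```python
def cumulative_regret(true_means, chosen_arms):
    """Expected cumulative regret after each step for a sequence of choices."""
    best = max(true_means)
    total = 0.0
    curve = []
    for arm in chosen_arms:
        total += best - true_means[arm]  # expected reward lost at this step
        curve.append(total)
    return curve


# Hypothetical mean rewards for three variants, and a made-up choice log.
means = [0.25, 0.5, 0.75]
choices = [0, 1, 2, 2, 1, 2]
print(cumulative_regret(means, choices))  # [0.5, 0.75, 0.75, 0.75, 1.0, 1.0]
```

A curve that flattens over time indicates that the learner has largely stopped choosing inferior arms.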

Common Applications

Use cases include A/B testing in digital products, real-time ad placement optimisation, clinical trial design with adaptive allocation, recommendation system ranking, and network routing. These domains benefit from algorithms that learn which option performs best whilst minimising exposure to poor choices.
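For binary outcomes such as conversions in an A/B-style test, Thompson sampling with a Beta posterior is one common choice. The following is a minimal sketch under stated assumptions: the function name, the uniform Beta(1, 1) prior, and the simulated conversion rates are illustrative and not taken from any particular system.

```python
import random


def thompson_sampling(true_rates, steps, seed=0):
    """Beta-Bernoulli Thompson sampling over variants with binary outcomes."""
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [0] * n
    failures = [0] * n
    for _ in range(steps):
        # Draw one plausible rate per variant from its posterior,
        # Beta(1 + successes, 1 + failures), i.e. a uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(n)
        ]
        arm = max(range(n), key=lambda a: samples[a])
        # Simulated feedback; true_rates are hypothetical conversion rates.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


wins, losses = thompson_sampling([0.1, 0.3, 0.6], steps=5000)
shown = [w + l for w, l in zip(wins, losses)]
print("times each variant was shown:", shown)
```

Unlike a fixed 50/50 split, the allocation shifts towards the better-performing variant as evidence accumulates, which limits exposure to the weaker ones.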

Key Considerations

Practitioners must account for exploration-exploitation tradeoffs: excessive exploration wastes resources on inferior options; insufficient exploration risks converging to suboptimal solutions. The cost of switching between options, non-stationary reward distributions, and the common assumption that arms are independent of one another can significantly impact real-world performance.
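One common way to cope with non-stationary rewards is to replace the running mean with a constant step-size update, so that recent observations carry more weight. The sketch below contrasts the two on a made-up arm whose payoff drops halfway through. The function names and the step size of 0.1 are assumptions chosen for illustration.

```python
def update_constant_step(estimate, reward, step_size=0.1):
    """Exponential recency-weighted update: recent rewards count more."""
    return estimate + step_size * (reward - estimate)


def update_sample_mean(estimate, reward, n):
    """Ordinary running mean over n observations (n includes this reward)."""
    return estimate + (reward - estimate) / n


# Hypothetical arm whose payoff drops from 1.0 to 0.0 halfway through.
rewards = [1.0] * 50 + [0.0] * 50
recency, mean = 0.0, 0.0
for n, r in enumerate(rewards, start=1):
    recency = update_constant_step(recency, r)
    mean = update_sample_mean(mean, r, n)
print(round(recency, 3), round(mean, 3))
```

After the change, the recency-weighted estimate ends close to 0, while the sample mean still sits near 0.5 and would keep overrating the arm. The tradeoff is that a constant step size never fully averages out noise in a stationary setting.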

Cross-References

Machine Learning

Online Learning

Related in Advanced Methods

Semi-Supervised Learning

A learning approach that combines a small amount of labelled data with a large amount of unlabelled data during training.

Self-Supervised Learning

A learning paradigm where models generate their own supervisory signals from unlabelled data through pretext tasks.

Transfer Learning

A technique where knowledge gained from training on one task is applied to a different but related task.

Meta-Learning

Learning to learn — algorithms that improve their learning process by leveraging experience from multiple learning episodes.

Curriculum Learning

A training strategy that presents examples to a model in a meaningful order, typically from easy to hard.

Bagging

Bootstrap Aggregating — an ensemble method that trains multiple models on random subsets of data and averages their predictions.

More in Machine Learning

Ensemble Methods

MLOps & Production

Machine learning techniques that combine multiple models to produce better predictive performance than any single model, including bagging, boosting, and stacking approaches.

A/B Testing

Training Techniques

A controlled experiment comparing two variants to determine which performs better against a defined metric.

Hierarchical Clustering

Unsupervised Learning

A clustering method that builds a tree-like hierarchy of clusters through successive merging or splitting of groups.

Content-Based Filtering

Unsupervised Learning

A recommendation approach that suggests items similar to those a user has previously liked, based on item attributes.

Continual Learning

MLOps & Production

A machine learning paradigm where models learn from a continuous stream of data, accumulating knowledge over time without forgetting previously learned information.

Adam Optimiser

Training Techniques

An adaptive learning rate optimisation algorithm combining momentum and RMSProp for efficient deep learning training.

Model Monitoring

MLOps & Production

Continuous observation of deployed machine learning models to detect performance degradation, data drift, anomalous predictions, and infrastructure issues in production.

Model Serialisation

MLOps & Production

The process of converting a trained model into a format that can be stored, transferred, and later reconstructed for inference.