Overview
Direct Answer
A mini-batch is a small, fixed-size subset of training data used to compute a single gradient update during iterative optimisation. It represents a practical compromise between processing individual samples (stochastic gradient descent) and the entire dataset (batch gradient descent).
How It Works
During each training iteration, a mini-batch of typically 32 to 512 samples is selected from the training dataset. The model computes predictions for all samples in the subset, calculates the loss across those samples, and backpropagates to produce a single gradient estimate. This aggregated gradient is used to update model weights before the next mini-batch is processed.
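The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the linear-regression model, the mean-squared-error loss, the function name `minibatch_sgd`, and all parameter values (`batch_size=32`, `lr=0.1`, `epochs=50`) are assumptions chosen for the example and do not come from the entry itself.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, lr=0.1, epochs=50, seed=0):
    """Fit linear-regression weights with mini-batch gradient descent (illustrative)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        order = rng.permutation(n_samples)            # reshuffle the data each epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]     # indices of one mini-batch
            X_b, y_b = X[idx], y[idx]
            residual = X_b @ w - y_b                  # predictions minus targets
            grad = 2.0 * X_b.T @ residual / len(idx)  # gradient of the mean squared error
            w -= lr * grad                            # one weight update per mini-batch
    return w

# Synthetic data with known weights (assumed values, for demonstration only).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * data_rng.normal(size=1000)

w_hat = minibatch_sgd(X, y)
print(w_hat)  # should be close to true_w
```

Each pass through the inner loop performs exactly one forward computation, one gradient calculation, and one weight update for the selected subset, which is the cycle the paragraph above describes.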
Why It Matters
Mini-batches enable efficient hardware utilisation by vectorising computations across multiple samples simultaneously, reducing training time substantially on GPUs and TPUs. They also provide more stable gradient estimates than single-sample updates, improving convergence behaviour and final model accuracy whilst maintaining computational feasibility for large datasets.
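The stability claim can be checked numerically. The sketch below, which assumes a synthetic linear-regression problem and a hypothetical helper `gradient_noise`, measures how far mini-batch gradient estimates stray from the full-dataset gradient at a fixed weight vector. When samples are drawn independently, the root-mean-square error is expected to shrink roughly in proportion to one over the square root of the batch size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (assumed for illustration).
X = rng.normal(size=(5000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(size=5000)
w = np.zeros(4)  # evaluate gradients at a fixed point

# Per-sample gradients of the squared error, computed in one vectorised step.
per_sample_grads = 2.0 * (X @ w - y)[:, None] * X   # shape (5000, 4)
full_grad = per_sample_grads.mean(axis=0)           # full-dataset gradient

def gradient_noise(batch_size, trials=2000):
    """RMS distance between mini-batch gradient estimates and the full gradient."""
    idx = rng.integers(0, len(X), size=(trials, batch_size))  # sample with replacement
    estimates = per_sample_grads[idx].mean(axis=1)            # shape (trials, 4)
    return np.sqrt(((estimates - full_grad) ** 2).sum(axis=1).mean())

noise_1 = gradient_noise(1)    # single-sample updates
noise_64 = gradient_noise(64)  # mini-batches of 64
print(noise_1, noise_64, noise_64 / noise_1)
```

The ratio printed last should land near 1/8 for a batch of 64, although the exact figure depends on the random draw.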
Common Applications
Mini-batch training is standard in deep learning frameworks across computer vision (image classification), natural language processing (transformer model training), and recommender systems. It is the default approach for training neural networks in most production machine learning pipelines, whether in research institutions or enterprise deployments.
Key Considerations
Batch size is itself a hyperparameter that needs tuning: larger batches reduce gradient noise but may converge to sharper minima, whilst smaller batches provide a regularising effect but require more weight updates per epoch. Memory constraints and hardware availability often dictate the practical upper limit on batch size.
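The effect of batch size on the number of weight updates per epoch is simple arithmetic. The sketch below uses a hypothetical helper, `updates_per_epoch`, and an assumed dataset of 50,000 samples. The `drop_last` flag is also an assumption for this example, modelling the common choice of discarding a final incomplete batch.

```python
import math

def updates_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of mini-batches (and therefore weight updates) in one pass over the data."""
    if drop_last:
        return n_samples // batch_size           # discard the final partial batch
    return math.ceil(n_samples / batch_size)     # keep the final partial batch

n_samples = 50_000  # assumed dataset size
for batch_size in (32, 128, 512):
    print(batch_size, updates_per_epoch(n_samples, batch_size))
# 32  -> 1563 updates per epoch
# 128 -> 391 updates per epoch
# 512 -> 98 updates per epoch
```

Moving from a batch size of 32 to 512 cuts the number of updates per epoch by roughly a factor of sixteen, so learning rate and epoch count usually need revisiting when the batch size changes.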
More in Machine Learning
Model Serving
MLOps & Production: The infrastructure and processes for deploying trained machine learning models to production environments for real-time predictions.
Semi-Supervised Learning
Advanced Methods: A learning approach that combines a small amount of labelled data with a large amount of unlabelled data during training.
Multi-Task Learning
MLOps & Production: A machine learning approach where a model is simultaneously trained on multiple related tasks to improve generalisation.
Deep Reinforcement Learning
Reinforcement Learning: Combining deep neural networks with reinforcement learning to enable agents to learn complex decision-making from raw sensory input.
Decision Tree
Supervised Learning: A tree-structured model where internal nodes represent feature tests, branches represent outcomes, and leaves represent predictions.
Feature Engineering
Feature Engineering & Selection: The process of using domain knowledge to create, select, and transform input variables to improve model performance.
Naive Bayes
Supervised Learning: A probabilistic classifier based on applying Bayes' theorem with the assumption of independence between features.
Bagging
Advanced Methods: Bootstrap Aggregating, an ensemble method that trains multiple models on random subsets of data and averages their predictions.