Overview
A technique that normalises layer inputs using statistics computed over each mini-batch during training, stabilising and accelerating deep neural network learning.
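As a rough illustration of the idea (a minimal NumPy sketch, not any particular library's implementation; the function and parameter names are illustrative), the training-time step standardises each feature using the current mini-batch's mean and variance, then applies a learnable scale (gamma) and shift (beta):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    # x: (batch, features); gamma, beta: learnable (features,) parameters.
    mu = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                     # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # standardised activations
    return gamma * x_hat + beta             # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=3.0, size=(32, 4))       # a mini-batch of activations
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1 per feature
```

At inference time, implementations typically replace the batch statistics with running averages accumulated during training, so the output no longer depends on the other samples in the batch.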
More in Deep Learning
Convolutional Layer (Architectures)
A neural network layer that applies learnable filters across input data to detect local patterns and features.
Knowledge Distillation (Architectures)
A model compression technique where a smaller student model learns to mimic the behaviour of a larger teacher model.
Mixture of Experts (Architectures)
An architecture where different specialised sub-networks (experts) are selectively activated based on the input.
Capsule Network (Architectures)
A neural network architecture that groups neurons into capsules to better capture spatial hierarchies and part-whole relationships.
Pre-Training (Language Models)
The initial phase of training a deep learning model on a large unlabelled corpus using self-supervised objectives, establishing general-purpose representations for downstream adaptation.
Dropout (Training & Optimisation)
A regularisation technique that randomly deactivates neurons during training to prevent co-adaptation and reduce overfitting.
Data Parallelism (Architectures)
A distributed training strategy that replicates the model across multiple devices and divides training data into batches processed simultaneously, synchronising gradients after each step.
Layer Normalisation (Training & Optimisation)
A technique that computes normalisation statistics across the features of each individual sample rather than across the batch, as contrasted in the sketch below.
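To make that contrast concrete, here is a minimal NumPy sketch under the same illustrative assumptions as the batch-wise example above; the only substantive change is the axis over which the mean and variance are computed:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features); statistics are taken per sample, over its features.
    mu = x.mean(axis=-1, keepdims=True)    # per-sample mean
    var = x.var(axis=-1, keepdims=True)    # per-sample variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # each sample standardised on its own
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
y = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=-1)[:3].round(3))  # ~0 for each individual sample
```

Because the statistics are per-sample, the result is independent of batch size, which is one reason layer normalisation is commonly preferred in sequence models where batch statistics can be unreliable.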