Overview
A regularisation technique that randomly deactivates neurons during training to prevent co-adaptation and reduce overfitting.
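In practice this is usually implemented as "inverted" dropout: during training each activation is kept with probability 1 − p and the survivors are scaled by 1/(1 − p), so at inference the layer can simply pass activations through unchanged. A minimal NumPy sketch of that idea follows; the function and parameter names are illustrative and not tied to any particular framework.

```python
import numpy as np

def dropout(x, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p_drop during
    training and rescale the survivors by 1/(1 - p_drop), so the expected
    activation matches the inference-time (no-op) behaviour."""
    if not training or p_drop == 0.0:
        return x
    rng = rng if rng is not None else np.random.default_rng()
    keep_prob = 1.0 - p_drop
    mask = rng.random(x.shape) < keep_prob  # Boolean keep/drop mask
    return np.where(mask, x / keep_prob, 0.0)

# Roughly half the hidden units are zeroed during training;
# at inference the activations pass through unchanged.
h = np.random.default_rng(0).standard_normal((4, 8))
h_train = dropout(h, p_drop=0.5, training=True)
h_eval = dropout(h, p_drop=0.5, training=False)
```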
More in Deep Learning
State Space Model
A sequence modelling architecture based on continuous-time dynamical systems that processes long sequences with linear complexity, offering an alternative to attention-based transformers.
Representation Learning
The automatic discovery of data representations needed for feature detection or classification from raw data.
Recurrent Neural Network
A neural network architecture where connections between nodes form directed cycles, enabling processing of sequential data.
Encoder-Decoder Architecture
A neural network design where an encoder processes input into a fixed representation and a decoder generates output from it.
Vanishing Gradient
A problem in deep networks where gradients become extremely small during backpropagation, preventing earlier layers from learning.
Tensor Parallelism
A distributed computing strategy that splits individual layer computations across multiple devices by partitioning weight matrices along specific dimensions.
Mamba Architecture
A selective state space model that achieves transformer-level performance with linear-time complexity by incorporating input-dependent selection mechanisms into the recurrence.
Pretraining
Training a model on a large general dataset before fine-tuning it on a specific downstream task.