Overview
A technique that caps individual gradient values, or rescales the overall gradient norm, during training to prevent the exploding gradient problem.
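Clipping is typically applied between the backward pass and the optimizer step. The sketch below shows the two common variants in PyTorch, clipping by global norm and element-wise clipping by value; the model, data, and thresholds are illustrative placeholders, not prescribed settings.

```python
import torch
import torch.nn as nn

# Illustrative model, loss, and data; any trainable model would do.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10)
y = torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Clip by global norm: rescale all gradients together so their combined
# L2 norm does not exceed max_norm (1.0 is an arbitrary example value).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Alternative: element-wise clipping, capping each gradient entry to
# the range [-0.5, 0.5] (again an arbitrary example threshold).
# torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)

optimizer.step()
```

Norm-based clipping preserves the gradient's direction while bounding its magnitude, whereas value-based clipping can change the direction; which to use is a modelling choice.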
More in Deep Learning
Contrastive Learning
Architectures: A self-supervised learning approach that trains models by comparing similar and dissimilar pairs of data representations.
Gated Recurrent Unit
Architectures: A simplified variant of LSTM that combines the forget and input gates into a single update gate.
Tensor Parallelism
Architectures: A distributed computing strategy that splits individual layer computations across multiple devices by partitioning weight matrices along specific dimensions.
Word Embedding
Language Models: Dense vector representations of words where semantically similar words are mapped to nearby points in vector space.
Long Short-Term Memory
Architectures: A recurrent neural network architecture designed to learn long-term dependencies by using gating mechanisms to control information flow.
Skip Connection
Architectures: A neural network shortcut that allows the output of one layer to bypass intermediate layers and be added to a later layer's output.
Pre-Training
Language Models: The initial phase of training a deep learning model on a large unlabelled corpus using self-supervised objectives, establishing general-purpose representations for downstream adaptation.
Prefix Tuning
Language Models: A parameter-efficient method that prepends trainable continuous vectors to the input of each transformer layer, guiding model behaviour without altering base parameters.