Overview
A skip connection that adds a layer's input directly to its output, so the block computes y = x + F(x). The identity path lets gradients flow unimpeded through deep networks, making it practical to train architectures with hundreds of layers.
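As a concrete illustration of that definition, here is a minimal sketch of a residual block, assuming PyTorch as the framework; the ResidualBlock name, layer sizes, and inner transformation are illustrative assumptions, not from the source.

```python
# A minimal sketch of a residual block (PyTorch assumed; names and sizes
# here are illustrative, not taken from the original entry).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes y = x + F(x): the input is added directly to the block's output."""

    def __init__(self, dim: int):
        super().__init__()
        # F(x): a small learned transformation; any sub-network works here.
        self.f = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection. Because d(x + F(x))/dx includes an identity
        # term, gradients reach earlier layers directly, which is what makes
        # stacks of hundreds of such blocks trainable.
        return x + self.f(x)

x = torch.randn(8, 64)
y = ResidualBlock(64)(x)
print(y.shape)  # torch.Size([8, 64])
```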
More in Deep Learning
Encoder-Decoder Architecture
Architectures: A neural network design where an encoder processes input into a fixed representation and a decoder generates output from it.
Transformer
Architectures: A neural network architecture based entirely on attention mechanisms, eliminating recurrence and enabling parallel processing of sequences.
Weight Decay
Architectures: A regularisation technique that penalises large model weights during training by adding a fraction of the weight magnitude to the loss function, preventing overfitting (a worked form of this penalty appears after this list).
Exploding Gradient
Architectures: A problem where gradients grow exponentially during backpropagation, causing unstable weight updates and training failure.
Adapter Layers
Language Models: Small trainable modules inserted between frozen transformer layers that enable task-specific adaptation without modifying the original model weights.
Deep Learning
Architectures: A subset of machine learning using neural networks with multiple layers to learn hierarchical representations of data.
Variational Autoencoder
Architectures: A generative model that learns a probabilistic latent space representation, enabling generation of new data samples.
Long Short-Term Memory
Architectures: A recurrent neural network architecture designed to learn long-term dependencies by using gating mechanisms to control information flow.
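For the Weight Decay entry above, the penalty it describes is most commonly written in the L2 form sketched below; the source does not specify the norm, and the symbols (λ for the decay coefficient, w for the weights) are conventional assumptions.

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{data}} + \lambda \lVert w \rVert_2^2
```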