Overview
A deep neural network architecture that uses skip connections to let gradients flow directly through layers, making very deep networks trainable.
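The skip-connection idea can be illustrated with a minimal NumPy sketch: the block computes a transformation F(x) and adds the original input back before the final activation. The function and variable names here are illustrative, assuming a two-layer block whose input and output dimensions match.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Two linear transforms with a skip connection: out = relu(x + F(x))."""
    h = relu(x @ w1)       # first transform
    fx = h @ w2            # second transform -- the residual F(x)
    return relu(x + fx)    # skip connection adds the input back unchanged

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # batch of 4 inputs, width 8
w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, w1, w2)          # same shape as the input: (4, 8)
```

Because the input is added directly to the output, the gradient of the loss with respect to `x` always includes an identity term, so it cannot vanish through the skip path no matter how many blocks are stacked.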
More in Deep Learning
Encoder-Decoder Architecture
Architectures
A neural network design where an encoder processes the input into a fixed representation and a decoder generates the output from it.
Model Parallelism
Architectures
A distributed training approach that partitions a model across multiple devices, enabling training of models too large to fit in a single accelerator's memory.
Transformer
Architectures
A neural network architecture based entirely on attention mechanisms, eliminating recurrence and enabling parallel processing of sequences.
Pooling Layer
Architectures
A neural network layer that reduces spatial dimensions by aggregating values, commonly using max or average operations.
Mixture of Experts
Architectures
An architecture where different specialised sub-networks (experts) are selectively activated based on the input.
Exploding Gradient
Training & Optimisation
A problem where gradients grow exponentially during backpropagation, causing unstable weight updates and training failure.
Residual Connection
Architectures
A skip connection that adds a layer's input directly to its output, preserving gradient flow through deep networks and making it possible to train architectures with hundreds of layers.
Fully Connected Layer
Architectures
A neural network layer where every neuron is connected to every neuron in the adjacent layers.