Overview
Direct Answer
A residual network (ResNet) is a deep neural network architecture that uses skip connections (also called residual connections) to let the input signal bypass one or more layers. This makes it practical to train networks with 100 or more layers by mitigating the vanishing gradient problem.
How It Works
A skip connection adds the input of a block of one or more layers directly to that block's output, so the block computes y = F(x) + x. The layers inside the block therefore only need to learn the residual mapping F(x), the difference between the desired output and the input, rather than the full transformation. During backpropagation the identity path gives the gradient a direct route around the block, so error signals can reach the early layers of a very deep network without decaying exponentially.
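The mechanism can be illustrated with a minimal pure-Python sketch. It is a toy under stated assumptions, not a production ResNet: the helper names (matvec, relu, residual_block, input_gradient) are hypothetical, the block uses two plain weight matrices with a ReLU between them, and the gradient comparison uses a scalar tanh layer with a positive weight so that each residual layer's derivative is at least 1.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def residual_block(x, W1, W2):
    """Return y = x + F(x), with F(x) = W2 * relu(W1 * x)."""
    fx = matvec(W2, relu(matvec(W1, x)))
    return [a + b for a, b in zip(x, fx)]

def input_gradient(x, w, depth, residual):
    """Derivative of the output w.r.t. the input for a stack of scalar layers.

    Plain layer:    h -> tanh(w * h)
    Residual layer: h -> h + tanh(w * h)
    The chain rule multiplies one local derivative per layer.
    """
    h, grad = x, 1.0
    for _ in range(depth):
        t = math.tanh(w * h)
        local = w * (1.0 - t * t)  # d/dh of tanh(w * h)
        if residual:
            h, grad = h + t, grad * (1.0 + local)  # identity path adds 1
        else:
            h, grad = t, grad * local
    return grad

# With all-zero weights F(x) is zero, so the block passes x through unchanged.
zeros = [[0.0] * 3 for _ in range(3)]
x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zeros, zeros)

# Same depth and weight, with and without the skip connection.
plain_grad = input_gradient(1.0, 0.5, 30, residual=False)
skip_grad = input_gradient(1.0, 0.5, 30, residual=True)
print(identity_out, plain_grad, skip_grad)
```

In this toy setup the plain stack multiplies 30 factors that are each at most 0.5, so its gradient shrinks towards zero, while every factor in the residual stack is 1 plus a non-negative term, so its gradient never falls below 1. Real networks add normalisation and careful initialisation to keep that product from growing too large, which the sketch does not model.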
Why It Matters
Residual networks substantially improved accuracy in large-scale image recognition tasks and became foundational for modern computer vision systems. Being able to train much deeper models that still converge reliably improved performance on complex visual and sequential tasks, which drove adoption across industries that need high-accuracy perception systems.
Common Applications
Medical image analysis for diagnostic detection, object recognition in autonomous vehicle systems, and large-scale image classification in e-commerce platforms rely on residual architectures. Natural language processing models and speech recognition systems also employ residual connections to process sequential data more effectively.
Key Considerations
Deeper networks do not automatically produce better results. Residual connections mitigate training difficulties, but deep models still require careful hyperparameter tuning and substantial computational resources. Practitioners must balance network depth against overfitting risk and deployment constraints.
More in Deep Learning
Word Embedding (Language Models)
Dense vector representations of words where semantically similar words are mapped to nearby points in vector space.
Fine-Tuning (Architectures)
The process of taking a pretrained model and further training it on a smaller, task-specific dataset.
Generative Adversarial Network (Generative Models)
A framework where two neural networks compete: a generator creates synthetic data while a discriminator evaluates its authenticity.
Weight Decay (Architectures)
A regularisation technique that penalises large model weights during training by adding a fraction of the weight magnitude to the loss function, preventing overfitting.
Batch Normalisation (Architectures)
A technique that normalises layer inputs during training to stabilise and accelerate deep neural network learning.
Vanishing Gradient (Architectures)
A problem in deep networks where gradients become extremely small during backpropagation, preventing earlier layers from learning.
Knowledge Distillation (Architectures)
A model compression technique where a smaller student model learns to mimic the behaviour of a larger teacher model.
Capsule Network (Architectures)
A neural network architecture that groups neurons into capsules to better capture spatial hierarchies and part-whole relationships.