Overview
Methods for adapting large pretrained models to new tasks by only updating a small fraction of their parameters.
More in Deep Learning
Attention Mechanism
ArchitecturesA neural network component that learns to focus on relevant parts of the input when producing each element of the output.
Representation Learning
ArchitecturesThe automatic discovery of data representations needed for feature detection or classification from raw data.
Vanishing Gradient
ArchitecturesA problem in deep networks where gradients become extremely small during backpropagation, preventing earlier layers from learning.
Residual Connection
Training & OptimisationA skip connection that adds a layer's input directly to its output, enabling gradient flow through deep networks and allowing training of architectures with hundreds of layers.
Fine-Tuning
ArchitecturesThe process of taking a pretrained model and further training it on a smaller, task-specific dataset.
Capsule Network
ArchitecturesA neural network architecture that groups neurons into capsules to better capture spatial hierarchies and part-whole relationships.
Model Parallelism
ArchitecturesA distributed training approach that partitions a model across multiple devices, enabling training of models too large to fit in a single accelerator's memory.
Vision Transformer
ArchitecturesA transformer architecture adapted for image recognition that divides images into patches and processes them as sequences, rivalling convolutional networks in visual tasks.