Overview
Fine-tuning is the process of adapting a pre-trained model to a specific task by continuing training on a smaller, task-specific dataset, transferring learned representations to new domains.
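A minimal fine-tuning sketch in PyTorch (the framework, the ResNet-18 backbone, and the 10-class head are illustrative assumptions, not prescribed by this entry). The pre-trained backbone is frozen and only a new task-specific head is trained:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained backbone so only the new head is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for a hypothetical 10-class task.
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Unfreezing some or all backbone layers with a lower learning rate is a common variant once the new head has converged.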
More in Deep Learning
Batch Normalisation
A technique that normalises layer inputs during training to stabilise and accelerate deep neural network learning.
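A hedged illustration of the behaviour in PyTorch (the framework and feature size are assumptions for the sketch): in training mode, each feature is normalised over the batch, then scaled and shifted by learned parameters.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(128)
x = torch.randn(32, 128) * 5 + 3   # a batch with non-zero mean and large variance
y = bn(x)

# Each feature now has roughly zero mean and unit variance, before the
# learned scale (gamma) and shift (beta) adjust it.
print(y.mean().item(), y.std().item())
```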
Data Parallelism
A distributed training strategy that replicates the model across multiple devices and divides training data into batches processed simultaneously, synchronising gradients after each step.
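A single-process sketch using PyTorch's `nn.DataParallel`, which replicates the model on each visible GPU and splits every input batch across the replicas (the layer shape is illustrative; multi-machine setups typically use `DistributedDataParallel` instead):

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10)

# DataParallel scatters each batch across GPUs, runs the replicas in
# parallel, and gathers gradients back onto the default device.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```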
Recurrent Neural Network
A neural network architecture where connections between nodes form directed cycles, enabling processing of sequential data.
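A minimal sketch in PyTorch (sizes are illustrative): the recurrent layer consumes a sequence step by step, carrying a hidden state forward through its cyclic connections.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=16, hidden_size=32, batch_first=True)
sequence = torch.randn(4, 10, 16)   # 4 sequences, 10 time steps, 16 features

# `output` holds the hidden state at every time step; `hidden` is the final one.
output, hidden = rnn(sequence)
print(output.shape, hidden.shape)   # (4, 10, 32) and (1, 4, 32)
```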
Contrastive Learning
A self-supervised learning approach that trains models by comparing similar and dissimilar pairs of data representations.
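A minimal InfoNCE-style loss sketch in PyTorch (the function name `info_nce`, the temperature, and the embedding sizes are illustrative assumptions): each embedding is pulled towards its positive pair and pushed away from every other sample in the batch.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Each z1[i] should match z2[i] (the positives on the diagonal)
    and repel every other row of z2 (the negatives)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(z1.size(0))   # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Embeddings of two augmented views of the same 8 inputs.
z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
print(info_nce(z1, z2).item())
```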
Exploding Gradient
A problem where gradients grow exponentially during backpropagation, causing unstable weight updates and training failure.
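A sketch reproducing the effect in PyTorch (the depth and the deliberately oversized initialisation are contrived for the demonstration), followed by gradient clipping, one common mitigation:

```python
import torch
import torch.nn as nn

# A deep stack of linear layers with overly large weights makes gradients
# grow multiplicatively during backpropagation.
layers = nn.Sequential(*[nn.Linear(64, 64) for _ in range(50)])
for layer in layers:
    nn.init.normal_(layer.weight, std=0.5)   # deliberately too large

x = torch.randn(8, 64)
layers(x).sum().backward()
print(layers[0].weight.grad.norm())          # typically an astronomical value

# Mitigation: clip the global gradient norm before the optimiser step.
nn.utils.clip_grad_norm_(layers.parameters(), max_norm=1.0)
```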
Attention Head
An individual attention computation within a multi-head attention layer that learns to focus on different aspects of the input, with outputs concatenated for richer representations.
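A shape-level sketch using PyTorch's `nn.MultiheadAttention` (the embedding size and head count are illustrative): eight heads each attend over a 512/8 = 64-dimensional slice, and their outputs are concatenated and projected back to 512 dimensions.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(2, 20, 512)   # batch of 2 sequences, 20 tokens each

# Self-attention: the same tensor serves as query, key, and value.
out, weights = attn(x, x, x, average_attn_weights=False)
print(out.shape)       # (2, 20, 512): concatenated head outputs, projected
print(weights.shape)   # (2, 8, 20, 20): one attention map per head
```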
Weight Initialisation
The strategy for setting initial parameter values in a neural network before training begins.
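A brief sketch of two standard schemes in PyTorch (the layer size is illustrative): Xavier/Glorot initialisation keeps activation variance roughly constant across layers, while Kaiming/He initialisation does the same for ReLU networks.

```python
import torch.nn as nn

layer = nn.Linear(256, 256)

# Xavier/Glorot: suited to tanh- or sigmoid-style activations.
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)

# Kaiming/He: the usual choice for ReLU stacks.
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
```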
Convolutional Neural Network
A deep learning architecture designed for processing structured grid data like images, using convolutional filters to detect features.
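A minimal CNN sketch in PyTorch (the CIFAR-10-sized input and 10-class head are assumptions): convolutional filters detect local features, pooling downsamples, and a linear head classifies the flattened feature maps.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                  # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                  # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),
)

images = torch.randn(4, 3, 32, 32)    # e.g. a CIFAR-10-sized batch
print(model(images).shape)            # (4, 10)
```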