Overview
Training a model on a large, general-purpose dataset before fine-tuning it on a specific downstream task.
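A minimal sketch of the first half of this pipeline, assuming a toy character-level corpus and a deliberately small PyTorch model (all names and hyperparameters here are illustrative): the model is trained with a self-supervised next-token objective on unlabelled text, and the resulting weights are saved for later fine-tuning.

```python
import torch
import torch.nn as nn

# Toy "large general dataset": unlabelled text, no task labels needed.
corpus = "deep learning models learn general representations from raw text"
vocab = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = torch.tensor([stoi[ch] for ch in corpus])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Self-supervised pre-training: predict the next character from the previous ones.
inputs, targets = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)
for step in range(200):
    logits = model(inputs)
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# The pre-trained weights can now be saved and reused for downstream adaptation.
torch.save(model.state_dict(), "pretrained_tiny_lm.pt")
```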
More in Deep Learning
Prefix Tuning
Language Models
A parameter-efficient method that prepends trainable continuous vectors to the input of each transformer layer, guiding model behaviour without altering base parameters.
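A minimal sketch of the idea as described above (prefixes prepended to each layer's input sequence while the base weights stay frozen), not the exact key/value mechanism of the original paper; the layer stack and dimensions are illustrative.

```python
import torch
import torch.nn as nn

dim, prefix_len, num_layers = 64, 8, 2

# Frozen base model: a small stack of standard transformer encoder layers.
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True) for _ in range(num_layers)]
)
for p in layers.parameters():
    p.requires_grad = False  # base parameters are never updated

# Trainable prefixes: one learned block of continuous vectors per layer.
prefixes = nn.ParameterList(
    [nn.Parameter(torch.randn(prefix_len, dim) * 0.02) for _ in range(num_layers)]
)

def forward_with_prefixes(x):
    # x: (batch, seq_len, dim)
    for layer, prefix in zip(layers, prefixes):
        p = prefix.unsqueeze(0).expand(x.size(0), -1, -1)
        h = layer(torch.cat([p, x], dim=1))   # prepend prefix, run frozen layer
        x = h[:, prefix_len:, :]              # drop prefix positions before the next layer
    return x

x = torch.randn(2, 16, dim)
out = forward_with_prefixes(x)
print(out.shape)  # torch.Size([2, 16, 64])
# Only `prefixes` would be handed to the optimiser during adaptation.
```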
Pre-Training
Language Models
The initial phase of training a deep learning model on a large unlabelled corpus using self-supervised objectives, establishing general-purpose representations for downstream adaptation.
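As a complement to the pipeline sketch in the Overview, this snippet shows how a self-supervised objective can be built from unlabelled text alone, here a simple masked-token setup (the masking rate, vocabulary, and special token are illustrative, not taken from any particular model):

```python
import torch

# Unlabelled text: the "labels" are created from the data itself.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
vocab = sorted(set(tokens)) + ["[MASK]"]
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in tokens])

# Self-supervised masked-token objective: hide ~15% of tokens and ask the
# model to predict the originals at the masked positions.
mask = torch.rand(ids.shape) < 0.15
inputs = ids.clone()
inputs[mask] = stoi["[MASK]"]
targets = torch.where(mask, ids, torch.full_like(ids, -100))  # -100 is ignored by cross_entropy

print(inputs)   # corrupted sequence fed to the model
print(targets)  # original token ids, kept only at masked positions
```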
State Space Model
Architectures
A sequence modelling architecture based on continuous-time dynamical systems that processes long sequences with linear complexity, offering an alternative to attention-based transformers.
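A minimal sketch of the core recurrence behind such models, assuming an already-discretised linear state space (the matrices A, B, C below are placeholders, not taken from any published model): each output depends on a hidden state updated once per step, so the cost grows linearly with sequence length.

```python
import torch

state_dim, in_dim, seq_len = 4, 1, 10

# Discretised linear state space: h_t = A h_{t-1} + B x_t,  y_t = C h_t
A = torch.eye(state_dim) * 0.9           # state transition (placeholder values)
B = torch.randn(state_dim, in_dim) * 0.1
C = torch.randn(in_dim, state_dim)

x = torch.randn(seq_len, in_dim)         # input sequence
h = torch.zeros(state_dim, 1)            # initial hidden state
outputs = []
for t in range(seq_len):                 # one state update per step: O(seq_len) overall
    h = A @ h + B @ x[t].unsqueeze(1)
    outputs.append((C @ h).squeeze(1))

y = torch.stack(outputs)
print(y.shape)  # torch.Size([10, 1])
```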
Gradient Checkpointing
Architectures
A memory optimisation that trades computation for memory by recomputing intermediate activations during the backward pass instead of storing them all during the forward pass.
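A minimal sketch using PyTorch's torch.utils.checkpoint utility, one common way to apply this technique (the model and data are illustrative): the checkpointed blocks do not keep their intermediate activations after the forward pass and recompute them when gradients are needed.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Illustrative deep stack of blocks whose activations would otherwise all be stored.
blocks = nn.ModuleList([nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(8)])

def forward(x):
    for block in blocks:
        # Activations inside `block` are recomputed during backward
        # instead of being kept in memory after the forward pass.
        x = checkpoint(block, x, use_reentrant=False)
    return x

x = torch.randn(32, 512, requires_grad=True)
loss = forward(x).sum()
loss.backward()  # each block is re-run here to rebuild the activations it needs
```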
Fine-Tuning
Language Models
The process of adapting a pre-trained model to a specific task by continuing training on a smaller task-specific dataset, transferring learned representations to new domains.
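A minimal sketch of the adaptation step (the backbone, head, and data are illustrative): in practice the backbone's weights would be loaded from pre-training, a fresh task-specific head is attached, and training continues on a small labelled dataset, typically with a lower learning rate.

```python
import torch
import torch.nn as nn

# Illustrative backbone; in practice its weights would come from pre-training,
# e.g. backbone.load_state_dict(torch.load("pretrained_weights.pt")).
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256), nn.ReLU())

# New task-specific head, initialised from scratch for a 3-class problem.
head = nn.Linear(256, 3)
model = nn.Sequential(backbone, head)

# Small labelled downstream dataset (toy random data here).
x = torch.randn(64, 128)
y = torch.randint(0, 3, (64,))

# A smaller learning rate than in pre-training is typical, to avoid
# overwriting the transferred representations.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(100):
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```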
Softmax Function
Training & Optimisation
An activation function that converts a vector of numbers into a probability distribution, commonly used in multi-class classification.
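The underlying formula is softmax(z)_i = exp(z_i) / Σ_j exp(z_j); a short, numerically stable sketch of it (the example logits are arbitrary):

```python
import torch

def softmax(z, dim=-1):
    # Subtracting the maximum does not change the result but avoids overflow in exp.
    z = z - z.max(dim=dim, keepdim=True).values
    e = torch.exp(z)
    return e / e.sum(dim=dim, keepdim=True)

logits = torch.tensor([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # tensor([0.6590, 0.2424, 0.0986])
print(probs.sum())  # tensor(1.0000) -> a valid probability distribution
```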
Word Embedding
Language Models
Dense vector representations of words where semantically similar words are mapped to nearby points in vector space.
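A minimal sketch using a randomly initialised embedding table (a trained table, e.g. from word2vec or a language model, is what actually places related words close together; the vocabulary here is illustrative):

```python
import torch
import torch.nn as nn

vocab = ["king", "queen", "apple", "banana"]
stoi = {w: i for i, w in enumerate(vocab)}

# Each word is mapped to a dense vector; these would normally be learned.
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

ids = torch.tensor([stoi["king"], stoi["queen"]])
vectors = embedding(ids)               # shape: (2, 8)

# Cosine similarity is the usual notion of "nearby" in embedding space.
sim = nn.functional.cosine_similarity(vectors[0], vectors[1], dim=0)
print(vectors.shape, sim.item())
```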
Gradient Clipping
Training & Optimisation
A technique that caps gradient values during training to prevent the exploding gradient problem.
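A minimal sketch of a training step with norm-based clipping, using PyTorch's built-in utility (the model and data are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 1)

loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()

# Rescale gradients so their global norm is at most 1.0 before the update,
# preventing a single large gradient from destabilising training.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```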