Overview
Fine-tuning is the process of adapting a pre-trained model to a specific task by continuing training on a smaller, task-specific dataset, transferring learned representations to new domains.
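A minimal fine-tuning sketch in PyTorch (the framework, the ResNet-18 backbone, and the 10-class head are illustrative assumptions, not prescribed by this entry). The pre-trained backbone is frozen and only a new task-specific head is trained:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained backbone so only the new head is updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for a hypothetical 10-class task.
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Unfreezing some or all backbone layers with a lower learning rate is a common variant once the new head has converged.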
More in Deep Learning
Batch Normalisation
A technique that normalises layer inputs during training to stabilise and accelerate deep neural network learning.
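A hedged illustration of the behaviour in PyTorch (the framework and feature size are assumptions for the sketch): in training mode, each feature is normalised over the batch, then scaled and shifted by learned parameters.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(128)
x = torch.randn(32, 128) * 5 + 3   # a batch with non-zero mean and large variance
y = bn(x)

# Each feature now has roughly zero mean and unit variance, before the
# learned scale (gamma) and shift (beta) adjust it.
print(y.mean().item(), y.std().item())
```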
Data Parallelism
A distributed training strategy that replicates the model across multiple devices and divides training data into batches processed simultaneously, synchronising gradients after each step.
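A single-process sketch using PyTorch's `nn.DataParallel`, which replicates the model on each visible GPU and splits every input batch across the replicas (the layer shape is illustrative; multi-machine setups typically use `DistributedDataParallel` instead):

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10)

# DataParallel scatters each batch across GPUs, runs the replicas in
# parallel, and gathers gradients back onto the default device.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```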
Recurrent Neural Network
A neural network architecture where connections between nodes form directed cycles, enabling processing of sequential data.
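A minimal sketch in PyTorch (sizes are illustrative): the recurrent layer consumes a sequence step by step, carrying a hidden state forward through its cyclic connections.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=16, hidden_size=32, batch_first=True)
sequence = torch.randn(4, 10, 16)   # 4 sequences, 10 time steps, 16 features

# `output` holds the hidden state at every time step; `hidden` is the final one.
output, hidden = rnn(sequence)
print(output.shape, hidden.shape)   # (4, 10, 32) and (1, 4, 32)
```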
Contrastive Learning
A self-supervised learning approach that trains models by comparing similar and dissimilar pairs of data representations.
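A minimal InfoNCE-style loss sketch in PyTorch (the function name `info_nce`, the temperature, and the embedding sizes are illustrative assumptions): each embedding is pulled towards its positive pair and pushed away from every other sample in the batch.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Each z1[i] should match z2[i] (the positives on the diagonal)
    and repel every other row of z2 (the negatives)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(z1.size(0))   # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Embeddings of two augmented views of the same 8 inputs.
z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
print(info_nce(z1, z2).item())
```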
Exploding Gradient
A problem where gradients grow exponentially during backpropagation, causing unstable weight updates and training failure.
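A sketch reproducing the effect in PyTorch (the depth and the deliberately oversized initialisation are contrived for the demonstration), followed by gradient clipping, one common mitigation:

```python
import torch
import torch.nn as nn

# A deep stack of linear layers with overly large weights makes gradients
# grow multiplicatively during backpropagation.
layers = nn.Sequential(*[nn.Linear(64, 64) for _ in range(50)])
for layer in layers:
    nn.init.normal_(layer.weight, std=0.5)   # deliberately too large

x = torch.randn(8, 64)
layers(x).sum().backward()
print(layers[0].weight.grad.norm())          # typically an astronomical value

# Mitigation: clip the global gradient norm before the optimiser step.
nn.utils.clip_grad_norm_(layers.parameters(), max_norm=1.0)
```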
Attention Head
An individual attention computation within a multi-head attention layer that learns to focus on different aspects of the input, with outputs concatenated for richer representations.
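A shape-level sketch using PyTorch's `nn.MultiheadAttention` (the embedding size and head count are illustrative): eight heads each attend over a 512/8 = 64-dimensional slice, and their outputs are concatenated and projected back to 512 dimensions.

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(2, 20, 512)   # batch of 2 sequences, 20 tokens each

# Self-attention: the same tensor serves as query, key, and value.
out, weights = attn(x, x, x, average_attn_weights=False)
print(out.shape)       # (2, 20, 512): concatenated head outputs, projected
print(weights.shape)   # (2, 8, 20, 20): one attention map per head
```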
Weight Initialisation
The strategy for setting initial parameter values in a neural network before training begins.
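A brief sketch of two standard schemes in PyTorch (the layer size is illustrative): Xavier/Glorot initialisation keeps activation variance roughly constant across layers, while Kaiming/He initialisation does the same for ReLU networks.

```python
import torch.nn as nn

layer = nn.Linear(256, 256)

# Xavier/Glorot: suited to tanh- or sigmoid-style activations.
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)

# Kaiming/He: the usual choice for ReLU stacks.
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
```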
Convolutional Neural Network
A deep learning architecture designed for processing structured grid data like images, using convolutional filters to detect features.
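A minimal CNN sketch in PyTorch (the CIFAR-10-sized input and 10-class head are assumptions): convolutional filters detect local features, pooling downsamples, and a linear head classifies the flattened feature maps.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                  # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                  # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),
)

images = torch.randn(4, 3, 32, 32)    # e.g. a CIFAR-10-sized batch
print(model(images).shape)            # (4, 10)
```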