Overview
Direct Answer
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that trains small, low-rank matrices inserted alongside frozen pretrained model weights, rather than updating all parameters. This approach dramatically reduces the number of trainable parameters whilst maintaining model performance.
How It Works
LoRA decomposes weight updates into products of two smaller matrices with reduced rank dimensions. During fine-tuning, only these low-rank decomposition matrices are optimised whilst the original pretrained weights remain fixed. The adapted weights are computed as the sum of the frozen weights and the product of the trained low-rank matrices, scaled by a learning rate factor.
Why It Matters
This technique reduces memory consumption and computational cost by 10-100 times compared to full fine-tuning, making large model adaptation feasible on modest hardware. Organisations can now customise foundation models for specific tasks without prohibitive infrastructure investment or extended training timelines.
Common Applications
LoRA is widely deployed in adapting large language models for domain-specific tasks, personalising image generation models for particular artistic styles, and enabling efficient task-specific variants of models like LLaMA and Stable Diffusion. Financial institutions and healthcare organisations use it to specialise models whilst maintaining compliance boundaries.
Key Considerations
The technique introduces a rank hyperparameter that requires tuning; ranks that are too low may limit adaptation capability, whilst higher ranks diminish parameter efficiency gains. LoRA performs best when fine-tuning data resembles pretraining distribution; extreme distribution shifts may still require partial weight unfreezing.
Cross-References(2)
More in Deep Learning
Deep Learning
ArchitecturesA subset of machine learning using neural networks with multiple layers to learn hierarchical representations of data.
Flash Attention
ArchitecturesAn IO-aware attention algorithm that reduces memory reads and writes by tiling the attention computation, enabling faster training of long-context transformer models.
Convolutional Layer
ArchitecturesA neural network layer that applies learnable filters across input data to detect local patterns and features.
Mixed Precision Training
Training & OptimisationTraining neural networks using both 16-bit and 32-bit floating-point arithmetic to speed up computation while maintaining accuracy.
Representation Learning
ArchitecturesThe automatic discovery of data representations needed for feature detection or classification from raw data.
Neural Network
ArchitecturesA computing system inspired by biological neural networks, consisting of interconnected nodes that process information in layers.
Skip Connection
ArchitecturesA neural network shortcut that allows the output of one layer to bypass intermediate layers and be added to a later layer's output.
Embedding
ArchitecturesA learned dense vector representation of discrete data (like words or categories) in a continuous vector space.