LoRA — Technology Wiki

Overview

Direct Answer

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that trains small, low-rank matrices inserted alongside frozen pretrained model weights, rather than updating all parameters. This approach dramatically reduces the number of trainable parameters whilst maintaining model performance.

How It Works

LoRA represents the update to a weight matrix as the product of two much smaller matrices whose shared inner dimension, the rank, is far smaller than the dimensions of the original matrix. During fine-tuning, only these two low-rank matrices are optimised whilst the original pretrained weights remain fixed. The adapted weights are computed as the sum of the frozen weights and the product of the trained low-rank matrices, scaled by a constant factor (commonly written as alpha divided by the rank).
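The update described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names w0, a, b, alpha and rank are assumptions chosen for this example, the scaling factor is assumed to be alpha / rank, and the initialisation (small random values for a, zeros for b) follows the commonly described scheme in which the adapted weights start out identical to the frozen ones.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w0, a, b, alpha, rank):
    """Return W0 + (alpha / rank) * (B @ A) without modifying W0."""
    scale = alpha / rank
    delta = matmul(b, a)  # (d_out x rank) @ (rank x d_in) -> d_out x d_in
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]

random.seed(0)
d_out, d_in, rank, alpha = 4, 6, 2, 4.0  # hypothetical toy sizes

# Frozen pretrained weights: never updated during fine-tuning.
w0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: a starts small and random, b starts at zero.
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]

w_eff = lora_effective_weight(w0, a, b, alpha, rank)
print(w_eff == w0)  # True: with b at zero the adapter changes nothing yet
```

Because the product of the two factors has the same shape as the frozen matrix, it can in principle be added into the frozen weights once training is finished, so the adapted layer needs no extra matrix multiplication at inference time.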

Why It Matters

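This technique can cut the number of trainable parameters by orders of magnitude and substantially reduce the memory needed for gradients and optimiser state compared to full fine-tuning, making large model adaptation feasible on modest hardware. The exact savings depend on the model, the chosen rank, and which weight matrices are adapted; the forward pass through the frozen weights is still computed in full. Organisations can customise foundation models for specific tasks without prohibitive infrastructure investment or extended training timelines.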
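To make the scale of the saving concrete, the arithmetic for a single weight matrix can be worked through directly. The layer size (4096 by 4096) and the rank (8) below are hypothetical values picked for illustration; real models adapt many matrices of varying shapes, so this is only a per-matrix sketch of trainable parameter counts, not a measurement of memory or training time.

```python
def full_params(d_out, d_in):
    """Trainable values when the whole weight matrix is fine-tuned."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable values in the two low-rank factors (d_out x rank and rank x d_in)."""
    return rank * (d_in + d_out)

d, r = 4096, 8  # hypothetical layer width and rank

full = full_params(d, d)
lora = lora_params(d, d, r)

print(full)          # 16777216
print(lora)          # 65536
print(full // lora)  # 256
```

Under these assumptions the adapter trains 256 times fewer values for that matrix. Gradients and optimiser state are only kept for trainable values, which is where much of the memory saving comes from.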

Common Applications

LoRA is widely deployed in adapting large language models for domain-specific tasks, personalising image generation models for particular artistic styles, and enabling efficient task-specific variants of models like LLaMA and Stable Diffusion. Financial institutions and healthcare organisations use it to specialise models whilst maintaining compliance boundaries.

Key Considerations

The technique introduces a rank hyperparameter that requires tuning: ranks that are too low may limit adaptation capability, whilst higher ranks diminish the parameter-efficiency gains. LoRA performs best when the fine-tuning data resembles the pretraining distribution; extreme distribution shifts may still require partial weight unfreezing.
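The efficiency side of the rank trade-off can be illustrated with a small sweep. The sketch below assumes a hypothetical 4096 by 4096 weight matrix and reports the adapter's trainable values as a fraction of the full matrix; it says nothing about task quality, which has to be checked empirically for each rank.

```python
def lora_fraction(d_out, d_in, rank):
    """Trainable LoRA values as a fraction of the full matrix size."""
    return rank * (d_in + d_out) / (d_out * d_in)

def break_even_rank(d_out, d_in):
    """Rank at which the two LoRA factors hold as many values as the full matrix."""
    return (d_out * d_in) / (d_in + d_out)

d_out = d_in = 4096  # hypothetical layer size

for rank in (1, 4, 16, 64, 256):
    share = lora_fraction(d_out, d_in, rank)
    print(f"rank={rank:>3}  fraction of full matrix: {share:.4%}")

print(break_even_rank(d_out, d_in))  # 2048.0
```

For a square matrix the break-even point sits at half the layer width, so the ranks used in practice, which are typically far below that, keep the adapter a small fraction of the original matrix.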

Cross-References

Deep Learning

Parameter-Efficient Fine-Tuning

Fine-Tuning

Related in Language Models

Word Embedding

Dense vector representations of words where semantically similar words are mapped to nearby points in vector space.

Parameter-Efficient Fine-Tuning

Methods for adapting large pretrained models to new tasks by only updating a small fraction of their parameters.

Adapter Layers

Small trainable modules inserted between frozen transformer layers that enable task-specific adaptation without modifying the original model weights.

Prefix Tuning

A parameter-efficient method that prepends trainable continuous vectors to the input of each transformer layer, guiding model behaviour without altering base parameters.

Pre-Training

The initial phase of training a deep learning model on a large unlabelled corpus using self-supervised objectives, establishing general-purpose representations for downstream adaptation.

Fine-Tuning

The process of adapting a pre-trained model to a specific task by continuing training on a smaller task-specific dataset, transferring learned representations to new domains.
