Overview
A normalisation technique that normalises across the features of each individual sample rather than across the batch.
More in Deep Learning
Knowledge Distillation
ArchitecturesA model compression technique where a smaller student model learns to mimic the behaviour of a larger teacher model.
Vision Transformer
ArchitecturesA transformer architecture adapted for image recognition that divides images into patches and processes them as sequences, rivalling convolutional networks in visual tasks.
Deep Learning
ArchitecturesA subset of machine learning using neural networks with multiple layers to learn hierarchical representations of data.
LoRA
Language ModelsLow-Rank Adaptation — a parameter-efficient fine-tuning technique that adds trainable low-rank matrices to frozen pretrained weights.
Pooling Layer
ArchitecturesA neural network layer that reduces spatial dimensions by aggregating values, commonly using max or average operations.
Neural Network
ArchitecturesA computing system inspired by biological neural networks, consisting of interconnected nodes that process information in layers.
Autoencoder
ArchitecturesA neural network trained to encode input data into a compressed representation and then decode it back to reconstruct the original.
Pipeline Parallelism
ArchitecturesA form of model parallelism that splits neural network layers across devices and pipelines micro-batches through stages, maximising hardware utilisation during training.