Overview
A generative model that learns to reverse a gradual noising process, generating high-quality samples from random noise.
More in Deep Learning
Transformer
ArchitecturesA neural network architecture based entirely on attention mechanisms, eliminating recurrence and enabling parallel processing of sequences.
Adapter Layers
Language ModelsSmall trainable modules inserted between frozen transformer layers that enable task-specific adaptation without modifying the original model weights.
Weight Decay
ArchitecturesA regularisation technique that penalises large model weights during training by adding a fraction of the weight magnitude to the loss function, preventing overfitting.
Deep Learning
ArchitecturesA subset of machine learning using neural networks with multiple layers to learn hierarchical representations of data.
State Space Model
ArchitecturesA sequence modelling architecture based on continuous-time dynamical systems that processes long sequences with linear complexity, offering an alternative to attention-based transformers.
Autoencoder
ArchitecturesA neural network trained to encode input data into a compressed representation and then decode it back to reconstruct the original.
Pipeline Parallelism
ArchitecturesA form of model parallelism that splits neural network layers across devices and pipelines micro-batches through stages, maximising hardware utilisation during training.
Embedding
ArchitecturesA learned dense vector representation of discrete data (like words or categories) in a continuous vector space.