Overview
A form of model parallelism that splits a neural network's layers across devices and pipelines micro-batches through the resulting stages, keeping multiple devices busy at once and improving hardware utilisation during training.
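A minimal sketch of the idea, in plain single-process Python rather than a real distributed setup: a batch is split into micro-batches, and at each clock tick every stage works on a different micro-batch, so the devices (here, toy stage functions) stay busy once the pipeline has filled. The stage functions, tick loop, and values are illustrative assumptions, not part of the original entry.

```python
from typing import Callable, List


def pipeline_forward(stages: List[Callable], micro_batches: List) -> List:
    """Run micro-batches through the stages in a pipelined (GPipe-style) order.

    At clock tick t, stage s works on micro-batch (t - s), so different stages
    process different micro-batches at the same tick once the pipeline is full.
    """
    num_stages = len(stages)
    num_micro = len(micro_batches)
    # activations[s][m] holds the output of stage s for micro-batch m
    activations = [[None] * num_micro for _ in range(num_stages)]

    for t in range(num_stages + num_micro - 1):   # total clock ticks
        for s in range(num_stages):
            m = t - s                             # micro-batch this stage handles now
            if 0 <= m < num_micro:
                inp = micro_batches[m] if s == 0 else activations[s - 1][m]
                activations[s][m] = stages[s](inp)
                print(f"tick {t}: stage {s} processes micro-batch {m}")
    return activations[-1]


if __name__ == "__main__":
    # Toy 'stages': each just adds a constant, standing in for a block of layers
    # placed on its own device in a real implementation.
    stages = [lambda x, k=k: x + k for k in range(3)]   # 3 pipeline stages
    micro_batches = [10, 20, 30, 40]                    # one batch split into 4 micro-batches
    outputs = pipeline_forward(stages, micro_batches)
    print("outputs:", outputs)                          # [13, 23, 33, 43]
```

The printed schedule shows the pipeline filling and draining: only the first and last few ticks leave stages idle (the "bubble"), which is why more micro-batches per batch generally improve utilisation.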
More in Deep Learning
Flash Attention
Architectures · An IO-aware attention algorithm that reduces memory reads and writes by tiling the attention computation, enabling faster training of long-context transformer models.
Sigmoid Function
Training & Optimisation · An activation function that maps input values to a range between 0 and 1, useful for binary classification outputs.
Diffusion Model
Generative Models · A generative model that learns to reverse a gradual noising process, generating high-quality samples from random noise.
Key-Value Cache
Architectures · An optimisation in autoregressive transformer inference that stores previously computed key and value tensors to avoid redundant computation during sequential token generation.
Representation Learning
Architectures · The automatic discovery of data representations needed for feature detection or classification from raw data.
Model Parallelism
Architectures · A distributed training approach that partitions a model across multiple devices, enabling training of models too large to fit in a single accelerator's memory.
Pooling Layer
Architectures · A neural network layer that reduces spatial dimensions by aggregating values, commonly using max or average operations.
Pretraining
Architectures · Training a model on a large general dataset before fine-tuning it on a specific downstream task.