Overview
A subset of machine learning using neural networks with multiple layers to learn hierarchical representations of data.
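As a minimal sketch of the idea (PyTorch is assumed here purely for illustration; the glossary names no library), a small stack of layers where each layer transforms the output of the one below it:

    import torch
    import torch.nn as nn

    # Each layer builds on the representation produced by the previous one,
    # giving the hierarchy of learned features described above.
    model = nn.Sequential(
        nn.Linear(784, 256), nn.ReLU(),   # lower-level features
        nn.Linear(256, 64), nn.ReLU(),    # higher-level features
        nn.Linear(64, 10),                # task-specific outputs
    )
    x = torch.randn(32, 784)              # a batch of 32 flattened inputs
    logits = model(x)                     # shape (32, 10)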
More in Deep Learning
Attention Head
An individual attention computation within a multi-head attention layer that learns to focus on different aspects of the input, with outputs concatenated for richer representations.
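A minimal sketch of a single head as scaled dot-product attention (PyTorch assumed; the function name attention_head is illustrative):

    import torch
    import torch.nn.functional as F

    def attention_head(q, k, v):
        # Each query position scores every key position; the softmaxed
        # scores then mix the values. This is what one head computes.
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        return F.softmax(scores, dim=-1) @ v

    q = k = v = torch.randn(2, 10, 64)    # (batch, seq_len, head_dim)
    out = attention_head(q, k, v)         # (2, 10, 64)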
Residual Network
A deep neural network architecture using skip connections that allow gradients to flow directly through layers, enabling very deep networks to be trained.
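A minimal sketch of one residual block (PyTorch assumed; ResidualBlock is an illustrative name):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
            )

        def forward(self, x):
            # The skip connection adds the input back to the block's output,
            # so gradients can flow around the transformation unchanged.
            return x + self.body(x)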
Multi-Head Attention
An attention mechanism that runs multiple attention operations in parallel, capturing different types of relationships.
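A minimal sketch using PyTorch's built-in module (the library choice is an assumption), which runs the heads in parallel and concatenates their outputs internally:

    import torch
    import torch.nn as nn

    mha = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
    x = torch.randn(2, 10, 256)        # (batch, seq_len, embed_dim)
    out, weights = mha(x, x, x)        # self-attention; out is (2, 10, 256)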
Gradient Checkpointing
A technique that saves memory by recomputing intermediate activations during the backward pass instead of storing them all during the forward pass, trading extra computation for a smaller memory footprint.
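A minimal sketch with PyTorch's checkpoint utility (the library choice is an assumption):

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    layer = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
    x = torch.randn(8, 512, requires_grad=True)

    # Activations inside `layer` are not stored during the forward pass;
    # they are recomputed when the backward pass reaches this segment.
    y = checkpoint(layer, x, use_reentrant=False)
    y.sum().backward()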
Sigmoid Function
An activation function that maps input values to a range between 0 and 1, useful for binary classification outputs.
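A minimal sketch of the formula sigma(x) = 1 / (1 + e^(-x)) alongside the built-in (PyTorch assumed):

    import torch

    def sigmoid(x):
        # Squashes any real input into the interval (0, 1).
        return 1.0 / (1.0 + torch.exp(-x))

    x = torch.tensor([-4.0, 0.0, 4.0])
    print(sigmoid(x))          # tensor([0.0180, 0.5000, 0.9820])
    print(torch.sigmoid(x))    # the built-in agrees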
Rotary Positional Encoding
A position encoding method that encodes absolute position with a rotation matrix and naturally incorporates relative position information into attention computations.
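A minimal sketch of the rotation (PyTorch assumed; the function name rotary_encode is illustrative, and the base of 10000 follows the common convention):

    import torch

    def rotary_encode(x, base=10000.0):
        # x: (seq_len, dim) with even dim. Each channel pair (2i, 2i+1) is
        # rotated by an angle proportional to its position, so attention
        # dot products end up depending on relative offsets between tokens.
        seq_len, dim = x.shape
        pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
        freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
        angles = pos * freqs                  # (seq_len, dim // 2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[:, 0::2], x[:, 1::2]
        out = torch.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin    # 2-D rotation of each pair
        out[:, 1::2] = x1 * sin + x2 * cos
        return out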
Weight Decay
A regularisation technique that penalises large model weights during training by adding a penalty proportional to the squared weight norm to the loss function, helping prevent overfitting.
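A minimal sketch of both common forms (PyTorch assumed; the coefficient 0.01 is arbitrary):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    model = nn.Linear(10, 1)

    # Decoupled form: the optimiser shrinks weights toward zero each step
    # (AdamW applies the decay separately from the gradient update).
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

    # L2 form: add a squared-norm penalty to the loss instead.
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = F.mse_loss(model(x), y)
    loss = loss + 0.01 * sum(p.pow(2).sum() for p in model.parameters())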
Exploding Gradient
A problem where gradients grow exponentially during backpropagation, causing unstable weight updates and training failure.
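A minimal sketch of the standard remedy, gradient clipping (PyTorch assumed):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    loss = model(torch.randn(32, 10)).pow(2).mean()
    loss.backward()

    # Cap the global gradient norm before the optimiser step so one large
    # gradient cannot produce an unstable weight update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)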