Overview
A neural network architecture where connections between nodes form directed cycles, enabling processing of sequential data.
Cross-References(1)
More in Deep Learning
Fine-Tuning
Language ModelsThe process of adapting a pre-trained model to a specific task by continuing training on a smaller task-specific dataset, transferring learned representations to new domains.
Residual Network
Training & OptimisationA deep neural network architecture using skip connections that allow gradients to flow directly through layers, enabling very deep networks.
ReLU
Training & OptimisationRectified Linear Unit — an activation function that outputs the input directly if positive, otherwise outputs zero.
Mamba Architecture
ArchitecturesA selective state space model that achieves transformer-level performance with linear-time complexity by incorporating input-dependent selection mechanisms into the recurrence.
Fully Connected Layer
ArchitecturesA neural network layer where every neuron is connected to every neuron in the adjacent layers.
Weight Initialisation
ArchitecturesThe strategy for setting initial parameter values in a neural network before training begins.
Model Parallelism
ArchitecturesA distributed training approach that partitions a model across multiple devices, enabling training of models too large to fit in a single accelerator's memory.
Gradient Checkpointing
ArchitecturesA memory optimisation that trades computation for memory by recomputing intermediate activations during the backward pass instead of storing them all during the forward pass.