Overview
A neural network design where an encoder processes input into a fixed representation and a decoder generates output from it.
Cross-References(1)
More in Deep Learning
Sigmoid Function
Training & OptimisationAn activation function that maps input values to a range between 0 and 1, useful for binary classification outputs.
Layer Normalisation
Training & OptimisationA normalisation technique that normalises across the features of each individual sample rather than across the batch.
Representation Learning
ArchitecturesThe automatic discovery of data representations needed for feature detection or classification from raw data.
Mamba Architecture
ArchitecturesA selective state space model that achieves transformer-level performance with linear-time complexity by incorporating input-dependent selection mechanisms into the recurrence.
Multi-Head Attention
Training & OptimisationAn attention mechanism that runs multiple attention operations in parallel, capturing different types of relationships.
Data Parallelism
ArchitecturesA distributed training strategy that replicates the model across multiple devices and divides training data into batches processed simultaneously, synchronising gradients after each step.
Activation Function
Training & OptimisationA mathematical function applied to neural network outputs to introduce non-linearity, enabling the learning of complex patterns.
Skip Connection
ArchitecturesA neural network shortcut that allows the output of one layer to bypass intermediate layers and be added to a later layer's output.