Overview
A deep learning architecture designed for processing grid-structured data such as images, using learned convolutional filters that slide across the input to detect local features.
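To make "convolutional filters to detect features" concrete, here is a minimal sketch of a single filter sliding over a tiny image, assuming NumPy is available. The function name `conv2d_valid`, the toy image, and the hand-picked vertical-edge kernel are all illustrative assumptions; in a trained network the kernel values would be learned rather than chosen by hand. As in most deep learning libraries, the operation shown is technically cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding; return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value is the sum of the element-wise product of the
            # kernel and the image patch it currently covers.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x6 "image" (hypothetical data): left half dark (0), right half bright (1).
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Hand-picked filter that responds where brightness increases from left to right.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the columns where the filter window straddles the dark-to-bright boundary, which is the sense in which a filter "detects" a feature. The same small kernel is reused at every position, so the number of parameters does not depend on the image size.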
More in Deep Learning
Multi-Head Attention
Training & Optimisation
An attention mechanism that runs multiple attention operations in parallel, capturing different types of relationships.
Word Embedding
Language Models
Dense vector representations of words where semantically similar words are mapped to nearby points in vector space.
Mamba Architecture
Architectures
A selective state space model that achieves transformer-level performance with linear-time complexity by incorporating input-dependent selection mechanisms into the recurrence.
Generative Adversarial Network
Generative Models
A framework where two neural networks compete: a generator creates synthetic data while a discriminator evaluates its authenticity.
Parameter-Efficient Fine-Tuning
Language Models
Methods for adapting large pretrained models to new tasks by updating only a small fraction of their parameters.
Vanishing Gradient
Architectures
A problem in deep networks where gradients become extremely small during backpropagation, preventing earlier layers from learning.
Softmax Function
Training & Optimisation
An activation function that converts a vector of numbers into a probability distribution, commonly used in multi-class classification.
Skip Connection
Architectures
A neural network shortcut that allows the output of one layer to bypass intermediate layers and be added to a later layer's output.