Attention Mechanism — Technology Wiki

Overview

A neural network component that learns to focus on relevant parts of the input when producing each element of the output.
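As a minimal sketch of the idea above, the widely used scaled dot-product form scores a query against each key, turns the scores into weights with a softmax, and returns the weighted average of the values. The function names `softmax` and `attend` and the toy vectors are illustrative assumptions, not part of this entry. In a trained network the queries, keys and values come from learned projections, which this sketch omits.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, output) for one query over a set of key/value pairs."""
    scale = math.sqrt(len(query))  # scaling by sqrt(d) keeps scores in a moderate range
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)      # one non-negative weight per input position
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy inputs (assumed for illustration): three positions, two dimensions each.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([2.0, 0.0], keys, values)
```

The weights sum to one, and positions whose keys align with the query receive the larger share, which is the "focus" the definition refers to.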

Cross-References

Deep Learning

Neural Network

Related in Architectures

Deep Learning

A subset of machine learning using neural networks with multiple layers to learn hierarchical representations of data.

Neural Network

A computing system inspired by biological neural networks, consisting of interconnected nodes that process information in layers.

Convolutional Neural Network

A deep learning architecture designed for processing structured grid data like images, using convolutional filters to detect features.

Recurrent Neural Network

A neural network architecture where connections between nodes form directed cycles, enabling processing of sequential data.

Long Short-Term Memory

A recurrent neural network architecture designed to learn long-term dependencies by using gating mechanisms to control information flow.

Gated Recurrent Unit

A simplified variant of LSTM that combines the forget and input gates into a single update gate.

Transformer

A neural network architecture built around attention mechanisms (combined with position-wise feed-forward layers), eliminating recurrence and enabling parallel processing of sequences.

Encoder-Decoder Architecture

A neural network design where an encoder processes input into an intermediate representation (classically a single fixed-length vector) and a decoder generates output from it.

Autoencoder

A neural network trained to encode input data into a compressed representation and then decode it back to reconstruct the original.

Variational Autoencoder

A generative model that learns a probabilistic latent space representation, enabling generation of new data samples.

Batch Normalisation

A technique that normalises each layer's inputs using mini-batch statistics, stabilising and accelerating the training of deep neural networks.
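A rough sketch of the training-time computation for a single feature, assuming the standard formulation: subtract the mini-batch mean, divide by the mini-batch standard deviation, then apply a scale and shift. The names `batch_norm`, `gamma`, `beta` and `eps` are illustrative. In practice `gamma` and `beta` are learned, and running averages of the statistics are typically used at inference time, which this sketch leaves out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature across a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)  # biased variance
    # eps guards against division by zero when the batch has no spread.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Toy activations for one feature across a batch of four examples (assumed).
normalised = batch_norm([2.0, 4.0, 6.0, 8.0])
```

With the default `gamma` and `beta`, the result has approximately zero mean and unit variance.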

Embedding

A learned dense vector representation of discrete data (like words or categories) in a continuous vector space.

More in Deep Learning

Parameter-Efficient Fine-Tuning

Language Models

Methods for adapting large pretrained models to new tasks by only updating a small fraction of their parameters.

Diffusion Model

Generative Models

A generative model that learns to reverse a gradual noising process, generating high-quality samples from random noise.

Convolutional Layer

Architectures

A neural network layer that applies learnable filters across input data to detect local patterns and features.

Adapter Layers

Language Models

Small trainable modules inserted between frozen transformer layers that enable task-specific adaptation without modifying the original model weights.

Skip Connection

Architectures

A neural network shortcut that allows the output of one layer to bypass intermediate layers and be added to a later layer's output.
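A minimal sketch of the shortcut described above, in its common residual form: the input is carried around the intermediate layers and added to their output. `with_skip` and `halve` are hypothetical names, and `halve` merely stands in for whatever layers are being bypassed.

```python
def with_skip(layer, x):
    """Apply `layer` to x and add the untouched input back onto the result."""
    return [xi + yi for xi, yi in zip(x, layer(x))]

def halve(x):
    """Hypothetical stand-in for one or more intermediate layers."""
    return [0.5 * xi for xi in x]

out = with_skip(halve, [2.0, 4.0])
```

Because the input reaches the output unchanged along the shortcut, the bypassed layers only need to learn a correction to it.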

Tensor Parallelism

Architectures

A distributed computing strategy that splits individual layer computations across multiple devices by partitioning weight matrices along specific dimensions.
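One way to picture the partitioning, sketched here on a single machine: split a weight matrix by columns, let each "device" multiply the input by its own shard, and concatenate the partial outputs. The helper names `matmul` and `split_columns` are illustrative assumptions. Real systems place the shards on separate accelerators and use collective communication to gather the results, which is not modelled here.

```python
def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, parts):
    """Partition w column-wise into `parts` equal shards, one per device."""
    width = len(w[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in w] for p in range(parts)]

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

shards = split_columns(w, parts=2)               # each shard holds two columns
partials = [matmul(x, shard) for shard in shards]  # computed independently
combined = [v for part in partials for v in part]  # concatenate the outputs
```

Here `combined` matches the result of multiplying by the unsplit matrix, which is why a column split needs no arithmetic beyond concatenation.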

Flash Attention

Architectures

An IO-aware attention algorithm that reduces memory reads and writes by tiling the attention computation, enabling faster training of long-context transformer models.
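The tiling relies on a running ("online") softmax, so the full row of attention scores never has to be held at once. The sketch below illustrates only that numerical idea for a single query in plain Python. It is not the actual GPU kernel, it omits the usual score scaling, and the names `attend_full` and `attend_tiled` are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend_full(query, keys, values):
    """Reference version: materialises every score before normalising."""
    scores = [dot(query, k) for k in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e * v[i] for e, v in zip(exps, values)) / total for i in range(dim)]

def attend_tiled(query, keys, values, tile=2):
    """Process keys/values one tile at a time, keeping only running statistics."""
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), tile):
        k_tile = keys[start:start + tile]
        v_tile = values[start:start + tile]
        scores = [dot(query, k) for k in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum (0.0 on the first tile).
        correction = math.exp(running_max - new_max)
        exps = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(exps)
        acc = [a * correction + sum(e * v[i] for e, v in zip(exps, v_tile))
               for i, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]

# Toy inputs (assumed for illustration): five positions, two dimensions each.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
query = [0.3, -0.2]
tiled = attend_tiled(query, keys, values, tile=2)
full = attend_full(query, keys, values)
```

Both functions agree up to floating-point rounding. The tiled version only ever holds one tile of scores plus three running quantities.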

Generative Adversarial Network

Generative Models

A framework where two neural networks compete: a generator creates synthetic data while a discriminator evaluates its authenticity.