Overview
A neural network architecture based entirely on attention mechanisms, eliminating recurrence and enabling parallel processing of sequences.
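At its core is scaled dot-product attention, which lets every position in a sequence attend to every other position simultaneously. Below is a minimal NumPy sketch of that operation; the function name, shapes, and toy inputs are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # each output is a weighted mix of values

# Toy example: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(x, x, x)          # self-attention: Q = K = V = x
print(out.shape)                                     # (4, 8)
```

In a full Transformer, Q, K, and V come from learned linear projections of the token embeddings, and several such attention heads run in parallel, which is what enables the parallel processing the overview describes.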
More in Deep Learning
Weight Initialisation
Architectures: The strategy for setting initial parameter values in a neural network before training begins.
Model Parallelism
Architectures: A distributed training approach that partitions a model across multiple devices, enabling training of models too large to fit in a single accelerator's memory.
Pretraining
Architectures: Training a model on a large general dataset before fine-tuning it on a specific downstream task.
Contrastive Learning
Architectures: A self-supervised learning approach that trains models by comparing similar and dissimilar pairs of data representations.
Vision Transformer
Architectures: A transformer architecture adapted for image recognition that divides images into patches and processes them as sequences, rivalling convolutional networks in visual tasks.
Generative Adversarial Network
Generative Models: A framework in which two neural networks compete; a generator creates synthetic data while a discriminator evaluates its authenticity.
Pipeline Parallelism
Architectures: A form of model parallelism that splits neural network layers across devices and pipelines micro-batches through stages, maximising hardware utilisation during training.
Dropout
Training & Optimisation: A regularisation technique that randomly deactivates neurons during training to prevent co-adaptation and reduce overfitting (a minimal sketch follows below).
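As a small illustration of the Dropout entry, here is a minimal sketch of the common "inverted dropout" formulation, in which surviving activations are rescaled at training time so inference needs no adjustment; the names and values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p, rescale survivors."""
    if not training or p == 0.0:
        return x                          # dropout is a no-op at inference time
    mask = rng.random(x.shape) >= p       # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)           # rescale so the expected activation is unchanged

activations = np.ones((2, 4))
print(dropout(activations, p=0.5))        # about half the entries zeroed, the rest doubled
```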