Overview
Direct Answer
A Variational Autoencoder (VAE) is a generative deep learning model that encodes input data into a probabilistic latent space and reconstructs it through a decoder, enabling both data compression and synthesis of novel samples. Unlike standard autoencoders, VAEs impose a prior distribution on the latent representation, making the learned space suitable for generative tasks.
How It Works
The encoder network maps each input to the parameters of a probability distribution in latent space (typically the mean and variance of a Gaussian) rather than to a single fixed point. A latent vector is sampled from this distribution, usually via the reparameterisation trick so that gradients can flow through the sampling step, and the decoder reconstructs the input from that vector. The model is trained on a loss that combines reconstruction error with a Kullback–Leibler divergence term, which regularises the encoded distribution towards the prior. This dual objective keeps the latent space continuous and well structured for interpolation and sampling.
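The objective described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: it assumes a diagonal Gaussian encoder output, a standard normal prior, and squared error as the reconstruction term, and the function names (reparameterise, kl_to_standard_normal, vae_loss) are hypothetical names chosen for this example.

```python
import numpy as np


def reparameterise(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I).

    Sampling eps separately keeps z a differentiable function of
    mu and log_var in an autodiff framework (assumed, not shown here).
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


def vae_loss(x, x_recon, mu, log_var):
    """Per-example loss: squared reconstruction error plus the KL term.

    Squared error is an assumption; it corresponds to a Gaussian
    decoder likelihood with fixed variance, up to scaling and a constant.
    """
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, log_var)


# Usage with made-up encoder outputs for a batch of 4 inputs and 2 latent dims.
rng = np.random.default_rng(0)
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))
z = reparameterise(mu, log_var, rng)  # shape (4, 2)
```

When the encoder output already matches the prior (zero mean, unit variance), the KL term is zero, so the loss reduces to the reconstruction error alone.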
Why It Matters
VAEs provide a principled, probabilistic framework for learning continuous latent representations whilst keeping approximate inference and generation tractable. Because training requires no labels, this reduces the data annotation burden. It also enables anomaly detection, since inputs that reconstruct poorly or receive low likelihood can be flagged, and it supports downstream machine learning tasks through dimensionality reduction without sacrificing generative capability.
Common Applications
Applications include image generation and manipulation in computer vision, anomaly detection in manufacturing and healthcare diagnostics, and feature learning for semi-supervised classification. VAEs are also employed in drug discovery for molecular generation and in recommendation systems for learning latent user preferences.
Key Considerations
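VAEs typically produce blurrier outputs than deterministic autoencoders, an effect commonly attributed to latent sampling combined with pixel-wise reconstruction losses, which tend to average over plausible outputs. Performance depends heavily on the weighting between the reconstruction and regularisation terms: too much regularisation can lead the decoder to ignore the latent code, whilst too little leaves a latent space that samples poorly. The choice of prior distribution also significantly influences the learned latent structure and generative quality.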
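One common way to expose the weighting between the two terms is a scalar multiplier on the KL term, optionally ramped up during early training. The sketch below is illustrative only: the names kl_weight and weighted_objective, the linear warm-up schedule, and the default values are assumptions made for this example, not settings taken from any particular system.

```python
def kl_weight(step, warmup_steps, beta_max=1.0):
    """Linearly ramp the KL weight from 0 to beta_max over warmup_steps.

    A warm-up of this kind is one assumed schedule; a constant
    weight (warmup_steps <= 0) is the simplest alternative.
    """
    if warmup_steps <= 0:
        return beta_max
    return beta_max * min(1.0, step / warmup_steps)


def weighted_objective(recon_error, kl, beta):
    """Combine the two loss terms; a larger beta means stronger regularisation."""
    return recon_error + beta * kl


# Usage: halfway through a 100-step warm-up the KL term counts at half weight.
beta = kl_weight(step=50, warmup_steps=100)
loss = weighted_objective(recon_error=2.0, kl=3.0, beta=beta)
```

A weight below 1 favours reconstruction fidelity, whilst a weight above 1 pushes the encoded distributions closer to the prior at the cost of reconstruction detail.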
More in Deep Learning
Pre-Training
Language Models: The initial phase of training a deep learning model on a large unlabelled corpus using self-supervised objectives, establishing general-purpose representations for downstream adaptation.
Convolutional Layer
Architectures: A neural network layer that applies learnable filters across input data to detect local patterns and features.
Generative Adversarial Network
Generative Models: A framework where two neural networks compete, with a generator creating synthetic data while a discriminator evaluates its authenticity.
Softmax Function
Training & Optimisation: An activation function that converts a vector of numbers into a probability distribution, commonly used in multi-class classification.
Flash Attention
Architectures: An IO-aware attention algorithm that reduces memory reads and writes by tiling the attention computation, enabling faster training of long-context transformer models.
Graph Neural Network
Architectures: A neural network designed to operate on graph-structured data, learning representations of nodes, edges, and entire graphs.
Model Parallelism
Architectures: A distributed training approach that partitions a model across multiple devices, enabling training of models too large to fit in a single accelerator's memory.
Fully Connected Layer
Architectures: A neural network layer where every neuron is connected to every neuron in the adjacent layers.