Overview
Direct Answer
An encoder-decoder architecture is a neural network design in which an encoder maps a variable-length input to an intermediate representation (a single fixed-size context vector in early recurrent designs, or a sequence of hidden states in attention-based ones), and a decoder generates the output from that representation. Because the two halves are decoupled, the input and output sequences can differ in length.
How It Works
The encoder reads the input tokens, step by step in recurrent networks or in parallel in transformer layers, and encodes them into a dense vector or a sequence of hidden states. The decoder conditions on this representation, either as its initial hidden state (recurrent designs) or through cross-attention (transformers), and generates output tokens one at a time, each drawn from a probability distribution conditioned on the input and on the tokens generated so far. Attention mechanisms often bridge encoder and decoder, letting the decoder weight the most relevant input positions at each generation step.
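The recurrent variant of this flow can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `TinySeq2Seq`, the single-layer tanh recurrence, the random untrained weights, and the greedy decoding loop are all hypothetical choices made for the example.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinySeq2Seq:
    """Hypothetical, untrained toy encoder-decoder (illustrative only)."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.embed = rand_matrix(vocab_size, hidden_size, rng)
        self.w_enc = rand_matrix(hidden_size, hidden_size, rng)
        self.w_dec = rand_matrix(hidden_size, hidden_size, rng)
        self.w_out = rand_matrix(vocab_size, hidden_size, rng)

    def _step(self, weights, hidden, token):
        # One recurrent update: h' = tanh(W h + embedding[token])
        pre = matvec(weights, hidden)
        return [math.tanh(p + e) for p, e in zip(pre, self.embed[token])]

    def encode(self, tokens):
        # Fold the whole input into one hidden state: the context vector.
        hidden = [0.0] * self.hidden_size
        for token in tokens:
            hidden = self._step(self.w_enc, hidden, token)
        return hidden

    def decode(self, context, start_token, end_token, max_len=10):
        # Start from the context and emit one token per step (greedy choice).
        hidden, token, output = context, start_token, []
        for _ in range(max_len):
            hidden = self._step(self.w_dec, hidden, token)
            probs = softmax(matvec(self.w_out, hidden))
            token = max(range(len(probs)), key=probs.__getitem__)
            if token == end_token:
                break
            output.append(token)
        return output


model = TinySeq2Seq(vocab_size=8, hidden_size=4)
context = model.encode([3, 5, 2, 6, 7])
print(len(context))  # 4: the context size does not depend on input length
print(model.decode(context, start_token=0, end_token=1))
```

Because the weights are random, the decoded tokens are meaningless; the point is the shape of the computation: a variable-length input is folded into one vector, and the output length is decided by the decoder rather than by the input.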
Why It Matters
This architecture enables sequence-to-sequence tasks in which input and output differ in length or structure, such as translation, summarisation, and dialogue. A single model design handles variable-length inputs and outputs without task-specific feature engineering, which can reduce development time and operational complexity.
Common Applications
Applications include machine translation, automatic speech recognition (audio to text), image captioning (image to textual description), and abstractive summarisation. Medical transcription, customer support automation, and code generation systems also build on this approach.
Key Considerations
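In traditional designs, the fixed-size context vector is a bottleneck that can lose information from long sequences; attention mechanisms and hierarchical encoders mitigate this. Computational cost grows with sequence length (linearly in the number of steps for recurrent layers, quadratically for full self-attention), and because decoding proceeds token by token, inference latency may constrain real-time applications.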
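To illustrate how attention sidesteps the bottleneck, the sketch below keeps every encoder hidden state and lets a decoder query weight them. It assumes plain (unscaled) dot-product scoring; the function name `attend` and the example vectors are hypothetical, chosen only to make the behaviour visible.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Unscaled dot-product attention over all encoder states.

    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(len(query))
    ]
    return weights, context


# Three encoder states (one per input position) and one decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [0.0, 4.0]

weights, context = attend(query, encoder_states)
best = max(range(len(weights)), key=weights.__getitem__)
print(best)  # 1: the query aligns with the second encoder state
```

The context vector is recomputed for every decoding step from all encoder states, so long inputs are not squeezed into a single vector. The cost is one score per input position per output step, which is one reason cost grows with sequence length.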
More in Deep Learning
Convolutional Layer
Architectures: A neural network layer that applies learnable filters across input data to detect local patterns and features.
Pretraining
Architectures: Training a model on a large general dataset before fine-tuning it on a specific downstream task.
Parameter-Efficient Fine-Tuning
Language Models: Methods for adapting large pretrained models to new tasks by only updating a small fraction of their parameters.
Dropout
Training & Optimisation: A regularisation technique that randomly deactivates neurons during training to prevent co-adaptation and reduce overfitting.
Mamba Architecture
Architectures: A selective state space model that achieves transformer-level performance with linear-time complexity by incorporating input-dependent selection mechanisms into the recurrence.
Contrastive Learning
Architectures: A self-supervised learning approach that trains models by comparing similar and dissimilar pairs of data representations.
Flash Attention
Architectures: An IO-aware attention algorithm that reduces memory reads and writes by tiling the attention computation, enabling faster training of long-context transformer models.
Fine-Tuning
Architectures: The process of taking a pretrained model and further training it on a smaller, task-specific dataset.