Overview
Direct Answer
A Recurrent Neural Network (RNN) is a deep learning architecture with feedback loops that enable hidden states to persist across sequential inputs, allowing the model to retain and leverage information from previous timesteps. This internal memory mechanism makes RNNs particularly suited to tasks where temporal dependencies and context are critical.
How It Works
RNNs process input sequences one element at a time, carrying a hidden state forward from each step to the next. At each timestep, the network combines the current input with the previous hidden state, typically through weight-matrix multiplications followed by a nonlinear activation such as tanh, to produce the new hidden state. Training unrolls this chain across the sequence and propagates gradients backward through time (backpropagation through time); because the same recurrent weights are applied at every step, gradients can vanish or explode on long sequences.
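The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration of a basic (Elman-style) RNN forward pass, not a reference implementation: the function name `rnn_forward`, the parameter names `W_xh`, `W_hh`, `b_h`, the choice of tanh, and all weight values are assumptions made for the example.

```python
import math

def rnn_forward(inputs, W_xh, W_hh, b_h, h0):
    """Run a plain RNN over a sequence and return the hidden state at every timestep.

    inputs: list of input vectors, one per timestep
    W_xh:   hidden_size x input_size weights applied to the current input
    W_hh:   hidden_size x hidden_size weights applied to the previous hidden state
    b_h:    bias vector of length hidden_size
    h0:     initial hidden state
    """
    h = list(h0)
    states = []
    for x in inputs:
        new_h = []
        for i in range(len(h)):
            # Combine the current input with the previous hidden state.
            pre = b_h[i]
            pre += sum(W_xh[i][j] * x[j] for j in range(len(x)))
            pre += sum(W_hh[i][k] * h[k] for k in range(len(h)))
            new_h.append(math.tanh(pre))
        h = new_h  # the new state is carried forward to the next timestep
        states.append(h)
    return states

# Toy example: 2 hidden units, 1 input feature, 3 timesteps (arbitrary values).
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]
sequence = [[1.0], [0.0], [-1.0]]
states = rnn_forward(sequence, W_xh, W_hh, b_h, h0=[0.0, 0.0])
for t, h in enumerate(states):
    print(t, [round(v, 4) for v in h])
```

Note that the second input in the toy sequence is zero, yet the second hidden state is not: it still carries information from the first timestep, which is the memory effect the entry describes.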
Why It Matters
Organisations rely on RNNs for sequence modelling where temporal patterns directly affect accuracy and business outcomes. Applications requiring context-awareness, such as language understanding, time-series forecasting, and speech recognition, benefit from the architecture's ability to learn dependencies directly from ordered data. This can reduce the need for hand-crafted temporal features, which in turn may shorten development cycles and improve predictive performance.
Common Applications
RNNs power natural language processing tasks including machine translation, sentiment analysis, and named entity recognition. Time-series forecasting in finance and operations, speech-to-text systems, and video frame prediction represent key industrial applications where sequential patterns must be learned and extrapolated.
Key Considerations
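Training RNNs on very long sequences runs into the vanishing gradient problem, which limits how far back the network can effectively remember; variants such as LSTMs and GRUs mitigate this through gating mechanisms. Computation is inherently sequential, since each timestep depends on the previous hidden state, so cost scales with sequence length and RNNs are often slower to train than transformer-based alternatives, which can process all positions of a sequence in parallel.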
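The vanishing gradient effect can be made concrete with a toy calculation. The sketch below assumes a scalar RNN of the form h_t = tanh(w * h_{t-1} + x_t) and tracks how strongly the hidden state at step t depends on the initial state; the function name `gradient_through_time`, the weight 0.9, and the constant input 0.5 are illustrative choices, not values from any real model.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_t / d h_0| at each timestep of a scalar RNN h_t = tanh(w*h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    magnitudes = []
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2), multiplied in at every step.
        grad *= w * (1.0 - h * h)
        magnitudes.append(abs(grad))
    return magnitudes

inputs = [0.5] * 50
mags = gradient_through_time(w=0.9, inputs=inputs)
print(mags[0], mags[9], mags[49])
```

Because every per-step factor here has magnitude below 1, the product shrinks geometrically with sequence length, so early timesteps contribute almost nothing to the gradient at late ones. Gated variants such as LSTMs and GRUs are designed to give the gradient a path that is not repeatedly squashed in this way.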
More in Deep Learning
Dropout
Training & Optimisation: A regularisation technique that randomly deactivates neurons during training to prevent co-adaptation and reduce overfitting.
Pipeline Parallelism
Architectures: A form of model parallelism that splits neural network layers across devices and pipelines micro-batches through stages, maximising hardware utilisation during training.
Data Parallelism
Architectures: A distributed training strategy that replicates the model across multiple devices and divides training data into batches processed simultaneously, synchronising gradients after each step.
Pre-Training
Language Models: The initial phase of training a deep learning model on a large unlabelled corpus using self-supervised objectives, establishing general-purpose representations for downstream adaptation.
LoRA
Language Models: Low-Rank Adaptation, a parameter-efficient fine-tuning technique that adds trainable low-rank matrices to frozen pretrained weights.
Attention Head
Training & Optimisation: An individual attention computation within a multi-head attention layer that learns to focus on different aspects of the input, with outputs concatenated for richer representations.
Self-Attention
Training & Optimisation: An attention mechanism where each element in a sequence attends to all other elements to compute its representation.
Prefix Tuning
Language Models: A parameter-efficient method that prepends trainable continuous vectors to the input of each transformer layer, guiding model behaviour without altering base parameters.