Recurrent Neural Network

Overview

Direct Answer

A Recurrent Neural Network (RNN) is a deep learning architecture with feedback loops that enable hidden states to persist across sequential inputs, allowing the model to retain and leverage information from previous timesteps. This internal memory mechanism makes RNNs particularly suited to tasks where temporal dependencies and context are critical.

How It Works

RNNs process an input sequence one element at a time, carrying a hidden state forward from each step to the next. At each timestep, the network multiplies the current input and the previous hidden state by learned weight matrices, sums the results, and applies an activation function (typically tanh) to produce the new hidden state. Because the same weights are reused at every step, training unrolls the network over the sequence and propagates gradients backward through time. Over long sequences those gradients can vanish or explode, which complicates training.
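The per-timestep update described above can be sketched in a few lines of NumPy. This is a minimal illustration of a vanilla (Elman-style) RNN forward pass, not a reference implementation: the function name `rnn_forward`, the weight names `W_xh`, `W_hh`, `b_h`, the tanh activation, and the toy dimensions are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h, h0=None):
    """Run a vanilla RNN over one sequence (illustrative sketch).

    inputs: array of shape (timesteps, input_size)
    Returns an array of shape (timesteps, hidden_size) with every hidden state.
    """
    hidden_size = W_hh.shape[0]
    h = np.zeros(hidden_size) if h0 is None else h0
    states = []
    for x_t in inputs:
        # Combine the current input with the previous hidden state,
        # then squash with tanh to obtain the new hidden state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

# Toy dimensions and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
input_size, hidden_size, timesteps = 3, 4, 5
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
sequence = rng.normal(size=(timesteps, input_size))

hidden_states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(hidden_states.shape)  # (5, 4)
```

Note that the same `W_xh` and `W_hh` are applied at every timestep; only the hidden state `h` changes as the loop advances, which is what gives the network its memory of earlier inputs.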

Why It Matters

Organisations rely on RNNs for sequence modelling where temporal patterns directly affect accuracy and business outcomes. Applications that require context awareness, such as language understanding, time-series forecasting, and speech recognition, benefit from the architecture's ability to learn dependencies directly from sequential data rather than from hand-engineered features, which can shorten development cycles and improve predictive performance.

Common Applications

RNNs power natural language processing tasks including machine translation, sentiment analysis, and named entity recognition. Time-series forecasting in finance and operations, speech-to-text systems, and video frame prediction represent key industrial applications where sequential patterns must be learned and extrapolated.

Key Considerations

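Training RNNs on very long sequences runs into the vanishing gradient problem, which limits how far back the network can effectively remember; variants such as LSTMs and GRUs mitigate this through gating mechanisms. Computation is also inherently sequential: each timestep depends on the previous hidden state, so cost grows with sequence length and the work cannot be parallelised across timesteps. As a result, RNNs are often slower to train than transformer-based alternatives.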
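The vanishing and exploding gradient behaviour can be illustrated numerically. The sketch below multiplies the per-step Jacobians of a tanh RNN while walking backward through time and records the spectral norm of the accumulated product. It rests on several simplifying assumptions: the helper name `bptt_gradient_norms` is hypothetical, the pre-activations are supplied directly rather than computed by a real forward pass, and the recurrent matrix is a scaled orthogonal matrix so that its largest singular value is known exactly.

```python
import numpy as np

def bptt_gradient_norms(W_hh, pre_activations):
    """Spectral norm of d h_T / d h_{t-1}, stepping backward from the last timestep.

    pre_activations: array of shape (timesteps, hidden_size) holding the
    values fed into tanh at each step (given directly in this sketch).
    """
    hidden_size = W_hh.shape[0]
    grad = np.eye(hidden_size)
    norms = []
    for a_t in reversed(pre_activations):
        # For h_t = tanh(a_t), the Jacobian w.r.t. h_{t-1} is
        # diag(1 - tanh(a_t)^2) @ W_hh.
        jac = np.diag(1.0 - np.tanh(a_t) ** 2) @ W_hh
        grad = grad @ jac  # chain rule: accumulate one more step back
        norms.append(np.linalg.norm(grad, 2))
    return norms

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # orthogonal matrix

# Largest singular value 0.5: the gradient shrinks at every step.
shrinking = bptt_gradient_norms(0.5 * Q, rng.normal(size=(20, 4)))

# Largest singular value 1.5 with zero pre-activations (tanh' = 1):
# the gradient grows at every step.
growing = bptt_gradient_norms(1.5 * Q, np.zeros((20, 4)))

print(shrinking[-1] < 1e-5, growing[-1] > 1e3)
```

After only 20 steps the first product is already below one millionth of its starting norm, while the second has grown by a factor of more than a thousand. Gated variants such as LSTMs and GRUs are designed to keep this backward signal from collapsing so quickly.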

Cross-References

Deep Learning

More in Deep Learning

Dropout

Training & Optimisation

A regularisation technique that randomly deactivates neurons during training to prevent co-adaptation and reduce overfitting.

Pipeline Parallelism

Architectures

A form of model parallelism that splits neural network layers across devices and pipelines micro-batches through stages, maximising hardware utilisation during training.

Data Parallelism

Architectures

A distributed training strategy that replicates the model across multiple devices and divides training data into batches processed simultaneously, synchronising gradients after each step.

Pre-Training

Language Models

The initial phase of training a deep learning model on a large unlabelled corpus using self-supervised objectives, establishing general-purpose representations for downstream adaptation.

LoRA

Language Models

Low-Rank Adaptation — a parameter-efficient fine-tuning technique that adds trainable low-rank matrices to frozen pretrained weights.

Attention Head

Training & Optimisation

An individual attention computation within a multi-head attention layer that learns to focus on different aspects of the input, with outputs concatenated for richer representations.

Self-Attention

Training & Optimisation

An attention mechanism where each element in a sequence attends to every element of that same sequence, itself included, to compute its representation.

Prefix Tuning

Language Models

A parameter-efficient method that prepends trainable continuous vectors to the input of each transformer layer, guiding model behaviour without altering base parameters.