Catastrophic Forgetting — Technology Wiki

Overview

Direct Answer

Catastrophic forgetting occurs when a neural network trained sequentially on new tasks overwrites the weights and representations learned during training on previous tasks, resulting in severe performance degradation on earlier data. This phenomenon represents a critical barrier to continual learning systems that must adapt to evolving data distributions without access to historical training examples.

How It Works

When a network is trained on a new task, gradient updates modify weights that were previously optimised for earlier tasks. Standard gradient-based training has no built-in mechanism for distinguishing parameters that are critical to an earlier task from those that can change freely, so the new learning signal propagates through shared layers indiscriminately and overwrites the knowledge encoded in those weights. The magnitude and direction of the weight changes required for the new task often conflict directly with the values that preserved performance on the old one.
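The effect can be reproduced on a very small scale. The following is a minimal sketch, not a realistic network: it assumes a two-weight linear model and two hypothetical regression tasks (`task_a`, `task_b`) that share the same inputs but demand conflicting values for the first weight. All names and constants are illustrative.

```python
# Toy demonstration of forgetting under sequential training.
# Assumption: a two-weight linear model y = w0*x0 + w1*x1 stands in for a network.

def mse(w, data):
    """Mean squared error of weights w on a list of ((x0, x1), y) pairs."""
    return sum((w[0] * x0 + w[1] * x1 - y) ** 2 for (x0, x1), y in data) / len(data)

def sgd(w, data, lr=0.05, epochs=200):
    """Plain per-sample SGD; nothing here protects earlier knowledge."""
    w = list(w)
    for _ in range(epochs):
        for (x0, x1), y in data:
            err = w[0] * x0 + w[1] * x1 - y
            w[0] -= lr * 2 * err * x0
            w[1] -= lr * 2 * err * x1
    return w

# Both tasks use the same 5x5 grid of inputs in [-1, 1].
inputs = [(a / 2, b / 2) for a in range(-2, 3) for b in range(-2, 3)]
task_a = [((x0, x1), 2 * x0 + x1) for x0, x1 in inputs]   # wants w = [2, 1]
task_b = [((x0, x1), -x0 + x1) for x0, x1 in inputs]      # wants w = [-1, 1]

w_after_a = sgd([0.0, 0.0], task_a)      # train on task A only
w_after_b = sgd(w_after_a, task_b)       # then continue on task B only

loss_a_before = mse(w_after_a, task_a)   # task A error right after learning A
loss_a_after = mse(w_after_b, task_a)    # task A error after learning B

print("task A loss before B:", loss_a_before)
print("task A loss after B: ", loss_a_after)
```

In this setup the task A loss is essentially zero after the first phase and rises to roughly 4.5 after the second, because the shared weight `w0` has been moved from the value task A needs to the value task B needs.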

Why It Matters

Enterprise systems deployed in dynamic environments, such as recommendation engines, fraud detection, and robotic process automation, must learn continuously, because retraining from scratch on every update is computationally expensive and often operationally infeasible. Uncontrolled forgetting undermines model reliability, reduces prediction accuracy on legacy use cases, and forces teams to adopt mitigation strategies such as replay buffers or regularisation-based approaches, each of which carries its own cost.

Common Applications

Typical settings where catastrophic forgetting must be managed include robotic systems adapting to new environments whilst maintaining prior manipulation skills, recommendation platforms encountering new user cohorts whilst preserving personalisation for existing users, and autonomous vehicle perception systems learning new weather conditions or road types without degrading performance on previously encountered scenarios.

Key Considerations

Each common mitigation has a cost that may not scale to large models: elastic weight consolidation stores an importance estimate and a reference value for every parameter, experience replay must retain examples from earlier tasks, and progressive neural networks add new capacity for each task. The optimal strategy depends on whether task boundaries are known in advance and whether access to previous data is permissible.
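As an illustration of the regularisation-based family, the sketch below adds an elastic-weight-consolidation style penalty, `lam / 2 * sum(F[i] * (w[i] - w_a[i]) ** 2)`, to the loss for a second task. It is a toy under stated assumptions rather than a reference implementation: the model is a two-weight linear regressor, the tasks are hypothetical, the diagonal importance `fisher` is taken to be the mean squared input (the Fisher information of a unit-variance Gaussian regression model), and `lam=10.0` is an arbitrary choice.

```python
# Toy EWC-style regularisation on a two-weight linear model (illustrative only).

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def train(w, data, anchor=None, fisher=None, lam=0.0, lr=0.1, steps=500):
    """Full-batch gradient descent on MSE plus an optional quadratic penalty
    lam/2 * sum_i fisher[i] * (w[i] - anchor[i])**2."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in data:
            err = predict(w, x) - y
            for i in range(2):
                grad[i] += 2 * err * x[i] / len(data)
        if anchor is not None:
            for i in range(2):
                grad[i] += lam * fisher[i] * (w[i] - anchor[i])
        w = [w[i] - lr * grad[i] for i in range(2)]
    return w

ts = [-1.0, -0.5, 0.5, 1.0]
task_a = [((t, 0.0), 2 * t) for t in ts]   # only w0 matters for task A
task_b = [((t, t), -t) for t in ts]        # task B only constrains w0 + w1

w_a = train([0.0, 0.0], task_a)

# Assumed diagonal importance: mean squared input per weight on task A.
fisher = [sum(x[i] ** 2 for x, _ in task_a) / len(task_a) for i in range(2)]

w_plain = train(w_a, task_b)                                    # no protection
w_ewc = train(w_a, task_b, anchor=w_a, fisher=fisher, lam=10.0) # with penalty

print("task A loss, plain:", mse(w_plain, task_a))
print("task A loss, EWC:  ", mse(w_ewc, task_a))
print("task B loss, EWC:  ", mse(w_ewc, task_b))
```

Here the penalty steers the change needed for task B into `w1`, which task A never used, so both tasks end up with near-zero loss. The result is this clean only because the toy problem leaves a spare parameter; in a real network the importance estimates are approximate and the outcome is usually a compromise between old and new tasks. Note also the memory cost mentioned above: `w_a` and `fisher` together double the stored parameter count.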

Cross-References

Machine Learning

Multi-Task Learning

Related in Anomaly & Pattern Detection

Anomaly Detection

Identifying data points, events, or observations that deviate significantly from the expected pattern in a dataset.

More in Machine Learning

Semi-Supervised Learning

Advanced Methods

A learning approach that combines a small amount of labelled data with a large amount of unlabelled data during training.

Principal Component Analysis

Unsupervised Learning

A dimensionality reduction technique that transforms data into orthogonal components ordered by the amount of variance they explain.

Machine Learning

MLOps & Production

A subset of AI that enables systems to automatically learn and improve from experience without being explicitly programmed.

Unsupervised Learning

MLOps & Production

A machine learning approach where models discover patterns and structures in data without labelled examples.

Underfitting

Training Techniques

When a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test data.

Online Learning

MLOps & Production

A machine learning method where models are incrementally updated as new data arrives, rather than being trained in batch.

Cross-Validation

Training Techniques

A resampling technique that partitions data into subsets, training on some and validating on others to assess model generalisation.

Self-Supervised Learning

Advanced Methods

A learning paradigm where models generate their own supervisory signals from unlabelled data through pretext tasks.