Data Drift

Overview

Direct Answer

Data drift is a shift in the statistical distribution of a model's input features (and, in broader usage, its target variable) after deployment, which typically degrades the model's performance. It occurs when the real-world data-generating process diverges from the one that produced the training data, violating the assumption that training and production distributions are the same.

How It Works

Models learn patterns from historical training data and optimise their weights for those distributions. When production data exhibits different feature correlations, class proportions, or value ranges, the learned decision boundaries no longer match the actual patterns. Without monitoring to detect distributional change and retraining to correct for it, the mismatch grows and predictions become increasingly inaccurate.
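One common way to quantify a change in value ranges for a single numeric feature is the Population Stability Index (PSI), which compares binned proportions between a reference sample and a production sample. The sketch below is illustrative only: the function name `psi`, the choice of ten quantile bins, and the synthetic Gaussian samples are assumptions made for the example, and the often-quoted thresholds of 0.1 and 0.25 are rules of thumb rather than fixed standards.

```python
import bisect
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    # Bin edges are reference quantiles, so each bin holds roughly
    # the same share of the reference sample.
    ref_sorted = sorted(reference)
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # bisect_right returns a bin index in the range 0 .. n_bins - 1
            counts[bisect.bisect_right(edges, v)] += 1
        # eps guards against log(0) when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    p_ref = proportions(reference)
    p_cur = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


# Synthetic data (an assumption for illustration, not real production data)
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean moved by one sd

psi_stable = psi(training, stable)
psi_shifted = psi(training, shifted)
```

In this setup `psi_stable` stays close to zero while `psi_shifted` is far above the usual 0.25 alert level. PSI looks at one feature at a time, so it will not reveal a change in feature correlations on its own.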

Why It Matters

Model degradation affects business outcomes through reduced prediction accuracy, flawed decision-making, and, in regulated industries, potential compliance violations. Organisations that fail to detect and remediate drift risk financial losses, customer dissatisfaction, and reputational damage. Continuous monitoring and periodic retraining help maintain model reliability and ROI.

Common Applications

Fraud detection systems experience drift as fraudster behaviour evolves; credit risk models drift when economic conditions shift; recommendation engines drift as user preferences change seasonally; medical diagnostic models drift as patient demographics or equipment calibration varies.

Key Considerations

Data drift (a change in the distribution of inputs) must be distinguished from concept drift (a change in the relationship between inputs and the target), because the two call for different remediation strategies. Drift detection also adds operational overhead and latency, which must be balanced against the cost of undetected model degradation.
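The difference between the two kinds of drift can be seen in a toy simulation. The sketch below assumes a hypothetical one-feature threshold model and synthetic data, and it assumes that true labels eventually become available in production. It tracks two signals: how far the input mean has moved, and the error rate on labelled production data.

```python
import random
import statistics


def mean_shift(reference, current):
    """Absolute shift of the current mean, in reference standard deviations."""
    gap = statistics.fmean(current) - statistics.fmean(reference)
    return abs(gap) / statistics.stdev(reference)


def error_rate(model, xs, ys):
    """Fraction of examples where the model's prediction differs from the label."""
    return sum(model(x) != y for x, y in zip(xs, ys)) / len(xs)


def model(x):
    # Hypothetical model, fitted when the true rule was "positive if x > 0"
    return int(x > 0.0)


rng = random.Random(42)
train_x = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Scenario A: data drift only. Inputs move, the labelling rule is unchanged.
drift_x = [rng.gauss(1.5, 1.0) for _ in range(5000)]
drift_y = [int(x > 0.0) for x in drift_x]

# Scenario B: concept drift only. Inputs are unchanged, the rule becomes x > 1.
concept_x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
concept_y = [int(x > 1.0) for x in concept_x]

report = {
    "data_drift": (mean_shift(train_x, drift_x), error_rate(model, drift_x, drift_y)),
    "concept_drift": (mean_shift(train_x, concept_x), error_rate(model, concept_x, concept_y)),
}
```

In scenario A the input signal fires but the error rate stays at zero, because this toy model happens to match the true rule exactly; real models often do lose accuracy when inputs move into regions that were sparse in training. In scenario B the input signal stays quiet while roughly a third of predictions become wrong, which is why concept drift generally cannot be caught by monitoring inputs alone and needs labelled feedback.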

Cross-References

Machine Learning

Referenced By

Model Monitoring (Machine Learning)
