Overview
Direct Answer
Data drift is a shift in the statistical distribution of a model's input features (and, in broader usage, its target variable) after deployment, which typically degrades model performance. It occurs when the real-world data-generating process diverges from the one that produced the training data, violating the assumption that training and production data share the same distribution.
How It Works
Models learn patterns from historical training data and optimise their weights for the distributions found there. When production data exhibits different feature correlations, class proportions, or value ranges, the model's learned decision boundaries no longer match the actual patterns. Without monitoring to detect distributional changes and retraining to correct for them, this misalignment grows and predictions become increasingly inaccurate.
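One common way to quantify such a shift for a single numeric feature is the Population Stability Index (PSI), which compares binned proportions of a reference (training) sample against a production sample. The sketch below is a minimal illustration, not a prescribed method: the function name, the choice of ten quantile bins, and the smoothing constant are all assumptions made for the example.

```python
import numpy as np


def population_stability_index(reference, production, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a production sample of one numeric feature.

    Hypothetical helper for illustration; n_bins and eps are arbitrary defaults.
    """
    reference = np.asarray(reference, dtype=float)
    production = np.asarray(production, dtype=float)

    # Interior bin edges taken from reference quantiles, so each bin holds
    # roughly equal reference mass. The outermost bins are open-ended, which
    # keeps production values outside the training range from being dropped.
    interior_edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]

    # Assign every value to a bin index in [0, n_bins - 1] and count per bin.
    ref_counts = np.bincount(
        np.searchsorted(interior_edges, reference, side="right"), minlength=n_bins
    )
    prod_counts = np.bincount(
        np.searchsorted(interior_edges, production, side="right"), minlength=n_bins
    )

    # Convert counts to proportions; clip so empty bins do not produce log(0).
    ref_frac = np.clip(ref_counts / reference.size, eps, None)
    prod_frac = np.clip(prod_counts / production.size, eps, None)

    return float(np.sum((prod_frac - ref_frac) * np.log(prod_frac / ref_frac)))


# Synthetic example: one production sample matches training, one has a shifted mean.
rng = np.random.default_rng(0)
training_sample = rng.normal(loc=0.0, scale=1.0, size=5000)
stable_sample = rng.normal(loc=0.0, scale=1.0, size=5000)
drifted_sample = rng.normal(loc=1.0, scale=1.0, size=5000)

psi_stable = population_stability_index(training_sample, stable_sample)
psi_drifted = population_stability_index(training_sample, drifted_sample)
```

A frequently cited rule of thumb treats PSI below roughly 0.1 as stable and above roughly 0.25 as a significant shift, but these cut-offs are conventions rather than guarantees and should be tuned per feature and use case.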
Why It Matters
Model degradation directly affects business outcomes through reduced prediction accuracy, flawed decision-making, and, in regulated industries, potential compliance violations. Organisations that fail to detect and remediate drift risk financial losses, customer dissatisfaction, and reputational damage. Continuous monitoring and retraining are essential to maintain model reliability and ROI.
Common Applications
Fraud detection systems experience drift as fraudster behaviour evolves; credit risk models drift when economic conditions shift; recommendation engines drift as user preferences change seasonally; medical diagnostic models drift as patient demographics or equipment calibration varies.
Key Considerations
Data drift must be distinguished from concept drift, in which the relationship between the inputs and the target changes, because the two call for different remediation strategies. Drift detection also adds operational overhead and latency, which must be balanced against the cost of undetected model degradation.
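The distinction can be stated more precisely by factoring the joint distribution of inputs x and target y. The notation below is an assumption introduced for illustration (the subscripts "train" and "prod" denote the training and production distributions), and it takes data drift in its narrow sense of a change in the inputs alone, often called covariate shift.

```latex
\begin{align*}
\text{Joint distribution:}\quad & P(x, y) = P(y \mid x)\,P(x) \\
\text{Data drift:}\quad & P_{\text{train}}(x) \neq P_{\text{prod}}(x),
    \quad P_{\text{train}}(y \mid x) = P_{\text{prod}}(y \mid x) \\
\text{Concept drift:}\quad & P_{\text{train}}(y \mid x) \neq P_{\text{prod}}(y \mid x)
\end{align*}
```

Under this framing, data drift can in principle be detected from unlabelled production inputs, whereas confirming concept drift generally requires fresh labels, which is one reason the two tend to need different monitoring and remediation.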
More in Data Science & Analytics
Exploratory Data Analysis
Statistics & Methods: An approach to analysing datasets to summarise their main characteristics, often using statistical graphics and visualisation.
Data Observability
Data Engineering: The ability to understand, diagnose, and resolve data quality issues across the data stack by monitoring freshness, distribution, volume, schema, and lineage of data assets.
Data Annotation
Statistics & Methods: The process of labelling data with informative tags to make it usable for training supervised machine learning models.
ETL Pipeline
Data Engineering: An automated workflow that extracts data from sources, transforms it according to business rules, and loads it into a target system.
Statistical Modelling
Statistics & Methods: The process of applying statistical analysis to a dataset, identifying relationships and patterns within the data.
Data Mart
Data Engineering: A subset of a data warehouse focused on a particular business area, department, or subject.
Data Storytelling
Visualisation: The practice of building narratives around data insights using visualisations and narrative techniques.
Semantic Layer
Statistics & Methods: An abstraction layer that provides business-friendly definitions and consistent metrics on top of raw data, enabling self-service analytics with standardised terminology.