Overview
Direct Answer
Model distillation (also known as knowledge distillation) is a compression technique in which a smaller, more efficient model, called the student, is trained to approximate the predictions of a larger, pre-trained model, called the teacher. Some variants also have the student match the teacher's internal representations. A well-distilled student can achieve performance close to the teacher's whilst requiring substantially fewer parameters and less computation.
How It Works
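The teacher's logits (its raw, pre-softmax scores) are converted into soft probability distributions over the classes. These carry more information than hard labels alone: the probabilities the teacher assigns to incorrect classes reveal which classes it considers similar. The student is trained on a combined loss with two terms. One minimises the divergence (typically the Kullback–Leibler divergence) between the student's softened outputs and the teacher's. The other, a standard cross-entropy term on the ground-truth labels, maintains accuracy on the original task, and a weighting hyperparameter balances the two. Temperature scaling divides the logits by a temperature T before the softmax. Values of T above 1 flatten the distributions and expose more of this inter-class structure, which can help the student learn generalised patterns rather than simply fitting hard labels.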
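The combined loss described above can be sketched in plain Python for a single classification example. This is a minimal illustration, assuming the widely used formulation from Hinton et al. in which the soft-target KL term is multiplied by T² (so its gradient magnitude stays comparable as T changes) and mixed with hard-label cross-entropy through a weight `alpha`. The function names and the defaults `temperature=4.0` and `alpha=0.5` are hypothetical choices for illustration, not recommended values. Real training code would compute this over batches in a framework with automatic differentiation.

```python
import math
from typing import Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Turn raw logits into probabilities; a higher temperature gives a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating to avoid overflow.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; entries where p is zero contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_label: int,
    temperature: float = 4.0,  # hypothetical default, tuned per task in practice
    alpha: float = 0.5,        # weight on the teacher-mimicry term (assumption)
) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy on the true label."""
    # Soft targets: both models' logits softened with the same temperature.
    p_teacher = softmax_with_temperature(teacher_logits, temperature)
    p_student_soft = softmax_with_temperature(student_logits, temperature)
    soft_loss = kl_divergence(p_teacher, p_student_soft) * temperature ** 2

    # Hard targets: ordinary cross-entropy at temperature 1.
    p_student = softmax_with_temperature(student_logits, 1.0)
    hard_loss = -math.log(p_student[true_label])

    return alpha * soft_loss + (1 - alpha) * hard_loss
```

For example, with teacher logits `[6.0, 2.0, 1.0]`, the largest probability is about 0.98 at T = 1 but about 0.60 at T = 4, so the softened distribution gives the student noticeably more signal about the two non-target classes. Setting `alpha` closer to 1 weights teacher mimicry more heavily; closer to 0 weights the ground-truth labels more heavily.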
Why It Matters
Distillation enables deployment of high-performance models on resource-constrained devices, reducing latency, energy consumption, and infrastructure costs in production environments. This matters most in latency-sensitive settings such as on-device mobile inference, edge computing, and large-scale serving, where computational efficiency directly affects both operating expenses and user experience.
Common Applications
Applications include compressing language models for on-device natural language processing, accelerating computer vision models in autonomous systems, and optimising recommendation engines in e-commerce platforms. Financial institutions use distillation to deploy fraud detection models with lower latency, whilst healthcare organisations compress diagnostic models for integration into clinical decision-support systems.
Key Considerations
Knowledge transfer is not guaranteed; student models may fail to capture all nuances of teacher behaviour, particularly on out-of-distribution data. Determining optimal student architecture, temperature hyperparameters, and loss weighting between task accuracy and teacher mimicry requires substantial experimentation and validation.
More in Artificial Intelligence
Planning Algorithm
Reasoning & Planning
An AI algorithm that generates a sequence of actions to achieve a specified goal from an initial state.
Artificial Intelligence
Foundations & Theory
The simulation of human intelligence processes by computer systems, including learning, reasoning, and self-correction.
Artificial General Intelligence
Foundations & Theory
A hypothetical form of AI that possesses the ability to understand, learn, and apply knowledge across any intellectual task a human can perform.
Prompt Engineering
Prompting & Interaction
The practice of designing and optimising input prompts to elicit desired outputs from large language models.
AI Red Teaming
Safety & Governance
The systematic adversarial testing of AI systems to identify vulnerabilities, failure modes, harmful outputs, and safety risks before deployment.
AI Ethics
Foundations & Theory
The branch of ethics examining moral issues surrounding the development, deployment, and impact of artificial intelligence on society.
Knowledge Graph
Infrastructure & Operations
A structured representation of real-world entities and the relationships between them, used by AI for reasoning and inference.
AI Interpretability
Safety & Governance
The degree to which humans can understand the internal mechanics and reasoning of an AI model's predictions and decisions.