Overview
Direct Answer
Model pruning is a compression technique that removes weights, neurons, or entire layers from a trained neural network according to how little they contribute to model performance. This reduces model size and can reduce inference latency, whilst typically keeping accuracy within an acceptable margin of the unpruned model.
How It Works
Pruning algorithms assign each parameter an importance score, derived from weight magnitude, gradient analysis, or sensitivity measurement, and eliminate the parameters that fall below a threshold. In magnitude-based pruning, the weights closest to zero are removed first. The pruned network is then fine-tuned to recover any accuracy loss. Structured pruning removes entire filters or channels; unstructured pruning removes individual weights.
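As a minimal sketch of the magnitude-based, unstructured case described above, the following pure-Python example zeroes the smallest-magnitude fraction of a small weight matrix. The function name `magnitude_prune`, the `sparsity` parameter, and the example weights are illustrative assumptions, not part of any particular library; a real workflow would also fine-tune afterwards while keeping the mask applied, which is not shown here.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    `weights` is a list of rows (a hypothetical dense layer). Returns the
    pruned matrix and a 0/1 mask marking which weights were kept.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")

    # Collect (magnitude, row, column) for every weight in the layer.
    scored = [
        (abs(w), r, c)
        for r, row in enumerate(weights)
        for c, w in enumerate(row)
    ]
    # Sort ascending so the weights closest to zero come first.
    scored.sort(key=lambda item: item[0])

    n_prune = int(len(scored) * sparsity)
    pruned_positions = {(r, c) for _, r, c in scored[:n_prune]}

    mask = [
        [0 if (r, c) in pruned_positions else 1 for c in range(len(row))]
        for r, row in enumerate(weights)
    ]
    pruned = [
        [w if keep else 0.0 for w, keep in zip(row, mask_row)]
        for row, mask_row in zip(weights, mask)
    ]
    return pruned, mask


# Hypothetical 2x3 layer; half of the six weights are removed.
weights = [
    [0.80, -0.02, 0.30],
    [0.01, -0.65, 0.05],
]
pruned, mask = magnitude_prune(weights, sparsity=0.5)
print(mask)    # [[1, 0, 1], [0, 1, 0]]
print(pruned)  # [[0.8, 0.0, 0.3], [0.0, -0.65, 0.0]]
```

Note that the pruned matrix keeps its original shape: the removed weights are stored as zeros, so any size or speed benefit depends on a sparse storage format or sparse-aware kernels.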
Why It Matters
Reduced model size enables deployment on edge devices, mobile platforms, and other resource-constrained environments where memory and power consumption are tightly limited. Faster inference directly decreases latency and operational costs in cloud-hosted inference services. Together, these gains make deep learning practical in embedded systems and real-time applications.
Common Applications
Pruning is commonly applied to computer vision models for mobile deployment, natural language processing systems for edge inference, recommendation systems optimised for low-latency serving, and autonomous vehicle perception modules operating under strict computational budgets.
Key Considerations
Aggressive pruning can degrade model accuracy or introduce instability; practitioners must balance compression gains against performance requirements. Unstructured pruning often preserves accuracy better at a given sparsity level, but its irregular sparsity pattern needs sparse-aware kernels or specialised hardware to deliver real speedups. Structured pruning typically sacrifices more accuracy for the same parameter reduction, but it yields smaller dense layers that run efficiently on standard hardware.
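To illustrate why structured pruning is easier to deploy, the sketch below removes whole output neurons (rows) from a small weight matrix, ranking them by L1 norm. The L1-norm criterion, the function name `prune_neurons`, and the example values are assumptions chosen for illustration; other importance scores are equally possible.

```python
def prune_neurons(weights, keep):
    """Keep the `keep` output neurons (rows) with the largest L1 norm.

    Returns a smaller dense matrix and the indices of the surviving rows.
    """
    if not 0 < keep <= len(weights):
        raise ValueError("keep must be between 1 and the number of rows")

    # Score each neuron by the sum of absolute values of its weights.
    norms = [sum(abs(w) for w in row) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)

    # Preserve the original ordering of the neurons that survive.
    kept = sorted(ranked[:keep])
    return [list(weights[i]) for i in kept], kept


# Hypothetical layer with four output neurons and three inputs.
layer = [
    [0.90, -0.40, 0.10],
    [0.02, 0.01, -0.03],
    [-0.50, 0.60, 0.20],
    [0.05, -0.10, 0.00],
]
smaller, kept = prune_neurons(layer, keep=2)
print(kept)     # [0, 2]
print(smaller)  # [[0.9, -0.4, 0.1], [-0.5, 0.6, 0.2]]
```

The result is an ordinary dense matrix with fewer rows, so it needs no sparse format or special kernel. In a full network, the matching input columns of the following layer would also have to be removed, which this sketch leaves out.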
More in Artificial Intelligence
Ontology
Foundations & Theory: A formal representation of knowledge as a set of concepts, categories, and relationships within a specific domain.
Retrieval-Augmented Generation
Infrastructure & Operations: A technique combining information retrieval with text generation, allowing AI to access external knowledge before generating responses.
Direct Preference Optimisation
Training & Inference: A simplified alternative to RLHF that directly optimises language model policies using preference data without requiring a separate reward model.
Heuristic Search
Reasoning & Planning: Problem-solving techniques that use practical rules of thumb to find satisfactory solutions when exhaustive search is impractical.
AI Model Card
Safety & Governance: A documentation framework that provides standardised information about an AI model's intended use, performance characteristics, limitations, and ethical considerations.
Expert System
Infrastructure & Operations: An AI program that emulates the decision-making ability of a human expert by using a knowledge base and inference rules.
Artificial Superintelligence
Foundations & Theory: A theoretical level of AI that surpasses human cognitive abilities across all domains, including creativity and social intelligence.
Precision
Evaluation & Metrics: The ratio of true positive predictions to all positive predictions, measuring accuracy of positive classifications.