Model Quantisation — Technology Wiki

Overview

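The process of reducing the numerical precision of a model's weights and activations from floating-point formats (typically 32- or 16-bit) to lower-bit representations such as 8-bit or 4-bit integers. This decreases memory usage and often inference latency, at the cost of some potential loss in accuracy.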
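The mapping from floats to low-bit integers can be sketched with a per-tensor affine scheme, which is one common choice among several. Each float x is mapped to an integer q = clamp(round(x / s) + z, 0, 2^b - 1) and approximately recovered as x ≈ s · (q - z), where s is a scale, z is a zero-point and b is the bit width. The sketch below is minimal pure Python and assumes unsigned integers, round-to-nearest and a single scale for the whole tensor. The function names `quantise_affine` and `dequantise_affine` are hypothetical and are not taken from any particular library.

```python
from typing import List, Tuple


def quantise_affine(values: List[float], num_bits: int = 8) -> Tuple[List[int], float, int]:
    """Quantise a non-empty list of floats to unsigned integers in [0, 2**num_bits - 1].

    Hypothetical helper: one scale and one zero-point for the whole tensor
    (per-tensor affine quantisation) with round-to-nearest.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that real zero maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        # Every value is zero: any positive scale reproduces the input exactly.
        return [0] * len(values), 1.0, 0
    # Step size between adjacent integer levels, in the original float units.
    scale = (hi - lo) / (qmax - qmin)
    # Integer that represents real 0.0, clamped into the valid integer range.
    zero_point = min(qmax, max(qmin, round(qmin - lo / scale)))
    # Round each value to the nearest level, shift by the zero-point, then clamp.
    quantised = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return quantised, scale, zero_point


def dequantise_affine(quantised: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to approximate float values."""
    return [scale * (q - zero_point) for q in quantised]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 2.0]
    q, scale, zero_point = quantise_affine(weights)
    restored = dequantise_affine(q, scale, zero_point)
    for original, integer, approx in zip(weights, q, restored):
        print(f"{original:+.4f} -> {integer:3d} -> {approx:+.4f}")
```

Running the script prints each weight, its 8-bit code and the reconstructed value. For inputs inside the observed range, the reconstruction error stays within half a quantisation step (scale / 2), up to floating-point rounding. Storing 8-bit codes instead of 32-bit floats cuts weight storage roughly fourfold, plus the small overhead of the scale and zero-point. Production toolchains typically go further, for example with per-channel scales, symmetric schemes for weights, or calibration data for choosing activation ranges. Those refinements are not shown here.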

Cross-References

Artificial Intelligence

Precision

Related in Models & Architecture

Tensor Processing Unit

Google's custom-designed application-specific integrated circuit for accelerating machine learning workloads.

Neural Processing Unit

A specialised processor designed to accelerate neural network computations in edge devices and mobile platforms.

Model Distillation

A technique where a smaller, simpler model is trained to replicate the behaviour of a larger, more complex model.

Model Pruning

The process of removing redundant or less important parameters from a neural network to reduce its size and computational cost.

Neural Architecture Search

An automated technique for discovering well-performing neural network architectures using search algorithms.

Sparse Attention

An attention mechanism that computes relationships only between selected pairs of input tokens rather than all pairs, reducing the quadratic cost of full attention in transformer models.

Model Collapse

A degradation phenomenon where AI models trained on AI-generated data progressively lose diversity and accuracy, converging toward a narrow distribution of outputs.

Neural Scaling Laws

Empirical relationships describing how AI model performance improves predictably with increases in model size, training data volume, and computational resources.

Speculative Decoding

An inference acceleration technique where a small draft model generates candidate token sequences that are verified in parallel by the larger target model.

More in Artificial Intelligence

Backward Chaining

Reasoning & Planning

An inference strategy that starts with a goal and works backward through rules to determine what facts must be true.

AI Orchestration

Infrastructure & Operations

The coordination and management of multiple AI models, services, and workflows to achieve complex end-to-end automation.

AI Agent Orchestration

Infrastructure & Operations

The coordination and management of multiple AI agents working together to accomplish complex tasks, routing subtasks between specialised agents based on capability and context.

Reinforcement Learning from Human Feedback

Training & Inference

A training paradigm where AI models are refined using human preference signals, aligning model outputs with human values and quality expectations through reward modelling.

Frame Problem

Foundations & Theory

The challenge in AI of representing the effects of actions without having to explicitly state everything that remains unchanged.

Cognitive Computing

Foundations & Theory

Computing systems that simulate human thought processes using self-learning algorithms, data mining, pattern recognition, and natural language processing.

AI Tokenomics

Infrastructure & Operations

The economic model governing the pricing and allocation of computational resources for AI inference, including per-token billing, rate limiting, and credit systems.

Strong AI

Foundations & Theory

A theoretical form of AI that would have consciousness, self-awareness, and the ability to truly understand rather than simulate understanding.