Overview
Direct Answer
BLEU (Bilingual Evaluation Understudy) is a quantitative metric that measures the correspondence between machine-generated text and one or more reference translations by comparing n-gram overlap. It produces a score between 0 and 1 (often reported on a 0 to 100 scale), where higher scores indicate closer alignment with the reference text.
How It Works
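The metric computes a modified n-gram precision: the proportion of n-grams (typically sequences of 1 to 4 words) in the generated output that also appear in the reference text(s), with each n-gram's count clipped to the maximum number of times it occurs in any single reference so that repeated words are not over-credited. Precision is computed for each n-gram length, and the values are combined using a geometric mean to produce a single composite score. A brevity penalty then scales the score down when the output is shorter than the reference, because precision alone would reward very short translations.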
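The calculation described above can be sketched in a few lines of Python. This is a minimal, unsmoothed sentence-level illustration rather than a reference implementation: the function names (`ngrams`, `modified_precision`, `brevity_penalty`, `bleu`) are hypothetical, inputs are assumed to be pre-tokenised lists of words, and the reference length is assumed to be the one closest to the candidate length.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped precision: a candidate n-gram is credited at most as many
    times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(candidate, references):
    """Return 1.0 for outputs longer than the reference, else exp(1 - r/c)."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Assumption: use the reference length closest to the candidate length,
    # breaking ties in favour of the shorter reference.
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Geometric mean of 1..max_n clipped precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing: any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


reference = "the cat is on the mat".split()
identical_score = bleu(reference, [reference])
near_miss_score = bleu("the cat sat on the mat".split(), [reference])
```

Here `identical_score` is 1.0, while `near_miss_score` is 0.0 because the candidate shares no 4-gram with the reference. Without smoothing, a single zero precision collapses the geometric mean, which is one reason the metric is usually computed over a whole test corpus rather than per sentence.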
Why It Matters
BLEU enables rapid, reproducible evaluation of machine translation and text generation systems without requiring manual human assessment. This significantly reduces evaluation costs and makes continuous quality monitoring practical across translation pipelines and model iterations.
Common Applications
The metric is widely deployed in machine translation evaluation, multilingual natural language processing research, and quality assurance workflows for automated subtitle generation and cross-language content adaptation systems.
Key Considerations
BLEU scores correlate imperfectly with human judgement of translation quality and do not directly measure semantic correctness or fluency, because the metric only counts surface n-gram matches. A single reference translation may penalise valid alternative phrasings, so supplementary evaluation methods are needed for a comprehensive quality assessment.
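The sensitivity to reference coverage can be illustrated with a small sketch that computes only the clipped unigram precision component. The function name `clipped_unigram_precision` and the example sentences are hypothetical, chosen purely for illustration.

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Share of candidate words matched in the references, with each word
    credited at most as often as it appears in any single reference."""
    cand_counts = Counter(candidate)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    best = Counter()
    for ref in references:
        for token, count in Counter(ref).items():
            best[token] = max(best[token], count)
    matched = sum(min(count, best[token])
                  for token, count in cand_counts.items())
    return matched / total


candidate = "the film was excellent".split()
ref_a = "the movie was great".split()      # valid paraphrase, different wording
ref_b = "this film was excellent".split()  # second reference covers more wording

one_ref = clipped_unigram_precision(candidate, [ref_a])
two_refs = clipped_unigram_precision(candidate, [ref_a, ref_b])
```

With only `ref_a`, half of the candidate's words are matched (`one_ref` is 0.5) even though the two sentences mean the same thing. Adding `ref_b` raises the precision to 1.0, since every candidate word now appears in at least one reference.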