Overview
Direct Answer
Extractive summarisation is a natural language processing (NLP) technique that automatically condenses documents by selecting and retaining the most salient sentences from the original source, preserving their exact wording without paraphrasing or generating new content.
How It Works
The approach scores each sentence using statistical or machine learning methods, such as term frequency-inverse document frequency (TF-IDF), graph-based algorithms (for example TextRank), or neural scoring models, and ranks the sentences to identify those carrying the most important content. The top-ranked sentences are then typically assembled in their original order to form a shorter document. Because the source wording and ordering are preserved, the summary usually remains readable, although sentences lifted out of context can lose the referents of pronouns and similar expressions.
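The scoring-and-selection process described above can be illustrated with a minimal sketch, assuming a TF-IDF-style score in which each sentence is treated as its own "document". The function names (`split_sentences`, `tokenize`, `summarise`), the regex-based sentence splitter, and the length normalisation are illustrative assumptions, not a reference implementation; a production system would more likely use a proper sentence tokenizer and stop-word handling.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def summarise(text, max_sentences=2):
    """Return the top-scoring sentences, verbatim and in source order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: the number of sentences in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    def score(tokens):
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        # Sum of term frequency times a smoothed inverse document frequency.
        total = sum(
            count * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        )
        # Length-normalise so long sentences are not favoured automatically.
        return total / len(tokens)

    scores = [score(tokens) for tokens in tokenized]

    # Pick the highest-scoring sentence indices, then restore original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = (
        "Extractive summarisation selects sentences from a document. "
        "It keeps the exact wording of the source. "
        "Graph methods rank sentences by mutual similarity. "
        "The result is a shorter text."
    )
    for sentence in summarise(sample, max_sentences=2):
        print(sentence)
```

Every sentence returned is an exact substring of the input, which is what makes the output traceable to the source. Swapping the `score` function for a graph-based or neural scorer would leave the selection and reordering steps unchanged.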
Why It Matters
Organisations benefit from rapid document processing at scale, particularly where speed and interpretability are critical. Because no new text is generated, every sentence in the output can be traced to the source material, which supports compliance, auditability, and stakeholder trust. Extractive methods also typically require less computation than abstractive methods, making them cost-effective for high-volume document workflows.
Common Applications
Applications include legal document review, where key clauses and obligations must be flagged; news aggregation platforms requiring fast headline extraction; customer support ticket prioritisation; and scientific literature filtering in research institutions seeking rapid assessment of publication relevance.
Key Considerations
The technique cannot fill gaps in the source content or rephrase information for clarity, which limits its effectiveness where documents are poorly structured or where context requires paraphrasing. Output quality depends heavily on the choice of sentence-ranking algorithm, and a summary may omit nuanced information that matters to specific users.
More in Natural Language Processing
GloVe
Semantics & Representation: Global Vectors for Word Representation, an unsupervised learning algorithm for obtaining word vector representations from aggregated word co-occurrence statistics.
Seq2Seq Model
Core NLP: A neural network architecture that maps an input sequence to an output sequence, used in translation and summarisation.
Chunking Strategy
Core NLP: The method of dividing long documents into smaller segments for embedding and retrieval, balancing context preservation with optimal chunk sizes for vector search accuracy.
Text Embedding
Core NLP: Dense vector representations of text passages that capture semantic meaning for similarity comparison and retrieval.
Coreference Resolution
Parsing & Structure: The task of identifying all expressions in text that refer to the same real-world entity.
Natural Language Processing
Core NLP: The field of AI focused on enabling computers to understand, interpret, and generate human language.
Large Language Model
Semantics & Representation: A neural network trained on massive text corpora that can generate, understand, and reason about natural language.
Token Limit
Semantics & Representation: The maximum number of tokens a language model can process in a single input-output interaction.