Extractive Summarisation — Technology Wiki

Overview

Direct Answer

Extractive summarisation is a Natural Language Processing technique that automatically condenses documents by selecting and retaining the most salient sentences from the original source, preserving their exact wording without paraphrase or generation of new content.

How It Works

The approach scores each sentence using statistical or machine learning methods, such as term frequency-inverse document frequency (TF-IDF) weighting, graph-based algorithms (for example TextRank), or neural scoring models, to identify the sentences that best represent the document's main content. The top-ranked sentences are then assembled in their original order to form a shorter document. Because the wording and ordering come directly from the source, this helps the summary stay coherent and faithful to the original text.
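The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation, and it rests on several assumptions: each sentence is treated as its own "document" when computing inverse document frequency, sentences are split with a naive punctuation-based regular expression, and the helper names `split_sentences`, `tokenize`, and `summarise` are hypothetical. A production system would normally use a proper sentence tokenizer, stop-word handling, and a tuned scoring model.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric runs as terms."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarise(text, k=2):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum of TF * smoothed IDF, divided by sentence length so that
        # long sentences are not favoured simply for having more terms.
        total = sum(
            count * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        )
        scores.append(total / len(tokens))

    # Pick the top-k indices by score (ties go to the earlier sentence),
    # then restore source order before returning.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A call such as `summarise(document_text, k=3)` returns three sentences copied verbatim from `document_text`, in the order they appeared. Nothing is paraphrased, so every output sentence can be traced back to its position in the source.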

Why It Matters

Organisations benefit from rapid document processing at scale, particularly where speed and interpretability are critical. Since no novel text is generated, every sentence in the output is traceable to the source material, which supports compliance, auditability, and stakeholder trust. Extractive methods also typically require less computation than abstractive methods, making them cost-effective for high-volume document workflows.

Common Applications

Applications include legal document review, where key clauses and obligations must be flagged; news aggregation platforms requiring fast headline extraction; customer support ticket prioritisation; and scientific literature filtering in research institutions seeking rapid assessment of publication relevance.

Key Considerations

The technique cannot bridge gaps in source content or reshape information for clarity, which limits its effectiveness where documents are poorly structured or where context requires paraphrasing. Quality depends heavily on the choice of sentence-ranking algorithm, and the summary may omit nuanced information that is valuable to specific users.

Related in Generation & Translation

Machine Translation

The use of AI to automatically translate text or speech from one natural language to another.

Question Answering

An NLP task where a system automatically answers questions posed in natural language based on given context.

Text Generation

The process of producing coherent and contextually relevant text using AI language models.

Chatbot

A software application that simulates human conversation through text or voice interactions using NLP.

Conversational AI

AI systems designed to engage in natural, context-aware dialogue with humans across multiple turns.

Dialogue System

A computer system designed to converse with humans, encompassing task-oriented and open-domain conversation.

Top-K Sampling

A text generation strategy that restricts the model to sampling from the K most probable next tokens.

Text-to-SQL

The task of automatically converting natural language questions into executable SQL queries, enabling non-technical users to interrogate databases through conversational interfaces.

Intent Detection

The classification of user utterances into predefined categories representing the user's goal or purpose, a fundamental component of conversational AI and chatbot systems.

Dialogue Management

The component of conversational systems that tracks conversation state, determines the next system action, and maintains coherent multi-turn interactions with users.

More in Natural Language Processing

GloVe

Semantics & Representation

Global Vectors for Word Representation — an unsupervised learning algorithm for obtaining word vector representations from aggregated word co-occurrence statistics.

Seq2Seq Model

Core NLP

A neural network architecture that maps an input sequence to an output sequence, used in translation and summarisation.

Chunking Strategy

Core NLP

The method of dividing long documents into smaller segments for embedding and retrieval, balancing context preservation with optimal chunk sizes for vector search accuracy.

Text Embedding

Core NLP

Dense vector representations of text passages that capture semantic meaning for similarity comparison and retrieval.

Coreference Resolution

Parsing & Structure

The task of identifying all expressions in text that refer to the same real-world entity.

Natural Language Processing

Core NLP

The field of AI focused on enabling computers to understand, interpret, and generate human language.

Large Language Model

Semantics & Representation

A neural network trained on massive text corpora that can generate, understand, and reason about natural language.

Token Limit

Semantics & Representation

The maximum number of tokens a language model can process in a single input-output interaction.