Vector Database — Technology Wiki

Overview

A database optimised for storing and querying high-dimensional vector embeddings for similarity search.
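As a rough illustration of what similarity search over embeddings means, here is a minimal sketch that stores a few vectors in memory and ranks them by cosine similarity to a query. The names `TinyVectorStore` and `cosine_similarity` and the three-dimensional example vectors are hypothetical, chosen only for this example. It scans every stored vector, whereas production vector databases typically rely on approximate nearest-neighbour indexes to avoid an exhaustive scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store; brute-force search, no index."""

    def __init__(self):
        self._items = {}

    def add(self, item_id, vector):
        self._items[item_id] = list(vector)

    def search(self, query, k=3):
        # Score every stored vector against the query, highest first.
        scored = [
            (cosine_similarity(query, vec), item_id)
            for item_id, vec in self._items.items()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]


# Toy embeddings, invented for illustration only.
store = TinyVectorStore()
store.add("cat", [0.9, 0.1, 0.0])
store.add("dog", [0.8, 0.2, 0.1])
store.add("car", [0.0, 0.1, 0.9])

results = store.search([1.0, 0.0, 0.0], k=2)
print(results)
```

In this toy data the query points mostly along the first dimension, so "cat" and "dog" rank above "car".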

Related in Core NLP

Natural Language Processing

The field of AI focused on enabling computers to understand, interpret, and generate human language.

Seq2Seq Model

A neural network architecture that maps an input sequence to an output sequence, used in translation and summarisation.

Latent Dirichlet Allocation

A generative probabilistic model for discovering topics in a collection of documents.

Text Embedding

Dense vector representations of text passages that capture semantic meaning for similarity comparison and retrieval.

Semantic Search

Search technology that matches on the meaning and intent behind a query rather than on exact keywords.

Constitutional AI

An approach to AI alignment in which models are trained to follow a written set of principles, referred to as a constitution.

Natural Language Understanding

The subfield of NLP focused on machine reading comprehension and extracting meaning from text.

Natural Language Generation

The subfield of NLP concerned with producing natural language text from structured data or representations.

Document Understanding

AI systems that extract structured information from unstructured documents by combining optical character recognition, layout analysis, and natural language comprehension.

Slot Filling

The task of extracting specific parameter values from user utterances to fulfil a detected intent, such as identifying dates, locations, and names in booking requests.

Cross-Lingual Transfer

The application of models trained in one language to perform tasks in another language, leveraging shared multilingual representations learned during pre-training.

Text Embedding Model

A neural network trained to convert text passages into fixed-dimensional vectors that capture semantic meaning, enabling similarity search, clustering, and retrieval applications.

More in Natural Language Processing

Tokenisation

Semantics & Representation

The process of breaking text into smaller units (tokens) such as words, subwords, or characters for processing by language models.

BERT

Semantics & Representation

Bidirectional Encoder Representations from Transformers: a language model that represents each token using context from both its left and its right, rather than reading in one direction only.

Named Entity Recognition

Parsing & Structure

An NLP task that identifies and classifies named entities in text into categories like person, organisation, and location.

Instruction Tuning

Semantics & Representation

Training a language model to follow natural language instructions by fine-tuning on instruction-response pairs.

Abstractive Summarisation

Text Analysis

A text summarisation approach that generates novel sentences to capture the essential meaning of a document, rather than simply extracting and rearranging existing sentences.

Context Window

Semantics & Representation

The maximum amount of text, measured in tokens, that a language model can take into account at once when generating a response.

GloVe

Semantics & Representation

Global Vectors for Word Representation — an unsupervised learning algorithm for obtaining word vector representations from aggregated word co-occurrence statistics.

Chunking Strategy

Core NLP

The method of dividing long documents into smaller segments for embedding and retrieval, trading off how much context each chunk preserves against how precisely it can be matched in vector search.
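One common way to chunk a document is a fixed-size sliding window with some overlap between neighbouring chunks, so that text near a boundary appears in both. The sketch below assumes whitespace-separated words as the unit of length. The function name `chunk_words` and the default sizes are hypothetical, and real pipelines often count model tokens or split on sentence or section boundaries instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word windows of chunk_size, sharing `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the document.
        if start + chunk_size >= len(words):
            break
    return chunks


# Ten placeholder words, invented for illustration only.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

A larger overlap preserves more context across boundaries at the cost of storing and embedding more chunks.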