Overview
Dense vector representations of words in which semantically similar words are mapped to nearby points in vector space.
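A minimal sketch of this idea, using invented 4-dimensional vectors for a toy vocabulary (real embeddings are learned and typically have hundreds of dimensions): cosine similarity in the embedding space reflects semantic relatedness.

```python
import numpy as np

# Hypothetical embeddings for a toy vocabulary (invented for illustration).
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.7, 0.2, 0.1]),
    "apple": np.array([0.0, 0.1, 0.9, 0.8]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words lie closer together in the vector space:
assert cosine_similarity(embeddings["king"], embeddings["queen"]) > \
       cosine_similarity(embeddings["king"], embeddings["apple"])
```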
More in Deep Learning
Positional Encoding
Training & Optimisation: A technique that injects information about the position of tokens in a sequence into transformer architectures.
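One common realisation is the fixed sinusoidal encoding from the original transformer paper; a minimal numpy sketch (function name and parameters chosen here for illustration):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions,
    with geometrically spaced frequencies across dimension pairs."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Each row is a unique position signature, added to the token embedding.
pe = sinusoidal_positional_encoding(seq_len=16, d_model=8)
```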
Rotary Positional Encoding
Training & Optimisation: A position encoding method that encodes absolute position with a rotation matrix and naturally incorporates relative position information into attention computations.
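A sketch of the rotation and its key property, assuming the "half-split" pairing of feature dimensions (implementations also use an interleaved layout; details vary): after rotating queries and keys by their absolute positions, their dot product depends only on the relative offset.

```python
import numpy as np

def rotary_embed(x, position, base=10000.0):
    """Rotate feature pairs (x[i], x[i + d/2]) by position-dependent angles."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)
    theta = position * freqs
    x1, x2 = x[..., :half], x[..., half:]
    # Standard 2D rotation applied independently to each feature pair.
    return np.concatenate([x1 * np.cos(theta) - x2 * np.sin(theta),
                           x1 * np.sin(theta) + x2 * np.cos(theta)], axis=-1)

q = np.random.default_rng(0).normal(size=8)
k = np.random.default_rng(1).normal(size=8)

# Relative-position property: same offset (2) gives the same attention score.
a = rotary_embed(q, 3) @ rotary_embed(k, 5)
b = rotary_embed(q, 10) @ rotary_embed(k, 12)
assert np.isclose(a, b)
```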
Convolutional Neural Network
Architectures: A deep learning architecture designed for processing structured grid data like images, using convolutional filters to detect features.
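The core operation can be sketched as a plain 2D cross-correlation (what deep learning libraries call "convolution"); here a hand-built vertical-edge filter responds only at the boundary in a toy image:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image and
    take a weighted sum at each location."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark on the left, bright on the right.
image = np.zeros((5, 5))
image[:, 2:] = 1.0

# A simple vertical-edge detector fires only where intensity changes.
edge_kernel = np.array([[-1.0, 1.0]])
response = conv2d(image, edge_kernel)
```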
Representation Learning
Architectures: The automatic discovery of data representations needed for feature detection or classification from raw data.
Capsule Network
Architectures: A neural network architecture that groups neurons into capsules to better capture spatial hierarchies and part-whole relationships.
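A small piece of the machinery, sketched under the formulation in the original dynamic-routing paper: the "squash" nonlinearity keeps a capsule's output vector pointing in the same direction while mapping its length into (0, 1), so the length can act as a probability that the entity is present.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Capsule squashing: v = (|s|^2 / (1 + |s|^2)) * (s / |s|)."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

s = np.array([3.0, 4.0])   # length 5
v = squash(s)
# Same direction, length squashed to 25/26 (just under 1).
```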
Diffusion Model
Generative Models: A generative model that learns to reverse a gradual noising process, generating high-quality samples from random noise.
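The forward (noising) half of the process has a convenient closed form; a sketch in the DDPM style, with an illustrative linear beta schedule (real schedules and step counts vary by model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear variance schedule over T noising steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)

def q_sample(x0, t):
    """Jump directly to noise level t:
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    noise = rng.normal(size=x0.shape)
    return (np.sqrt(alphas_cumprod[t]) * x0
            + np.sqrt(1.0 - alphas_cumprod[t]) * noise)

x0 = np.ones(4)
x_late = q_sample(x0, T - 1)  # by the last step, almost pure noise
```

The model is then trained to invert these steps, so sampling starts from pure noise and denoises back to data.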
Gradient Checkpointing
Architectures: A memory optimisation that trades computation for memory by recomputing intermediate activations during the backward pass instead of storing them all during the forward pass.
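A toy sketch of the idea (frameworks such as PyTorch provide this as a built-in utility; the functions and the every-4-layers policy here are invented for illustration): the forward pass keeps only periodic checkpoints, and the segment between two checkpoints is recomputed when the backward pass needs it.

```python
import numpy as np

def layer(x, i):
    """Stand-in for an expensive layer; deterministic, so it can be recomputed."""
    return np.tanh(x + i)

def forward_with_checkpoints(x, n_layers, every=4):
    """Run the forward pass, storing activations only at checkpoint layers."""
    checkpoints = {0: x}
    for i in range(n_layers):
        x = layer(x, i)
        if (i + 1) % every == 0:
            checkpoints[i + 1] = x
    return x, checkpoints

def recompute_segment(checkpoints, start, end):
    """During backward, rebuild the activations between two checkpoints."""
    x = checkpoints[start]
    acts = []
    for i in range(start, end):
        x = layer(x, i)
        acts.append(x)
    return acts

out, cps = forward_with_checkpoints(np.zeros(2), n_layers=8, every=4)
# Only 3 tensors stored (after layers 0, 4, 8) instead of all 9.
acts = recompute_segment(cps, 4, 8)
assert np.allclose(acts[-1], out)
```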
Autoencoder
Architectures: A neural network trained to encode input data into a compressed representation and then decode it back to reconstruct the original.
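A minimal sketch, assuming the simplest possible case: a linear autoencoder with a 1-dimensional bottleneck, trained by plain gradient descent on toy 2D data that actually lies near a line (so one latent dimension suffices to reconstruct it).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2D points on the line y = 2x, plus a little noise.
t = rng.uniform(-1, 1, size=(200, 1))
X = np.hstack([t, 2 * t]) + 0.01 * rng.normal(size=(200, 2))

# Encoder compresses 2 -> 1, decoder reconstructs 1 -> 2.
W_enc = rng.normal(scale=0.1, size=(2, 1))
W_dec = rng.normal(scale=0.1, size=(1, 2))
lr = 0.1

for _ in range(500):
    Z = X @ W_enc              # compressed code (bottleneck)
    X_hat = Z @ W_dec          # reconstruction
    err = X_hat - X
    # Gradients of the mean squared reconstruction error.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))  # small after training
```

With a linear network this recovers the principal direction of the data; nonlinear activations and deeper stacks extend the same encode-compress-decode pattern.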