Overview
Reinforcement learning from human feedback (RLHF) is a training paradigm where AI models are refined using human preference signals, aligning model outputs with human values and quality expectations through reward modelling.
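At the heart of reward modelling is learning from pairwise preferences: given a human-preferred response and a rejected one, the reward model is trained to score the preferred response higher. A minimal sketch of the standard Bradley-Terry pairwise loss (function names and scores are illustrative):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Small when the reward model scores the human-preferred response
    above the rejected one, large when it ranks the pair the wrong way.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A well-calibrated reward model gives the preferred answer a higher score,
# so its loss is smaller than that of a model ranking the pair backwards.
good = preference_loss(2.0, -1.0)   # preferred response scored higher
bad = preference_loss(-1.0, 2.0)    # preferred response scored lower
```

Minimising this loss over a dataset of human preference pairs produces the reward signal that the policy model is then optimised against.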
More in Artificial Intelligence
BLEU Score
Evaluation & Metrics
A metric for evaluating the quality of machine-generated text by comparing it to reference translations or texts.
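BLEU combines clipped n-gram precision with a brevity penalty. A sentence-level sketch (real BLEU typically uses n-grams up to 4 with corpus-level counts and smoothing; this uses max_n=2 for brevity):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Penalise candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
perfect = bleu(ref, ref)            # identical output scores 1.0
partial = bleu("the dog".split(), ref)
```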
ROC Curve
Evaluation & Metrics
A graphical plot illustrating the diagnostic ability of a binary classifier as its discrimination threshold is varied.
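An ROC curve is traced by sweeping the decision threshold over the classifier's scores and recording the false-positive and true-positive rates at each threshold. A minimal sketch with toy scores:

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points as the decision threshold sweeps downward."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Perfectly separated scores: the curve passes through the top-left corner.
points = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```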
System Prompt
Prompting & Interaction
An initial instruction set provided to a language model that defines its persona, constraints, output format, and behavioural guidelines for a given session or application.
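In chat-style APIs the system prompt is typically the first message in the conversation, framing every later turn. A sketch assuming the common role/content message shape (the prompt text and helper are illustrative):

```python
def build_messages(system_prompt: str, user_input: str) -> list[dict]:
    """Place the system message first so it governs the whole session:
    persona, constraints, and output format in one place."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_input},
    ]

msgs = build_messages(
    "You are a concise support agent. Answer in at most two sentences.",
    "How do I reset my password?",
)
```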
AI Guardrails
Safety & Governance
Safety mechanisms and constraints implemented around AI systems to prevent harmful, biased, or policy-violating outputs while preserving useful functionality.
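The simplest guardrail is a policy check run on model output before it reaches the user. A toy sketch with a hypothetical blocklist (production guardrails layer classifiers, policy rules, and human review on top of checks like this):

```python
# Illustrative blocklist; real policies are far richer than substring checks.
BLOCKED_TOPICS = {"credit card number", "password"}

def check_output(text: str) -> tuple[bool, str]:
    """Return (allowed, payload): the text if it passes, a reason if not."""
    lowered = text.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return False, f"Blocked: output mentions '{topic}'."
    return True, text

ok, _ = check_output("Here is a summary of the article.")
blocked, _ = check_output("Your password is hunter2.")
```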
Model Pruning
Models & Architecture
The process of removing redundant or less important parameters from a neural network to reduce its size and computational cost.
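A common criterion is magnitude pruning: zero out the weights with the smallest absolute values on the assumption that they contribute least to the output. A minimal sketch on a flat weight list:

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero the fraction `sparsity` of weights with the smallest
    absolute value, leaving the rest unchanged."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    cutoff = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    pruned, removed = [], 0
    for w in weights:
        if removed < k and abs(w) <= cutoff:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
pruned = magnitude_prune(w, 0.5)  # drop the 3 smallest-magnitude weights
```

In practice the zeroed weights are then stored or computed sparsely, which is where the size and compute savings come from.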
Retrieval-Augmented Generation
Infrastructure & Operations
A technique combining information retrieval with text generation, allowing AI to access external knowledge before generating responses.
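The core loop is retrieve-then-generate: fetch relevant documents, prepend them to the prompt, then generate. A sketch with a toy keyword retriever (real systems use vector search over embeddings; the document store and prompt template here are illustrative):

```python
DOCS = [
    "The Eiffel Tower is in Paris and opened in 1889.",
    "Python's GIL limits CPU-bound threading.",
]

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    """Augment the prompt with retrieved context before generation."""
    context = retrieve(query, DOCS)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

prompt = build_prompt("When did the Eiffel Tower open?")
```

The augmented prompt is then passed to the language model, which can ground its answer in the retrieved context rather than its parameters alone.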
Few-Shot Prompting
Prompting & Interaction
A technique where a language model is given a small number of examples within the prompt to guide its response pattern.
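Few-shot prompting is purely a matter of prompt construction: worked input/output pairs precede the new input, and the model infers the pattern. A sketch for sentiment labelling (the example pairs and template are illustrative):

```python
# Illustrative labelled examples shown to the model in-context.
EXAMPLES = [
    ("great movie, loved it", "positive"),
    ("terrible plot, fell asleep", "negative"),
]

def few_shot_prompt(query: str) -> str:
    """Prepend worked examples, then leave the final label for the model."""
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in EXAMPLES)
    return f"{shots}\nReview: {query}\nSentiment:"

prompt = few_shot_prompt("an instant classic")
```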
Chain-of-Thought Prompting
Prompting & Interaction
A prompting technique that encourages language models to break down reasoning into intermediate steps before providing an answer.
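In its zero-shot form this can be as simple as appending a reasoning instruction to the question. A sketch (the exact wording is a common convention, not a fixed standard):

```python
def cot_prompt(question: str) -> str:
    """Ask the model to show intermediate reasoning before the answer."""
    return (
        f"Q: {question}\n"
        "A: Let's think step by step, then state the final answer "
        "on its own line."
    )

prompt = cot_prompt("If a train travels 60 km in 1.5 hours, what is its average speed?")
```

Eliciting the intermediate steps tends to improve accuracy on multi-step arithmetic and logic tasks compared with asking for the answer directly.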