AI Red Teaming — Technology Wiki

Overview

The systematic adversarial testing of AI systems to identify vulnerabilities, failure modes, harmful outputs, and safety risks before deployment.

Related in Safety & Governance

AI Alignment

The research field focused on ensuring AI systems act in accordance with human values, intentions, and ethical principles.

AI Safety

The interdisciplinary field dedicated to making AI systems safe, robust, and beneficial while minimizing risks of unintended consequences.

AI Governance

The frameworks, policies, and regulations that guide the responsible development and deployment of AI technologies.

AI Explainability

The ability to describe AI decision-making processes in human-understandable terms, enabling trust and regulatory compliance.

AI Interpretability

The degree to which humans can understand the internal mechanics and reasoning of an AI model's predictions and decisions.

AI Fairness

The principle of ensuring AI systems make equitable decisions without discriminating against any group based on protected attributes.

AI Transparency

The practice of making AI systems' operations, data usage, and decision processes openly visible to stakeholders.

AI Robustness

The ability of an AI system to maintain performance under varying conditions, adversarial attacks, or noisy input data.

AI Hallucination

When an AI model generates plausible-sounding but factually incorrect or fabricated information with high confidence.

AI Watermarking

Techniques for embedding imperceptible statistical patterns in AI-generated content to enable reliable detection and provenance tracking of synthetic outputs.

AI Guardrails

Safety mechanisms and constraints implemented around AI systems to prevent harmful, biased, or policy-violating outputs while preserving useful functionality.

AI Model Card

A documentation framework that provides standardised information about an AI model's intended use, performance characteristics, limitations, and ethical considerations.

More in Artificial Intelligence

Cognitive Computing

Foundations & Theory

Computing systems that simulate human thought processes using self-learning algorithms, data mining, pattern recognition, and natural language processing.

AI Tokenomics

Infrastructure & Operations

The economic model governing the pricing and allocation of computational resources for AI inference, including per-token billing, rate limiting, and credit systems.

Reinforcement Learning from Human Feedback

Training & Inference

A training paradigm where AI models are refined using human preference signals, aligning model outputs with human values and quality expectations through reward modelling.

Turing Test

Foundations & Theory

A measure of machine intelligence proposed by Alan Turing, where a machine is deemed intelligent if it can exhibit conversation indistinguishable from a human.

Fuzzy Logic

Reasoning & Planning

A form of logic that handles approximate reasoning, allowing variables to have degrees of truth rather than strict binary true/false values.

Heuristic Search

Reasoning & Planning

Problem-solving techniques that use practical rules of thumb to find satisfactory solutions when exhaustive search is impractical.

Tensor Processing Unit

Models & Architecture

Google's custom-designed application-specific integrated circuit for accelerating machine learning workloads.

Frame Problem

Foundations & Theory

The challenge in AI of representing the effects of actions without having to explicitly state everything that remains unchanged.