Overview
An organisational approach in which incident reviews focus on systemic improvements rather than on assigning individual blame.
More in DevOps & Infrastructure
Blue-Green Infrastructure
(CI/CD) Maintaining two identical production environments to enable instant switching between versions.
Distributed Tracing
(Observability) A method of tracking requests as they flow through distributed systems to diagnose latency and failure points.
Elasticity
(CI/CD) The ability of a system to automatically scale resources up or down based on current demand.
Monitoring
(Observability) The continuous observation of system performance, availability, and health using automated tools and dashboards.
Helm
(Containers & Orchestration) A package manager for Kubernetes that simplifies the deployment and management of applications using charts.
Incident Management
(Site Reliability) The processes and tools for detecting, responding to, resolving, and learning from service disruptions.
High Availability
(Site Reliability) A system design approach that ensures a certain degree of operational continuity during a given measurement period.
Health Check
(CI/CD) An automated test that verifies a service or system component is functioning correctly.