Overview
A design approach in which a system keeps operating with reduced functionality when some of its components fail, rather than failing completely.
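As an illustration, here is a minimal sketch assuming a web page that shows personalized recommendations fetched from a separate service. When that dependency fails, the code serves a generic list instead of failing the whole request. All names here (`get_recommendations`, `POPULAR_ITEMS`, `Recommendations`, and the two stub services) are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical static fallback: a generic list served when the
# personalized recommendation service is unavailable.
POPULAR_ITEMS: List[str] = ["item-1", "item-2", "item-3"]


@dataclass
class Recommendations:
    items: List[str]
    degraded: bool  # True when the fallback path was used


def get_recommendations(
    user_id: str, fetch: Callable[[str], List[str]]
) -> Recommendations:
    """Return personalized items, or degrade to a static list on failure."""
    try:
        return Recommendations(items=fetch(user_id), degraded=False)
    except (ConnectionError, TimeoutError):
        # The dependency failed: keep serving the page with reduced
        # functionality instead of propagating the error to the caller.
        return Recommendations(items=list(POPULAR_ITEMS), degraded=True)


# Hypothetical stand-ins for the real dependency.
def healthy_service(user_id: str) -> List[str]:
    return [f"{user_id}-pick-1", f"{user_id}-pick-2"]


def failing_service(user_id: str) -> List[str]:
    raise TimeoutError("recommendation service timed out")


if __name__ == "__main__":
    print(get_recommendations("alice", healthy_service))
    print(get_recommendations("alice", failing_service))
```

This sketch catches only connection and timeout errors, so programming bugs still surface. The `degraded` flag lets callers or metrics record how often the fallback path is taken.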
More in DevOps & Infrastructure
Container Registry (Containers & Orchestration): A repository for storing, managing, and distributing container images.
Health Check (CI/CD): An automated test that verifies a service or system component is functioning correctly.
Ansible (Infrastructure as Code): An open-source automation tool for configuration management, application deployment, and task automation.
Error Budget (Observability): The maximum amount of unreliability, such as downtime or failed requests, that a service can incur within a given period while still meeting its SLO.
Rolling Update (CI/CD): A deployment strategy that gradually replaces instances of the previous version with the new version.
Chaos Engineering (Site Reliability): The discipline of experimenting on distributed systems to build confidence in their ability to withstand turbulent conditions.
Metrics (Observability): Quantitative measurements collected over time to track system performance, health, and business outcomes.
Alerting (Observability): Automated notifications triggered when system metrics or conditions exceed predefined thresholds.