Overview
The maximum amount of time a service can be unavailable within a given period without breaching its service level objective (SLO).
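This allowance, commonly called an error budget, follows directly from the SLO: allowed downtime = (1 - SLO) x period. The sketch below works through that arithmetic for an availability SLO expressed as a percentage. The function name `allowed_downtime` and the 99.9% / 30-day figures are illustrative assumptions, not values taken from this glossary.

```python
from datetime import timedelta


def allowed_downtime(slo_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime an availability SLO permits over a period.

    Hypothetical helper for illustration: slo_percent is the availability
    target (e.g. 99.9), period is the window the SLO is measured over.
    """
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    # Fraction of the period during which the service may be unavailable.
    unavailable_fraction = (100 - slo_percent) / 100
    return period * unavailable_fraction


# Example: a 99.9% SLO over a 30-day window allows roughly 43.2 minutes.
budget = allowed_downtime(99.9, timedelta(days=30))
print(f"{budget.total_seconds() / 60:.1f} minutes")
```

A stricter SLO shrinks the allowance quickly: under the same assumptions, 99.99% over 30 days leaves a little over four minutes.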
More in DevOps & Infrastructure
Mean Time to Recovery
CI/CD
The average time it takes to restore a system to normal operation after a failure or incident.
Runbook
Site Reliability
A documented set of procedures for handling routine operations and troubleshooting common issues.
Service Level Indicator
CI/CD
A quantitative measure of some aspect of the level of service being provided.
Configuration Management
Infrastructure as Code
The practice of systematically managing and maintaining the consistency of system configurations.
Puppet
Infrastructure as Code
A configuration management tool that automates the provisioning and management of infrastructure.
Blameless Culture
CI/CD
An organisational approach where incident reviews focus on systemic improvements rather than individual blame.
Mean Time Between Failures
CI/CD
The average time between system failures, measuring reliability and availability.
Playbook
CI/CD
A comprehensive guide containing strategies, procedures, and best practices for managing specific operational scenarios.