Overview
The process of reverting a system to a previous version or state after a failed deployment or update.
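As a minimal sketch of the idea, assuming a deployment can be modelled as a stack of version labels, the snippet below activates a new version, runs a health check, and reverts to the previous version if the check fails. The names `ReleaseHistory` and `fake_health_check` are hypothetical, introduced only for illustration; real rollback tooling also has to deal with state such as database migrations, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ReleaseHistory:
    """Tracks deployed versions so the last known-good one can be restored."""

    versions: List[str] = field(default_factory=list)

    @property
    def current(self) -> Optional[str]:
        # The most recently deployed version, or None if nothing is deployed.
        return self.versions[-1] if self.versions else None

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        """Activate `version`; roll back to the previous one if the check fails."""
        self.versions.append(version)
        if health_check(version):
            return True
        self.rollback()
        return False

    def rollback(self) -> Optional[str]:
        """Discard the current version and return the one that is now active."""
        if self.versions:
            self.versions.pop()
        return self.current


# Hypothetical health check: pretend that version "v2" is broken.
def fake_health_check(version: str) -> bool:
    return version != "v2"


history = ReleaseHistory()
history.deploy("v1", fake_health_check)
ok = history.deploy("v2", fake_health_check)
print(ok, history.current)  # False v1
```

Keeping the previous version available until the new one passes its check is what makes the revert cheap; the same pattern underlies strategies such as blue-green deployment.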
More in DevOps & Infrastructure
Chaos Engineering
Site Reliability: The discipline of experimenting on distributed systems to build confidence in their ability to withstand turbulent conditions.
Logging
Observability: The practice of recording events, errors, and system activities for debugging, auditing, and analysis.
Prometheus
Observability: An open-source monitoring and alerting toolkit designed for reliability and scalability in cloud-native environments.
Observability
Observability: The ability to understand a system's internal state from its external outputs, encompassing metrics, logs, and traces.
GitOps
Infrastructure as Code: An operational framework using Git repositories as the single source of truth for declarative infrastructure and applications.
Grafana
Observability: An open-source analytics and visualisation platform for monitoring metrics from multiple data sources.
Distributed Tracing
Observability: A method of tracking requests as they flow through distributed systems to diagnose latency and failure points.
Error Budget
Observability: The maximum amount of time a service can be unavailable within a given period based on its SLO.
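The error budget definition above can be made concrete with a little arithmetic. The sketch below assumes the SLO is an availability target expressed as a fraction (for example 0.999) and that the period is measured in whole days; the function name `error_budget_minutes` is hypothetical.

```python
def error_budget_minutes(slo: float, period_days: int) -> float:
    """Allowed downtime in minutes for an availability SLO given as a fraction."""
    if not 0.0 <= slo <= 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    total_minutes = period_days * 24 * 60
    # The budget is the share of the period the SLO permits to be unavailable.
    return (1.0 - slo) * total_minutes


budget = error_budget_minutes(0.999, 30)
print(round(budget, 1))  # 43.2
```

Under these assumptions, a 99.9% target over 30 days leaves about 43 minutes of permitted downtime.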