Overview
Quantitative measurements collected over time to track system performance, health, and business outcomes.
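As a rough illustration of "measurements collected over time", the sketch below stores each metric as a series of timestamped samples. It shows the two most common shapes: a counter that only increases, whose per-second rate is derived over a time window, and a gauge whose latest value is read directly. The class name MetricsStore and the metric names http_requests_total and cpu_utilization are hypothetical, chosen only for this example. Real monitoring systems (Prometheus, for instance) add labels, retention, and counter-reset handling that this minimal sketch omits.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MetricsStore:
    """Hypothetical in-memory store of timestamped metric samples (illustration only)."""

    # metric name -> list of (timestamp_seconds, value) pairs, in insertion order
    series: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, name, value, ts=None):
        """Append one sample; the timestamp defaults to the current wall-clock time."""
        self.series[name].append((time.time() if ts is None else ts, float(value)))

    def latest(self, name):
        """Most recent value, as a gauge reading would be shown on a dashboard."""
        return self.series[name][-1][1]

    def rate(self, name, window, now):
        """Per-second increase of a counter over [now - window, now].

        Assumes the counter never resets; returns 0.0 if fewer than two samples fall in the window.
        """
        points = [(t, v) for t, v in self.series[name] if now - window <= t <= now]
        if len(points) < 2:
            return 0.0
        (t0, v0), (t1, v1) = points[0], points[-1]
        return (v1 - v0) / (t1 - t0) if t1 > t0 else 0.0


store = MetricsStore()
# A request counter sampled every 30 s, plus a single CPU gauge reading.
for ts, total in [(0, 100), (30, 160), (60, 250)]:
    store.record("http_requests_total", total, ts=ts)
store.record("cpu_utilization", 0.42, ts=60)

print(store.rate("http_requests_total", window=60, now=60))  # 2.5 requests/s
print(store.latest("cpu_utilization"))  # 0.42
```

Deriving a rate from a cumulative counter, rather than storing per-interval values, means a missed scrape loses resolution but not totals, which is one reason counters are the usual choice for event counts.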
More in DevOps & Infrastructure
High Availability
Site Reliability: A system design approach that ensures an agreed level of operational continuity, typically expressed as an uptime percentage, over a given measurement period.
Ansible
Infrastructure as Code: An open-source automation tool for configuration management, application deployment, and task automation.
Chaos Engineering
Site Reliability: The discipline of experimenting on distributed systems to build confidence in their ability to withstand turbulent conditions.
Runbook
Site Reliability: A documented set of procedures for handling routine operations and troubleshooting common issues.
Mean Time Between Failures
CI/CD: The average operating time between system failures; a measure of reliability that, combined with mean time to repair, also determines availability.
CI/CD Pipeline
CI/CD: An automated workflow that builds, tests, and deploys software changes from development to production.
ChatOps
CI/CD: A collaboration model connecting tools, processes, and automation with team chat platforms for operations management.
Immutable Infrastructure
Infrastructure as Code: An approach where infrastructure components are never modified after deployment but replaced entirely with updated versions.
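The Mean Time Between Failures and High Availability entries in this list connect through a standard back-of-the-envelope calculation. MTBF is total operating time divided by the number of failures. Steady-state availability is commonly estimated as MTBF / (MTBF + MTTR), where MTTR (mean time to repair) is an additional quantity not defined in this list. The minimal Python sketch below uses made-up numbers rather than measurements, and it also converts an availability target into a downtime budget. The function names are hypothetical.

```python
def mtbf(total_uptime_hours, failures):
    """Mean time between failures: operating time divided by failure count."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_uptime_hours / failures


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability estimate: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_budget_minutes(target, period_days):
    """Minutes of downtime allowed over a period for a given availability target."""
    return (1 - target) * period_days * 24 * 60


# Illustrative numbers: 2,000 operating hours with 4 failures, 2 hours average repair.
m = mtbf(2000, 4)
print(m)                                        # 500.0 hours
print(round(availability(m, 2), 4))             # 0.996
print(round(downtime_budget_minutes(0.999, 30), 1))  # 43.2 minutes per 30 days
```

The formula shows why availability can be improved from two directions: failing less often (raising MTBF) or recovering faster (lowering MTTR).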