Service Level Indicator — Technology Wiki

Overview

Direct Answer

A Service Level Indicator (SLI) is a measurable attribute of service performance, such as latency, error rate, or availability, that quantifies one specific dimension of user experience or system behaviour. SLIs form the foundation for defining and tracking whether services meet their intended reliability targets.

How It Works

SLIs are derived from raw metrics collected by observability tools—request response times, failed transactions, uptime periods—and aggregated over defined windows to produce a single percentage or rate. These measurements are then compared against thresholds to determine compliance with service level objectives (SLOs), enabling teams to detect degradation before it impacts end users significantly.
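The aggregation described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes each request record carries a latency and a success flag. The `Request` type, the function names, the 300 ms latency threshold, and the 99% target are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Request:
    """One raw observation from a (hypothetical) observability tool."""
    latency_ms: float
    ok: bool  # True if the request succeeded


def availability_sli(requests: Sequence[Request]) -> Optional[float]:
    """Fraction of requests in the window that succeeded."""
    if not requests:
        return None  # no traffic in the window, so the SLI is undefined
    return sum(1 for r in requests if r.ok) / len(requests)


def latency_sli(requests: Sequence[Request], threshold_ms: float) -> Optional[float]:
    """Fraction of requests in the window served within threshold_ms."""
    if not requests:
        return None
    return sum(1 for r in requests if r.latency_ms <= threshold_ms) / len(requests)


def meets_slo(sli: Optional[float], target: float) -> bool:
    """Compare an SLI against an SLO target; an undefined SLI never passes here."""
    return sli is not None and sli >= target


# One aggregation window of raw measurements (made-up sample data).
window = [
    Request(120, True),
    Request(95, True),
    Request(480, True),
    Request(30, False),
    Request(650, True),
]

availability = availability_sli(window)          # 4 of 5 succeeded -> 0.8
fast_enough = latency_sli(window, threshold_ms=300)  # 3 of 5 within 300 ms -> 0.6

print(f"availability SLI: {availability:.2f}, meets 99% SLO: {meets_slo(availability, 0.99)}")
print(f"latency SLI:      {fast_enough:.2f}, meets 99% SLO: {meets_slo(fast_enough, 0.99)}")
```

Expressing both indicators as a ratio of good events to total events keeps them on the same 0 to 1 scale, so the same comparison against a target works for either one. How an empty window should be treated (undefined, passing, or failing) is a policy decision; the sketch treats it as not meeting the target.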

Why It Matters

Organisations rely on SLIs to establish accountability for reliability, justify infrastructure investment, and balance feature velocity against stability concerns. For customer-facing services, sustained poor SLI performance is commonly associated with customer churn and revenue loss; for internal systems, it affects productivity and operational costs.

Common Applications

SLIs are used in cloud platforms to monitor uptime and latency, in payment systems to track transaction success rates, and in content delivery networks to measure page load times. DevOps teams use SLIs to trigger automated scaling decisions, whilst incident response teams employ them to prioritise and escalate outages.

Key Considerations

Selecting meaningful SLIs requires understanding actual user impact rather than operational convenience: measuring CPU utilisation, for example, does not necessarily reflect user satisfaction. The targets set for each SLI (its service level objectives) must be ambitious enough to drive quality improvements yet achievable enough to maintain team morale and prevent alert fatigue.

Referenced By

1 term mentions Service Level Indicator

Other entries in the wiki whose definition references Service Level Indicator — useful for understanding how this concept connects across DevOps & Infrastructure and adjacent domains.

Service Level Objective · DevOps & Infrastructure

Related in CI/CD

DevOps

A set of practices combining software development and IT operations to shorten the development lifecycle and deliver continuous value.

CI/CD Pipeline

An automated workflow that builds, tests, and deploys software changes from development to production.

Build Automation

The process of automating the compilation, testing, and packaging of software applications.

Artifact Repository

A centralised storage system for managing binary artifacts produced during the software build process.

ChatOps

A collaboration model connecting tools, processes, and automation with team chat platforms for operations management.

Post-Mortem Analysis

A structured review conducted after an incident to identify root causes and prevent recurrence.

Blameless Culture

An organisational approach where incident reviews focus on systemic improvements rather than individual blame.

Mean Time to Recovery

The average time it takes to restore a system to normal operation after a failure or incident.

Mean Time Between Failures

The average time between system failures, measuring reliability and availability.

Service Level Objective

A target value for a service level indicator that defines acceptable service performance.

Playbook

A comprehensive guide containing strategies, procedures, and best practices for managing specific operational scenarios.

Rolling Update

A deployment strategy that gradually replaces instances of the previous version with the new version.

More in DevOps & Infrastructure

Runbook

Site Reliability

A documented set of procedures for handling routine operations and troubleshooting common issues.

Capacity Planning

Site Reliability

The process of determining the production capacity needed to meet changing demands for an organisation's products.

Monitoring

Observability

The continuous observation of system performance, availability, and health using automated tools and dashboards.

Logging

Observability

The practice of recording events, errors, and system activities for debugging, auditing, and analysis.

Graceful Degradation

CI/CD

A design approach where a system continues to operate with reduced functionality when components fail.

Vertical Scaling

CI/CD

Increasing the resources (CPU, RAM, storage) of an existing machine to handle more load.

Container Registry

Containers & Orchestration

A repository for storing, managing, and distributing container images.

High Availability

Site Reliability

A system design approach that ensures a certain degree of operational continuity during a given measurement period.