Error Budget
ELI5 — The Vibe Check
An Error Budget is the amount of unreliability you're allowed before you stop shipping features and focus on stability. If your SLO is 99.9% uptime, your error budget is 0.1% downtime (~43 min/month). Use it up? Feature freeze until you're stable again. It turns reliability into a measurable currency.
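To check the "~43 min/month" figure, here is a minimal sketch of the arithmetic. It assumes a 30-day month and an availability SLO given as a percentage; the function name allowed_downtime_minutes is made up for illustration.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Downtime an availability SLO permits over the window, in minutes.

    Assumes a fixed-length window (30 days by default); a real calendar
    month of 28 or 31 days would give a slightly different number.
    """
    budget_fraction = (100.0 - slo_percent) / 100.0  # e.g. 99.9 -> 0.001
    window_minutes = window_days * 24 * 60           # 30 days -> 43,200 minutes
    return budget_fraction * window_minutes


print(round(allowed_downtime_minutes(99.9), 1))  # 43.2
```

Tightening the SLO to 99.99% shrinks the same 30-day budget to roughly 4.3 minutes, which is why each extra "nine" is expensive.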
Real Talk
An error budget is the difference between perfect reliability (100%) and your SLO target: a 99.9% SLO leaves a 0.1% budget. It quantifies acceptable unreliability and creates a data-driven balance between feature velocity and reliability investment. When the budget is exhausted, teams prioritize reliability work over feature development.
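The same definition works when the SLI counts requests instead of minutes. The sketch below is one possible way to track it, not a standard API: the ErrorBudget class, its field names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (illustrative only)."""

    slo: float            # target success ratio, e.g. 0.999
    total_requests: int   # requests served in the window
    failed_requests: int  # requests that counted against the SLI

    @property
    def allowed_failures(self) -> float:
        # The budget is the gap between 100% and the SLO, applied to traffic.
        return (1.0 - self.slo) * self.total_requests

    @property
    def consumed_fraction(self) -> float:
        # Share of the budget already spent; above 1.0 means overspent.
        if self.allowed_failures == 0:
            return float("inf") if self.failed_requests else 0.0
        return self.failed_requests / self.allowed_failures

    @property
    def remaining_fraction(self) -> float:
        return 1.0 - self.consumed_fraction


# Made-up numbers: 1,000,000 requests at a 99.9% SLO allow about 1,000 failures.
budget = ErrorBudget(slo=0.999, total_requests=1_000_000, failed_requests=800)
print(f"{budget.consumed_fraction:.0%}")  # 80%
```

With 800 failures against roughly 1,000 allowed, 80% of the budget is gone and about 20% is left for the rest of the window.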
When You'll Hear This
"We burned through 80% of our error budget with that outage, so it's time to focus on reliability." / "The error budget policy says: if the remaining monthly budget is <25%, no risky deploys."
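A policy like the one quoted can be written down as a small gate. This is a sketch under assumptions: it reads the 25% threshold as budget remaining (not budget spent), and the function name, return labels, and default threshold are invented for illustration. Real policies are set by each team.

```python
def deploy_policy(remaining_fraction: float, risky_threshold: float = 0.25) -> str:
    """Map the remaining error budget to a deployment stance.

    remaining_fraction: share of the window's budget still unspent (0.20 = 20%).
    risky_threshold: assumed cutoff below which risky deploys are blocked.
    """
    if remaining_fraction <= 0.0:
        # Budget exhausted: stop feature work, focus on stability.
        return "feature_freeze"
    if remaining_fraction < risky_threshold:
        # Budget is low: keep shipping, but only low-risk changes.
        return "no_risky_deploys"
    return "normal"


# After burning 80% of the budget, 20% remains.
print(deploy_policy(0.20))  # no_risky_deploys
```

Keeping the rule in code (or in a written policy doc) means the freeze decision is agreed on before an outage, not argued about during one.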
Related Terms
Observability
Observability is the ability to understand what's happening inside your system from the outside, using three types of data: metrics (numbers), logs (events...
SLA (Service Level Agreement)
An SLA is a contract between you and your users about how reliable your service will be. 'We promise the app will be up 99.9% of the time.
SLI (Service Level Indicator)
An SLI is the actual measurement you track to know if you're hitting your SLO. If the SLO says 'be fast,' the SLI is the actual timer measuring speed.
SLO (Service Level Objective)
An SLO is the internal target you set for how well your service should run. It's like a personal fitness goal vs a race you signed up for.