On-call
ELI5 — The Vibe Check
On-call means it's your turn to be the person who gets woken up at 3am if production breaks. Teams rotate on-call duty — one week you're it, next week someone else is. Being on-call is the reality check that makes engineers care about writing reliable software. Your pager doesn't care that it's Sunday.
Real Talk
On-call is a rotation in which team members take turns as the designated first responder for production incidents, usually including nights and weekends. The on-call engineer carries a pager (today, usually a paging app such as PagerDuty or OpsGenie) and is expected to acknowledge an incident and begin working on it within a defined SLA (often 5 to 15 minutes).
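To make the two moving parts concrete (whose turn it is, and whether a page was acknowledged in time), here is a minimal Python sketch. It is not how PagerDuty or OpsGenie work internally. The engineer names, the rotation start date, the weekly handoff, and the 15-minute acknowledgement window are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical team and settings, for illustration only.
ENGINEERS = ["alice", "bob", "carol"]
ROTATION_START = datetime(2024, 1, 1)  # a Monday; week 0 begins here
ACK_SLA = timedelta(minutes=15)        # assumed acknowledgement window


def on_call_for(when: datetime) -> str:
    """Return who is on-call at `when`, handing off once per week."""
    weeks_elapsed = (when - ROTATION_START).days // 7
    return ENGINEERS[weeks_elapsed % len(ENGINEERS)]


def ack_within_sla(paged_at: datetime, acked_at: datetime) -> bool:
    """True if the page was acknowledged inside the SLA window."""
    return acked_at - paged_at <= ACK_SLA


page_time = datetime(2024, 1, 14, 3, 0)  # a Sunday, 3am
print(on_call_for(page_time))            # bob
print(ack_within_sla(page_time, page_time + timedelta(minutes=9)))  # True
```

Real paging tools layer overrides, time zones, and escalation to a secondary responder on top of this, but the core idea is the same: a schedule lookup plus a timer.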
When You'll Hear This
"I'm on-call this week — please don't deploy anything major." / "Set up an on-call rotation in PagerDuty with fair scheduling."
Related Terms
Alerting
Alerting is the part of monitoring that actually wakes people up when something goes wrong.
Incident
An incident is when something has gone wrong in production and users are affected.
Incident Response
Incident Response is the process your team follows when production breaks. Who gets paged? Who's the incident commander?
Pager
A pager (or, more likely today, an app like PagerDuty or OpsGenie) is what delivers the alert to the on-call engineer's phone when something breaks in production.
Runbook
A Runbook is a step-by-step guide for handling a specific operational task or incident.
SRE (Site Reliability Engineering)
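SRE is a discipline that originated at Google and applies software engineering practices to operations work. Think of it as Google's version of DevOps with a more engineering-focused twist.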