Incident
ELI5 — The Vibe Check
An incident is when something has gone wrong in production and users are affected. It's not just a bug in your code — it's an active emergency with real users unable to use your product right now. Incidents have severity levels (P1 = everything's on fire, P4 = slight weirdness), and teams have processes for handling them fast.
Real Talk
An incident is an unplanned interruption or degradation of a service that affects users. Incidents are classified by severity (commonly P1/SEV1 through P4/SEV4, though the exact scale varies by organization), trigger an on-call response process, are tracked in incident management tools (PagerDuty, Opsgenie), and typically end with a postmortem once resolved.
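To make the severity idea concrete, here is a minimal sketch of how a team might map impact to a severity level and decide whether to page someone. Everything in it is an illustrative assumption rather than a standard: the `Incident` fields, the `classify` and `should_page` helpers, and the 25% and 1% thresholds are all hypothetical, and real teams define their own criteria.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Lower number means more severe, matching the P1..P4 convention."""
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


@dataclass
class Incident:
    title: str
    affected_fraction: float  # share of users affected, 0.0 to 1.0
    core_flow_down: bool      # e.g. login or checkout is unusable


def classify(incident: Incident) -> Severity:
    """Map impact to a severity level (thresholds are assumptions)."""
    widespread = incident.affected_fraction >= 0.25
    if incident.core_flow_down and widespread:
        return Severity.P1
    if incident.core_flow_down or widespread:
        return Severity.P2
    if incident.affected_fraction >= 0.01:
        return Severity.P3
    return Severity.P4


def should_page(severity: Severity) -> bool:
    """Assume only P1 and P2 wake up the on-call engineer."""
    return severity <= Severity.P2


if __name__ == "__main__":
    outage = Incident("Checkout down for all EU users", 0.4, True)
    sev = classify(outage)
    print(sev.name, "page on-call:", should_page(sev))
```

In this sketch a lower-severity incident would simply be tracked as a ticket instead of paging anyone, which is one common way to keep on-call load manageable.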
When You'll Hear This
"Declare an incident — checkout is down for all EU users." / "How many P1 incidents did we have last quarter?"
Related Terms
Alerting
Alerting is the part of monitoring that actually wakes people up when something goes wrong.
Incident Response
Incident Response is the process your team follows when production breaks. Who gets paged? Who's the incident commander?
Monitoring
Monitoring is keeping a constant eye on your app while it runs — tracking whether it's up, how fast it responds, how many errors it throws, and how much me...
On-call
On-call means it's your turn to be the person who gets woken up at 3am if production breaks.
Pager
A pager (today, more likely an app such as PagerDuty or Opsgenie) is what delivers the alert to the on-call engineer's phone when something breaks in production.
Postmortem
A postmortem is the review you hold after an incident, usually a meeting plus a written summary, to figure out what went wrong and how to prevent it from happening again.