[{"data":1,"prerenderedAt":85},["ShallowReactive",2],{"term-s\u002Fsite-reliability-engineering":3,"related-s\u002Fsite-reliability-engineering":62},{"id":4,"title":5,"acronym":6,"body":7,"category":40,"description":41,"difficulty":42,"extension":43,"letter":44,"meta":45,"navigation":46,"path":47,"related":48,"seo":56,"sitemap":57,"stem":60,"subcategory":6,"__hash__":61},"terms\u002Fterms\u002Fs\u002Fsite-reliability-engineering.md","Site Reliability Engineering",null,{"type":8,"value":9,"toc":33},"minimark",[10,15,19,23,26,30],[11,12,14],"h2",{"id":13},"eli5-the-vibe-check","ELI5 — The Vibe Check",[16,17,18],"p",{},"Site Reliability Engineering is the discipline of making sure your app stays up and runs well, using the same rigorous engineering methods you'd use to write code. SREs don't just react to fires — they write code to prevent fires, measure how reliable the system is, and decide when it's worth adding new features vs fixing reliability problems.",[11,20,22],{"id":21},"real-talk","Real Talk",[16,24,25],{},"Site Reliability Engineering (SRE) treats operations as a software engineering problem. Core practices include defining Service Level Objectives (SLOs), managing error budgets, eliminating toil through automation, monitoring with Service Level Indicators (SLIs), and conducting blameless postmortems.",[11,27,29],{"id":28},"when-youll-hear-this","When You'll Hear This",[16,31,32],{},"\"Site Reliability Engineering gave us a framework for measuring and improving uptime.\" \u002F \"The SRE book by Google is the bible of modern ops.\"",{"title":34,"searchDepth":35,"depth":35,"links":36},"",2,[37,38,39],{"id":13,"depth":35,"text":14},{"id":21,"depth":35,"text":22},{"id":28,"depth":35,"text":29},"cicd","Site Reliability Engineering is the discipline of making sure your app stays up and runs well, using the same rigorous engineering methods you'd use to wri...","intermediate","md","s",{},true,"\u002Fterms\u002Fs\u002Fsite-reliability-engineering",[49,50,51,52,53,54,55],"SRE","SLA","SLO","SLI","Monitoring","Postmortem","DevOps",{"title":5,"description":41},{"changefreq":58,"priority":59},"weekly",0.7,"terms\u002Fs\u002Fsite-reliability-engineering","d4YUvVNoVFpXSz8B0LYVwBfjy_f1hBlQTB8ucvrK80s",[63,67,70,73,77,81],{"title":55,"path":64,"acronym":6,"category":40,"difficulty":65,"description":66},"\u002Fterms\u002Fd\u002Fdevops","beginner","DevOps is the culture and practice of tearing down the wall between the people who write code (Dev) and the people who run it in production (Ops).",{"title":53,"path":68,"acronym":6,"category":40,"difficulty":65,"description":69},"\u002Fterms\u002Fm\u002Fmonitoring","Monitoring is keeping a constant eye on your app while it runs — tracking whether it's up, how fast it responds, how many errors it throws, and how much me...",{"title":54,"path":71,"acronym":6,"category":40,"difficulty":42,"description":72},"\u002Fterms\u002Fp\u002Fpostmortem","A Postmortem is the meeting you have after an incident to figure out what went wrong and how to prevent it from happening again.",{"title":50,"path":74,"acronym":75,"category":40,"difficulty":42,"description":76},"\u002Fterms\u002Fs\u002Fsla","Service Level Agreement","An SLA is a contract between you and your users about how reliable your service will be. 'We promise the app will be up 99.9% of the time.",{"title":52,"path":78,"acronym":79,"category":40,"difficulty":42,"description":80},"\u002Fterms\u002Fs\u002Fsli","Service Level Indicator","An SLI is the actual measurement you track to know if you're hitting your SLO. If the SLO says 'be fast,' the SLI is the actual timer measuring speed.",{"title":51,"path":82,"acronym":83,"category":40,"difficulty":42,"description":84},"\u002Fterms\u002Fs\u002Fslo","Service Level Objective","An SLO is the internal target you set for how well your service should run. It's like a personal fitness goal vs a race you signed up for.",1776518313500]