Chaos Testing
ELI5 — The Vibe Check
Chaos testing is intentionally breaking things, often in production, to see if your system can handle it. Kill a server. Slow the network. Drop a database connection. If your system survives, great. If not, you found a problem before your users did. Netflix popularized this with Chaos Monkey.
Real Talk
Chaos testing (chaos engineering) is the discipline of experimenting on distributed systems by introducing controlled failures to build confidence in system resilience. Principles include defining steady state, hypothesizing about failure impact, injecting faults (network latency, server crashes, disk failures), and observing system behavior. Tools include Chaos Monkey, Litmus, Gremlin, and Toxiproxy.
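To make the steady state / hypothesis / inject / observe loop concrete, here is a minimal sketch in Python. Everything in it is hypothetical and made up for illustration: `FlakyService` stands in for a real dependency, the two handlers stand in for your application code, and the 99% threshold is an assumed steady-state target. Real tools such as Chaos Monkey or Toxiproxy inject faults at the infrastructure or network level rather than in-process like this.

```python
import random


class FlakyService:
    """Hypothetical dependency that fails with a configurable probability."""

    def __init__(self, failure_rate=0.0, rng=None):
        self.failure_rate = failure_rate
        self.rng = rng or random.Random()

    def call(self):
        # Fault injection point: raise instead of answering.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected fault")
        return "ok"


def fragile_handler(service):
    """Passes the dependency's failure straight through to the user."""
    return service.call()


def resilient_handler(service):
    """Degrades gracefully by serving a fallback when the dependency fails."""
    try:
        return service.call()
    except ConnectionError:
        return "fallback"


def measure(handler, service, requests=200):
    """Steady-state metric: fraction of requests answered without an error."""
    answered = 0
    for _ in range(requests):
        try:
            handler(service)
            answered += 1
        except ConnectionError:
            pass
    return answered / requests


def run_experiment(handler, fault_rate=0.3, threshold=0.99, seed=42):
    """Hypothesis: the answered fraction stays >= threshold under faults."""
    baseline = measure(handler, FlakyService(0.0, random.Random(seed)))
    under_fault = measure(handler, FlakyService(fault_rate, random.Random(seed)))
    return {
        "baseline": baseline,
        "under_fault": under_fault,
        "hypothesis_held": under_fault >= threshold,
    }


if __name__ == "__main__":
    print("fragile:  ", run_experiment(fragile_handler))
    print("resilient:", run_experiment(resilient_handler))
```

The fragile handler looks healthy at baseline and only fails the hypothesis once faults are injected, which is the kind of gap a chaos experiment is meant to surface.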
When You'll Hear This
"Chaos testing revealed our retry logic doesn't back off, causing cascading failures." / "We run chaos experiments monthly to validate our disaster recovery."
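The first quote is about retries that hammer a struggling service at full speed. One common fix is exponential backoff with jitter; the sketch below is one way to write it, not a reference implementation. The names `backoff_delays` and `retry`, the default values, and the choice of `ConnectionError` as the retryable error are all assumptions made for illustration.

```python
import random
import time


def backoff_delays(attempts, base=0.1, cap=5.0):
    """Exponential schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def retry(operation, attempts=5, base=0.1, cap=5.0,
          sleep=time.sleep, rng=random.random):
    """Call `operation`, retrying on ConnectionError with jittered backoff.

    `sleep` and `rng` are injectable so the behaviour can be tested
    without real waiting or real randomness.
    """
    delays = backoff_delays(attempts - 1, base, cap)
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            # "Full jitter": wait a random fraction of the scheduled delay
            # so many clients do not retry in lockstep.
            sleep(delays[attempt] * rng())
```

Spreading retries out in time gives the failing service room to recover instead of piling more load on it, which is how unbounded retries turn one failure into a cascade.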
Related Terms
Chaos Engineering
Imagine stress-testing a bridge by parking trucks on it before opening day instead of hoping it holds.
Circuit Breaker
Circuit Breaker is like the electrical circuit breaker in your house: when a dependency keeps failing, it trips and stops sending calls until things recover.
Fault Injection
Fault injection is deliberately adding errors to your system to test how it responds. For example, add 500 ms of latency to database calls and check whether timeouts and retries still behave.
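As a rough illustration of the latency example, the sketch below wraps a function in a decorator that delays each call before running it. The decorator name `inject_latency`, its parameters, and the `query_database` function are hypothetical; a proxy-level tool such as Toxiproxy would add the delay on the network path instead of in application code.

```python
import functools
import random
import time


def inject_latency(delay_s=0.5, probability=1.0,
                   sleep=time.sleep, rng=random.random):
    """Decorator that delays a call by `delay_s` seconds with the given probability."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Injected fault: slow the call down before doing the real work.
            if rng() < probability:
                sleep(delay_s)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_latency(delay_s=0.5)
def query_database(sql):
    """Hypothetical database call; a real one would talk to a server."""
    return f"rows for: {sql}"
```

With the decorator in place, each call to `query_database` takes about half a second longer, which makes it easy to see whether callers time out, retry sensibly, or hang.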