Resumable Workflow
ELI5 — The Vibe Check
A resumable workflow can pause, crash, and pick up exactly where it left off — checkpoints for distributed systems that can't afford to restart. Imagine a 3-hour data pipeline that crashes at step 47 of 50. Without resumability, you start over. With it, you pick up at step 47. It's like a video game save system for your backend processes. Non-negotiable for anything that touches money or takes hours to run.
Real Talk
Resumable workflows persist execution state at each step so the workflow engine can replay from the last checkpoint after a failure or pause. Frameworks like Temporal, Inngest, AWS Step Functions, and Cloudflare Workflows implement this via durable execution — the engine stores step results in a persistent event log and replays the workflow using that log. Idempotent steps are a prerequisite; non-idempotent side effects must be guarded with exactly-once semantics.
When You'll Hear This
"The import job uses a resumable workflow — if it crashes at 90%, it picks up from there." / "Temporal handles the resumable workflow layer so we don't have to build it."
Related Terms
Message Queue
A Message Queue is a waiting room for tasks. Producers drop tasks in the queue, consumers pick them up and process them one at a time.
Orchestration
Orchestration is the process of automatically managing, coordinating, and scheduling where your containers run.
Saga Pattern
The saga pattern is how you handle transactions that span multiple services. In a monolith, you'd wrap everything in one database transaction.