How we handle post-incident reviews

Post-incident reviews turn failures into system improvements. They focus on learning and ownership, not blame.

Document the incident

For each incident, we capture a postmortem with timeline, customer impact, contributing factors, mitigations, and explicit follow-ups. Use a single template (wiki page, GitHub Issue, or shared doc) so history stays searchable.
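
As an illustration only, a completeness check for the postmortem sections listed above could be sketched as follows. The section keys and the `missing_sections` helper are hypothetical names chosen for this example, not part of any existing template or tool.

```python
# Hypothetical section names, taken from the list of postmortem contents above.
REQUIRED_SECTIONS = (
    "timeline",
    "customer impact",
    "contributing factors",
    "mitigations",
    "follow-ups",
)


def missing_sections(postmortem: dict[str, str]) -> list[str]:
    """Return the required sections that are absent or blank."""
    return [
        name
        for name in REQUIRED_SECTIONS
        if not postmortem.get(name, "").strip()
    ]


draft = {"timeline": "14:02 alert fired", "customer impact": "  "}
print(missing_sections(draft))
```

A check like this could run before the review meeting so gaps in the document surface early.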

Run the review meeting

Hold the review within a few days of resolution, open it to cross-functional attendees, and use the postmortem document to guide the discussion.

Keep it blameless

Ask "how did this happen?" rather than "who caused this?" The goal is better systems, tooling, and assumptions.

Take ownership

Follow-up actions have owners, deadlines, and tracking in GitHub Issues (or the team’s chosen tool); nothing important lives only in chat.

SLA expectations

Urgent: 3 days
High: 2 weeks
Medium/Low: 1 month

Follow-ups take priority over regular feature work.
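
To make the SLA windows above concrete, here is a minimal sketch of computing a follow-up's due date. It rests on assumptions the text does not state: the window is counted in calendar days, "1 month" is approximated as 30 days, and the clock starts on a date the caller supplies (for example, the day of the review meeting). The names `SLA_DAYS`, `follow_up_due`, and `is_overdue` are hypothetical.

```python
from datetime import date, timedelta

# SLA windows by priority, in calendar days (assumption).
# "1 month" is approximated as 30 days (assumption).
SLA_DAYS = {
    "urgent": 3,
    "high": 14,
    "medium": 30,
    "low": 30,
}


def follow_up_due(priority: str, start: date) -> date:
    """Return the due date for a follow-up of the given priority."""
    key = priority.strip().lower()
    if key not in SLA_DAYS:
        raise ValueError(f"unknown priority: {priority!r}")
    return start + timedelta(days=SLA_DAYS[key])


def is_overdue(priority: str, start: date, today: date) -> bool:
    """True once today is past the due date."""
    return today > follow_up_due(priority, start)


print(follow_up_due("High", date(2024, 1, 1)))  # 2024-01-15
```

If the team counts business days or starts the clock at incident resolution instead, only `SLA_DAYS` and the `start` argument would need to change.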