How we handle post-incident reviews
Post-incident reviews turn failures into system improvements. They focus on learning and ownership, not blame.
Document the incident
For each incident, we write a postmortem that covers the timeline, customer impact, contributing factors, mitigations, and explicit follow-ups. Use a single template (wiki page, GitHub Issue, or shared doc) so the history stays searchable.
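To make the template concrete, the sections listed above could be modeled as a small record with a completeness check. This is only a sketch: the `Postmortem` class, its field names, and `missing_sections` are hypothetical and not part of any existing tooling.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Postmortem:
    """Hypothetical record mirroring the postmortem sections named above."""
    title: str
    timeline: list[str] = field(default_factory=list)
    customer_impact: str = ""
    contributing_factors: list[str] = field(default_factory=list)
    mitigations: list[str] = field(default_factory=list)
    follow_ups: list[str] = field(default_factory=list)


def missing_sections(pm: Postmortem) -> list[str]:
    """Return the names of sections that are still empty, in template order."""
    return [f.name for f in fields(pm) if not getattr(pm, f.name)]


if __name__ == "__main__":
    draft = Postmortem(
        title="Checkout latency spike",  # illustrative data only
        timeline=["10:02 alert fired", "10:40 rollback complete"],
        customer_impact="Slow checkout for some users",
    )
    print(missing_sections(draft))
```

A check like this could run before the review meeting so that an incomplete document is caught early, whichever tool hosts the template.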
Run a review meeting
Hold the meeting within a few days of resolution, open it to people across functions, and use the postmortem document to guide the discussion.
Keep it blameless
Ask "how did this happen?" rather than "who caused this?" The goal is better systems, tooling, and assumptions.
Take ownership
Every follow-up action has an owner, a deadline, and a tracking entry in GitHub Issues (or the team’s chosen tool); nothing important lives only in chat.
SLA expectations for follow-ups
- Urgent: 3 days
- High: 2 weeks
- Medium/Low: 1 month
Follow-ups take priority over regular feature work.
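As a rough illustration, the deadlines above can be turned into due dates when a follow-up is filed. The sketch below makes several assumptions the text does not state: the clock starts on the day the follow-up is opened, durations are calendar days, "2 weeks" means 14 days, and "1 month" means 30 days. The names `SLA_DAYS`, `due_date`, and `is_overdue` are hypothetical.

```python
from datetime import date, timedelta

# Deadlines per priority, in calendar days.
# Assumption: "2 weeks" = 14 days and "1 month" = 30 days.
SLA_DAYS = {"urgent": 3, "high": 14, "medium": 30, "low": 30}


def due_date(priority: str, opened: date) -> date:
    """Return the deadline for a follow-up opened on `opened`."""
    return opened + timedelta(days=SLA_DAYS[priority.lower()])


def is_overdue(priority: str, opened: date, today: date) -> bool:
    """True once `today` is past the deadline for this follow-up."""
    return today > due_date(priority, opened)


if __name__ == "__main__":
    opened = date(2024, 1, 1)
    for level in ("Urgent", "High", "Medium", "Low"):
        print(level, due_date(level, opened).isoformat())
```

If the team counts business days instead, or starts the clock at incident resolution, only `SLA_DAYS` and `due_date` would need to change.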