I was in the Emergency Department, trying to be grateful rather than afraid. "If the doctor can't save my leg," I reasoned, "at least I can still code."
Thankfully, the medicine worked and I returned to work with both legs intact. My first project after recovering was to establish a bug management strategy and process. We were rapidly acquiring users and... bug reports. We had to rethink our approach, to ensure bugfixes did not distract from the main goal of feature delivery.
Well, what great timing! I had just observed world-class bug management (of the biological kind). In a hospital, the stakes are high and there is little room for error. I was curious: could the engineering team borrow an idea from the hospital?
Immediately triage bugs
When I arrived at the Emergency Department, I was assessed by a triage nurse. With a few simple and unambiguous rules, the nurse determined I was a low priority patient. I had a normal temperature, and a pulse, which was a great deal better than some of the other patients that afternoon.
I like the idea of simple rules for repetitive tasks. Rules save time by eliminating decisions, and they create consistency. Back in the office, the engineering team drew up a bug triage chart to help us decide:
- Priority - whether the bugfix was more important than a feature ticket
- Timing - the due date of the bugfix
Here is an excerpt of our chart. I can only share a portion of it here, so as not to disclose confidential information.
Bug triage chart excerpt
| Priority | Description | Expected completion time |
| --- | --- | --- |
| Red Alert | Significant business impact. A significant bug or error, which has caused the application to fail. | As soon as possible. All engineers to immediately drop everything to fix the issue. |
| High Priority | Urgent fix required to meet SLA. | One engineer will be assigned to fix the issue within the SLA period. |
| Medium Priority | Bug impacting user experience, performance, or site reliability. | One engineer will be assigned to fix the issue in the next sprint. |
| Not Fixing | This issue will not be fixed. | This issue will not be fixed. We may revisit the issue at a later date, but no date is planned. |
I particularly like the "Not Fixing" category. It's important to remember that there is always the choice to not do something.
Every bug is triaged on arrival. The priority and expected fix time are communicated to the engineering team, as well as to any other stakeholders. With a clear set of rules, any engineer is empowered to make the assessment.
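To show how little decision-making a chart like this leaves, here is a minimal Python sketch of the excerpt as a function. It is an illustration, not our actual tooling. The `BugReport` fields are hypothetical yes/no questions I made up to stand in for the chart's descriptions. Falling through to "Not Fixing" is also an assumption, because the real chart has more rows than the excerpt shows.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    RED_ALERT = "Red Alert"
    HIGH = "High Priority"
    MEDIUM = "Medium Priority"
    NOT_FIXING = "Not Fixing"


# Expected response for each priority, paraphrased from the chart excerpt.
RESPONSE = {
    Priority.RED_ALERT: "All engineers drop everything and fix it as soon as possible.",
    Priority.HIGH: "One engineer fixes the issue within the SLA period.",
    Priority.MEDIUM: "One engineer fixes the issue in the next sprint.",
    Priority.NOT_FIXING: "No fix planned; may be revisited at a later date.",
}


@dataclass
class BugReport:
    # Hypothetical yes/no questions the triager answers on arrival.
    application_down: bool = False
    breaches_sla: bool = False
    affects_ux_performance_or_reliability: bool = False


def triage(bug: BugReport) -> Priority:
    """Return the first matching priority, checking the most severe rule first."""
    if bug.application_down:
        return Priority.RED_ALERT
    if bug.breaches_sla:
        return Priority.HIGH
    if bug.affects_ux_performance_or_reliability:
        return Priority.MEDIUM
    # Assumption: anything that matches no rule in the excerpt is not fixed.
    return Priority.NOT_FIXING


if __name__ == "__main__":
    bug = BugReport(breaches_sla=True, affects_ux_performance_or_reliability=True)
    priority = triage(bug)
    print(f"{priority.value}: {RESPONSE[priority]}")
```

The rules are checked from most to least severe, so a bug that matches several rows always lands in the highest one, and two engineers answering the same questions reach the same priority.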
The change was noticeable immediately. We knew how long to spend fixing bugs, how many engineers to assign to the issue, and when bugfixes would be delivered. With a few quick rules, the engineering team had much more confidence about bug management.