The Checklist Factory

Core idea

After his first surgical checklist collapsed inside a single afternoon, Gawande does what he should have done first: he goes to where checklists have been engineered for seventy years. He visits Daniel Boorman at Boeing, the lineal descendant of the test pilots who built the original B-17 list. Boorman runs what amounts to a checklist factory — Boeing produces or revises more than a hundred each year — and he hands Gawande a working theory of design. Good checklists are not comprehensive how-to guides. They are tight, deliberately incomplete, expert-supporting tools. The topic turns the abstract framework into a concrete design discipline: define a clear pause-point; pick DO-CONFIRM or READ-DO; keep it to roughly five-to-nine items and 60–90 seconds; focus on the killer items; use plain language and clean typography; and — non-negotiable — test it in the real environment, find where it falls apart, fix it, test again.

Why it matters

Bad checklists vs. good checklists

Boorman is blunt about the failure modes. Bad checklists are vague and imprecise, too long, and impractical. They are written by desk jockeys with no first-hand sense of where the list will be used, they treat experts as if they were idiots, and they spell out every step. They turn brains off. Good checklists are precise, efficient, and easy to use under load. They do not try to spell out everything; a checklist cannot fly a plane. They list only the most critical and easily missed steps, the ones that even skilled professionals skip, and stop there. They are practical.

Boorman's guiding constraint: Checklists are quick, simple tools aimed to buttress the skills of expert professionals — and by remaining swift and usable and resolutely modest, they are saving thousands of lives.

DO-CONFIRM vs. READ-DO

The first decision in designing any checklist is which type it is.

  • DO-CONFIRM — the team members perform the steps from memory and experience, often separately, then pause to confirm together that everything got done. Used when the steps are familiar and the value is verification.
  • READ-DO — the team reads each step and performs it before reading the next, recipe-fashion. Used when the steps are unfamiliar or order-critical, especially in emergencies.

Most surgical safety steps before incision are DO-CONFIRM; emergency procedures (e.g. the DOOR FWD CARGO list) are READ-DO.
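The difference between the two types can be made concrete in a few lines of code. This is a minimal sketch, not anything taken from Boeing or from Gawande: the names `ChecklistType`, `run_read_do`, and `run_do_confirm` are hypothetical, and the steps are illustrative only.

```python
from enum import Enum
from typing import Callable, Iterable


class ChecklistType(Enum):
    DO_CONFIRM = "do-confirm"  # work first, verify afterwards
    READ_DO = "read-do"        # read a step, perform it, read the next


def run_read_do(steps: Iterable[tuple[str, Callable[[], None]]]) -> list[str]:
    """READ-DO: each step is read and performed before the next one is read."""
    log = []
    for label, action in steps:
        log.append(f"READ {label}")
        action()  # the step is carried out here, in order
        log.append(f"DONE {label}")
    return log


def run_do_confirm(done: set[str], checklist: list[str]) -> list[str]:
    """DO-CONFIRM: the work already happened from memory.

    Returns the checklist items that cannot be confirmed as done.
    """
    return [item for item in checklist if item not in done]
```

In the DO-CONFIRM case the list never drives the work; it only catches what memory dropped. For example, `run_do_confirm({"patient identity", "site marked"}, ["patient identity", "site marked", "antibiotics given"])` returns `["antibiotics given"]`.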

Pause-points and time budgets

A checklist needs a clearly defined moment to be run — a pause-point. Sometimes the moment is obvious (a warning light fires, an engine quits). For routine pre-action checks, the pause-point must be picked deliberately and trained into the workflow.

The time budget is tight. Boorman's working number: more than 60–90 seconds at a pause-point and the team starts shortcutting. The rule of thumb of five to nine items tracks working-memory limits — though Boorman is undogmatic about the exact count and clear that context decides.

Killer items

The discipline of trimming is the discipline of identifying killer items — the steps that, when skipped, are most likely to cause catastrophic harm and are most often missed. Data on which steps are most lethal and most missed is "highly coveted in aviation, though not always available." When you cannot get such data, you fall back on field experience and informed estimate — and you iterate.

The DOOR FWD CARGO case

Boorman's worked example is the 1989 United 811 disaster. An electrical short let a 747's forward cargo door unlatch above 22,000 feet. The pressure differential blew the door off in 1.5 seconds, taking five rows of business-class seats with it; nine passengers were lost. Boeing's response was both mechanical (extra latches) and procedural (a checklist for what to do when the warning light fires). The checklist does not tell the crew to "vent the cabin and depressurise immediately", because that would expose passengers to Everest-altitude oxygen levels. It tells them to make a controlled descent to ~8,000 feet first, then vent. Boorman's point is that even the "right answer" is not obvious from instinct, and that in the cockpit recorder transcript, two pilots in genuine catastrophe stopped and read the checklist.

Cosmetic details matter

The topic is unexpectedly granular: ideally the list fits on one page; uses both uppercase and lowercase (easier to read); is free of clutter and unnecessary colour; uses a sans-serif font (Boorman recommends Helvetica). These are not aesthetic preferences. They are usability details that change whether the list gets used at 3 a.m. with adrenaline up.

British Airways 38 → Delta over Montana

The topic ends with the strongest case yet for the checklist as a knowledge-translation engine. British Airways 38 fell out of the sky two miles short of Heathrow in January 2008; eight months of investigation produced a tentative theory (ice crystals accumulating in unusually smooth polar fuel flow) and an FAA bulletin. Two weeks later, Boorman's team had distilled that bulletin into a revised polar-flight checklist; within a month it was in every Boeing 777 cockpit. In November 2008, a Delta flight over Great Falls, Montana lost an engine to the same icing problem at 39,000 feet. The pilots followed the checklist, the engine recovered, and the 247 passengers did not even notice. One study found that, in medicine, it takes an average of seventeen years for a new treatment to reach half of the patients who would benefit. Aviation got the icing fix into every cockpit in roughly a month. That gap, Gawande argues, is the prize.

Key takeaways

Mental model

Practical application

1. Be explicit about the type

Before writing a single line, decide: is this a DO-CONFIRM (verification after the work, e.g. pre-launch readiness) or a READ-DO (instruction during the work, e.g. an incident-response runbook)? Mixing the two is a common design failure outside surgery and aviation.

2. Pick the pause-point first

A list with no defined moment to run will not run. Identify the natural break in the workflow — before the deploy starts, before the patient is wheeled in, after the door has shut — and tie the list to it. If no break exists, design one in (e.g. a mandatory five-minute "go/no-go" call).

3. Budget 60–90 seconds and five to nine items

Aim for no more than nine items and a run time under 90 seconds. If you cannot hit both, split the list across multiple pause-points rather than asking a team to run a 25-item list at once. The WHO surgical checklist that Gawande's team developed has three: before anaesthesia, before incision, and before the patient leaves the room.
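The budget can be checked mechanically before a draft ever reaches a team. The sketch below is one hedged way to do it. `PausePointList` and its fields are hypothetical names. The 10-seconds-per-item figure is an assumed placeholder, not a number from Boorman, and should be replaced with a measured value.

```python
from dataclasses import dataclass

MAX_ITEMS = 9     # upper end of the five-to-nine rule of thumb
MAX_SECONDS = 90  # upper end of the 60-90 second budget


@dataclass
class PausePointList:
    pause_point: str                # the moment at which the list is run
    items: list[str]
    seconds_per_item: float = 10.0  # assumption: replace with a measured value

    def problems(self) -> list[str]:
        """Return budget violations; an empty list means the draft fits."""
        found = []
        if len(self.items) > MAX_ITEMS:
            found.append(f"{len(self.items)} items exceeds {MAX_ITEMS}")
        estimated = len(self.items) * self.seconds_per_item
        if estimated > MAX_SECONDS:
            found.append(f"estimated {estimated:.0f}s exceeds {MAX_SECONDS}s")
        return found
```

A 25-item draft tied to a single pause-point fails both checks. The same content split across three pause-points of seven or eight items each can pass, provided each item really takes about ten seconds to run.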

4. Hunt for killer items, not for completeness

Ask: of the failures we have actually seen, which steps were missed? Which steps are catastrophic when missed and historically missed? Those go on. The rest stay off.
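When past failures are recorded in any structured form, the first question can be answered with a simple count. The following is a rough sketch under stated assumptions: the incident record shape (a `catastrophic` flag and a `missed_steps` list) and the function name `rank_missed_steps` are hypothetical, and the counts stand in for the field data that Boorman says is often unavailable.

```python
from collections import Counter


def rank_missed_steps(incidents: list[dict]) -> list[tuple[str, int]]:
    """Count how often each step was missed, looking only at catastrophic incidents.

    Returns (step, count) pairs, most frequently missed first.
    """
    counts = Counter(
        step
        for incident in incidents
        if incident["catastrophic"]          # ignore harmless misses
        for step in incident["missed_steps"]
    )
    return counts.most_common()


# Illustrative records only; not real incident data.
incidents = [
    {"catastrophic": True, "missed_steps": ["freeze deploys", "page on-call"]},
    {"catastrophic": True, "missed_steps": ["freeze deploys"]},
    {"catastrophic": False, "missed_steps": ["update wiki"]},
]
ranked = rank_missed_steps(incidents)
```

Steps near the top of the ranking are candidates for the list. A step that never appears, or appears only in harmless incidents, is a candidate for leaving off.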

5. Treat the first draft as a test instrument

Write the version you think will work and then immediately test it in the real environment. Watch for the moments the team gets confused, shortcuts, or quietly abandons the list. Those are the design defects to fix in v2. Boeing iterated each checklist in simulators with pilots before release; the surgical equivalent is a half-day in a single OR.

Example

A platform team writes an incident-response checklist for "service is down and the cause is not obvious." The first draft is 23 items long and lives in a wiki. It is never used during real incidents. Following Boorman's discipline, they redesign it. The pause-point becomes explicit — when the incident is declared a sev-1. The type is READ-DO (unfamiliar steps under stress). The list is trimmed to nine items focused on the killer misses from their last twenty incidents: (1) confirm the page reached the right on-call; (2) declare incident commander; (3) open the bridge; (4) snapshot the database; (5) freeze deploys; (6) post first customer message; (7) start a timeline doc; (8) ping security; (9) ping legal if customer data exposed. One page. Sans-serif. Posted in the incident bridge channel as a pinned message. They run it in a tabletop exercise; two items are ambiguous; they fix them. The next real sev-1, the on-call follows the list and the customer message goes out in eight minutes instead of forty.
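The redesigned list is small enough to keep as data next to the tooling that posts it. Below is a minimal sketch of that idea. The dictionary layout and the `render` function are hypothetical, and the item wording is taken from the example above.

```python
SEV1_CHECKLIST = {
    "pause_point": "incident declared sev-1",
    "type": "READ-DO",
    "items": [
        "Confirm the page reached the right on-call",
        "Declare incident commander",
        "Open the bridge",
        "Snapshot the database",
        "Freeze deploys",
        "Post first customer message",
        "Start a timeline doc",
        "Ping security",
        "Ping legal if customer data exposed",
    ],
}


def render(checklist: dict) -> str:
    """Format the checklist as plain text suitable for a pinned channel message."""
    lines = [f"{checklist['type']} | run when: {checklist['pause_point']}"]
    lines += [f"{n}. {item}" for n, item in enumerate(checklist["items"], start=1)]
    return "\n".join(lines)


message = render(SEV1_CHECKLIST)
```

Keeping the list as data makes the tabletop fixes cheap: rewording an ambiguous item is a one-line change, and the pinned message is regenerated from the same source.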
