The Test
Core idea
This is the chapter where the engineered checklist meets reality. Gawande's team rebuilds the WHO list along Boeing lines: DO-CONFIRM format, three explicit pause-points, the circulating nurse (not the surgeon) opens it, no written tick-marks, every line timed against a wall clock. They then test it in eight hospitals chosen for extreme diversity: Auckland, Seattle, Toronto, and London at one end; Manila, Amman, Delhi, and Ifakara (a rural Tanzanian hospital 200 miles of dirt road from Dar es Salaam) at the other. After three months of use, major complications fall by 36 percent, deaths by 47 percent, and infections by almost half; returns to the OR for bleeding drop by a quarter. The result is highly statistically significant, holds in both wealthy and poor sites, and cannot be explained by case mix or observer effect. Of the staff who used it, 93 percent say they would want the checklist used if they were having an operation themselves.
Why it matters
The final form: 19 items, 3 pause-points, ~2 minutes
The published WHO Surgical Safety Checklist crystallises everything the book has built toward.
- Before anaesthesia (7 checks). Patient identity and consent verified; surgical site marked; anaesthesia machine and medication check complete; pulse oximeter on and working; allergies known; airway risk assessed and equipment ready; high-blood-loss risk anticipated with IV access and blood ready.
- Before incision (7 checks). Everyone introduces themselves by name and role; correct patient, site, and procedure confirmed by all; antibiotic given on time or judged unnecessary; required imaging displayed; surgeon briefs duration / blood loss / concerns; anaesthesia briefs plan and concerns; nursing briefs equipment, sterility, and patient concerns.
- Before the patient leaves the room (5 checks). Procedure name recorded correctly; specimens labelled; needles, sponges, and instruments accounted for; equipment problems flagged for next case; team reviews recovery plan together.
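The three pause-points map naturally onto a small data structure. Below is a minimal Python sketch, assuming each pause-point is simply an ordered tuple of checks read aloud; the `PausePoint` class, the `total_items` and `unconfirmed` helpers, and the short labels are hypothetical paraphrases, not the official WHO wording. The sign-in tuple includes the anaesthesia machine and medication check from the published WHO form, which brings that pause-point to the seven checks its heading states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PausePoint:
    """A pause-point and the checks read aloud there (DO-CONFIRM style)."""
    name: str
    checks: tuple[str, ...]


# Hypothetical short labels paraphrasing the bullets above (not official WHO text).
SURGICAL_CHECKLIST = (
    PausePoint("before anaesthesia", (
        "patient identity and consent verified",
        "surgical site marked",
        "anaesthesia machine and medication check complete",  # from the published WHO form
        "pulse oximeter on and working",
        "allergies known",
        "airway risk assessed, equipment ready",
        "blood-loss risk anticipated, IV access and blood ready",
    )),
    PausePoint("before incision", (
        "team introduced by name and role",
        "patient, site and procedure confirmed by all",
        "antibiotic given on time or judged unnecessary",
        "required imaging displayed",
        "surgeon briefs duration, blood loss, concerns",
        "anaesthesia briefs plan and concerns",
        "nursing briefs equipment, sterility, patient concerns",
    )),
    PausePoint("before leaving the room", (
        "procedure name recorded correctly",
        "specimens labelled",
        "needles, sponges and instruments accounted for",
        "equipment problems flagged",
        "recovery plan reviewed together",
    )),
)


def total_items(checklist: tuple[PausePoint, ...]) -> int:
    """Count every check across all pause-points."""
    return sum(len(point.checks) for point in checklist)


def unconfirmed(point: PausePoint, confirm) -> list[str]:
    """DO-CONFIRM: the team has already done the work from memory; the caller
    reads each check aloud and collects any that nobody can confirm."""
    return [check for check in point.checks if not confirm(check)]


print([len(p.checks) for p in SURGICAL_CHECKLIST], total_items(SURGICAL_CHECKLIST))
# [7, 7, 5] 19
```

Keeping the checks as an immutable, ordered tuple mirrors the idea that the form is fixed in advance; `unconfirmed` returns the gaps rather than a pass/fail flag, so the team hears exactly which item to resolve.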
Design decisions that mattered
Several debates inside the working group shaped the final list. The circulating nurse, not the surgeon, opens it; this disperses responsibility and follows aviation's principle that the pilot not flying calls the checklist. There are no written tick-marks: the list is a team conversation, not a record. The list was cut to the killer items even when the cut hurt: deep-vein-thrombosis prophylaxis was dropped because it did not generalise across populations and costs; surgical-fire checks were dropped because fires, while terrifying, are statistically rare compared with infections and bleeding. Communication checks stayed in even though their benefit was not yet proven, because improving teamwork was judged fundamental.
What the baseline measurement found
The pre-checklist data Gawande's team gathered was sobering. Of nearly 4,000 patients across the eight hospitals, more than 400 developed major complications and 56 died. Complication rates ranged from 6 to 21 percent. On average, the hospitals missed at least one of six basic safety steps in two-thirds of patients, and even the best hospitals missed at least one step in 1 of every 16 patients. "Surgery is risky and dangerous wherever it is done."
The result
After three months of checklist use:
- Major complications: 36 percent lower.
- Deaths: 47 percent lower.
- Infections: roughly halved.
- Return-to-OR for bleeding: down 25 percent.
- Across ~4,000 patients, 158 fewer serious complications and 27 fewer deaths than the baseline predicted.
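The percentages and the absolute counts can be cross-checked with simple arithmetic. This is a rough sketch, assuming "fewer than the baseline predicted" means predicted events multiplied by the relative reduction, and treating the before and after cohorts as roughly the same size; the function name `implied_baseline` is hypothetical.

```python
def implied_baseline(events_averted: float, relative_reduction: float) -> float:
    """Events the baseline would have predicted, if removing `relative_reduction`
    of them left `events_averted` fewer events than predicted."""
    return events_averted / relative_reduction


# Figures taken from the result list above.
expected_complications = implied_baseline(158, 0.36)
expected_deaths = implied_baseline(27, 0.47)

print(round(expected_complications), round(expected_deaths))
# 439 57
```

Roughly 439 predicted complications sits comfortably with the 400+ observed at baseline, and roughly 57 predicted deaths with the 56 baseline deaths, so the relative and absolute figures tell the same story.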
The headline finding: This thing was real.
The result holds against every challenge
Gawande himself tried hard to break the result. Was the case mix easier in the second period? No — slightly more emergencies, same procedure mix. Was it a Hawthorne effect from observers being present? No — observers had been there before and after, the jump was at checklist introduction, and observed vs. unobserved operations improved identically. Was it just the poor sites improving from a low base? No — the high-income hospitals also saw a one-third drop in complications, also highly significant. In seven of eight sites it was a double-digit percent drop. The signal held everywhere.
What probably did the work
The individual mechanical changes (better antibiotic timing, more oximeter use, fewer wrong-site mistakes) could not account for the size of the gain. Complications with no targeted checklist item, such as bleeding, also fell. The team's best explanation is that improved communication was the mechanism. Staff surveys after the trial showed a significant increase in self-reported communication, and improvement in teamwork scores correlated strongly with reduction in complications. The introductions, the briefings, and the licence to speak up (the parts surgeons most often resisted) were doing the heaviest lifting.
The 93 percent
The final, decisive piece of data was a single survey question put to the 250+ staff who had used the checklist for three months. About 20 percent still found the list awkward, slow, or unhelpful. But when asked, "If you were having an operation, would you want the checklist to be used?" — 93 percent said yes.
Practical application
1. Disperse the authority to start the list
The single most consequential design decision in the trial was that the circulating nurse — not the surgeon — calls the checklist. This was an explicit borrowing from "pilot not flying" aviation practice. In any organisation, decide deliberately who is empowered to bring the team to a pause. Default it to someone whose job is not the work the list is checking.
2. Anchor at natural commit-points
Pick pause-points at the moments the team is about to commit to something irreversible — anaesthesia, incision, leaving the room. Pause-points before commitment are when a wrong assumption is still cheap to fix.
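Practices 1 and 2 together describe a gate: an irreversible step cannot start until its pause-point has been run, and only a designated caller who is not doing the work may run it. Below is a minimal Python sketch under those assumptions; `Step`, `PausePointGate` and the role strings are hypothetical names, not part of any real system.

```python
from enum import Enum


class Step(Enum):
    """Irreversible commit-points, each preceded by a pause-point."""
    ANAESTHESIA = "anaesthesia"
    INCISION = "incision"
    LEAVE_ROOM = "leaving the room"


class PausePointGate:
    def __init__(self, caller_role: str, doer_role: str) -> None:
        # Practice 1: the person who starts the list is not the person doing the work.
        if caller_role == doer_role:
            raise ValueError("the checklist caller must not be the person doing the work")
        self.caller_role = caller_role
        self._cleared: set[Step] = set()

    def run_pause_point(self, step: Step, role: str, all_confirmed: bool) -> None:
        """Record a completed pause-point; only the designated caller may run it."""
        if role != self.caller_role:
            raise PermissionError(f"{role} is not the designated checklist caller")
        if all_confirmed:
            self._cleared.add(step)

    def commit(self, step: Step) -> str:
        # Practice 2: block the irreversible step until its pause-point is cleared.
        if step not in self._cleared:
            raise RuntimeError(f"pause before {step.value}: checklist not completed")
        return f"proceeding to {step.value}"


gate = PausePointGate(caller_role="circulating nurse", doer_role="surgeon")
gate.run_pause_point(Step.ANAESTHESIA, role="circulating nurse", all_confirmed=True)
print(gate.commit(Step.ANAESTHESIA))  # proceeding to anaesthesia
```

Calling `gate.commit(Step.INCISION)` at this point raises `RuntimeError`, because the before-incision pause has not been run. Raising rather than logging a warning is a deliberate choice in this sketch: the pause is cheap before commitment and expensive after it.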
3. Expect a learning curve, allow customisation
Each pilot site customised wording, translated terms, and re-ordered checks to fit local practice. The London team gave anaesthesia outside the theatre and had to shift the first pause-point accordingly. The Delhi team realised that the antibiotic routinely given in the pre-op waiting area was wearing off by the time the patient reached the table; the checklist forced them to move the dose into the OR. Allow this kind of customisation explicitly: what you protect is the form (pause-points, team verbal protocol, killer items), not the exact wording.
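The rule of protecting the form rather than the wording can be expressed as a check a central team might run over each site's version. Below is a minimal Python sketch, assuming the form is a set of named pause-points, each with required item IDs; the IDs, the local wording and `form_violations` are hypothetical and abbreviated, not the WHO list.

```python
# Hypothetical, abbreviated "form": required item IDs per pause-point.
STANDARD_FORM: dict[str, list[str]] = {
    "before_anaesthesia": ["identity_consent", "site_marked", "oximeter"],
    "before_incision": ["introductions", "antibiotic", "imaging"],
    "before_leaving": ["counts", "specimens"],
}


def form_violations(standard: dict[str, list[str]],
                    local: dict[str, dict[str, str]]) -> list[str]:
    """Compare a site's customised checklist with the protected form.

    Wording, translation and order are free; every pause-point and every
    required item must survive. Extra local items are allowed.
    """
    problems = []
    for point, required in standard.items():
        if point not in local:
            problems.append(f"missing pause-point: {point}")
            continue
        for item in required:
            if item not in local[point]:
                problems.append(f"{point}: missing required item {item!r}")
    return problems


# A hypothetical site version: reordered, reworded, antibiotic dose moved into the OR.
local_version = {
    "before_anaesthesia": {"oximeter": "Oximeter on and reading?",
                           "identity_consent": "Name, procedure, consent?",
                           "site_marked": "Site marked?"},
    "before_incision": {"introductions": "Names and roles, please.",
                        "antibiotic": "Antibiotic given here in the OR?",
                        "imaging": "Scans up on screen?"},
    "before_leaving": {"specimens": "Specimens labelled?",
                       "counts": "Needle, sponge, instrument counts correct?"},
}

print(form_violations(STANDARD_FORM, local_version))  # []
```

Deleting the `"counts"` entry from the local version would instead return `["before_leaving: missing required item 'counts'"]`: the site keeps its voice, but a killer item cannot quietly disappear.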
4. Do not force the resistors
When some surgeons told the team to leave, the team left. Forcing adoption would have soured the larger group; convincing the willing majority and letting the data speak afterwards was the more durable strategy.
5. Measure what you cannot guess
The team measured baseline complications and safety-step compliance before introducing the list. Without that baseline, the result could not have been believed. If you cannot or will not measure before, the post-intervention data has no comparator.
Example
A financial-services compliance team runs a pilot of a 12-item pre-trade checklist across four desks chosen for diversity: two large institutional desks in New York and London, and two smaller emerging-markets desks in Mumbai and Johannesburg. The checklist has three pause-points: before the trade is sized, before execution, and before booking. The team measures baseline operational-error rates and near-miss reports for three months, then introduces the list. After three months it sees a drop of more than 30 percent in operational errors across all four desks, with double-digit drops at each site. The single-item gains (e.g. counterparty-limit checks, dual approval on size) do not account for the full reduction; survey data shows materially higher communication scores between trader, compliance, and ops. The mechanism is the same as Gawande's: targeted task-checks plus forced team conversation at commit-points. So is the bridging argument with reluctant senior traders: "Would you want the desk to use this if it were your own money on the trade?"
Related concepts
- WHO Surgical Safety Checklist
- Pause-point
- Team Coordination
- Communication Checklist