9 Curiosities and dilemmas
Core idea
Probability is not paradoxical, but human intuition about it is. Every case in this topic — the Monty Hall problem, the birthday paradox, the gambler's fallacy, Simpson's paradox, the prosecutor's fallacy, the boy-or-girl puzzle, the two-envelopes dilemma — looks contradictory only because the naive mental model quietly substitutes the wrong sample space or conditions on the wrong event. Once you write the events down carefully, count outcomes by equally likely cases, and apply Bayes' rule where conditioning is involved, the paradox dissolves.
Author's framing: "The subject of probability is wholly free from real paradoxes." The puzzles in this topic are diagnostic of how informal reasoning misroutes evidence — they are tools for sharpening intuition, not curiosities to be marvelled at.
Why it matters
These aren't recreational puzzles. The same mistakes appear in courtrooms, hospitals, university admissions, and DNA labs. A jury that misreads a DNA match probability convicts the wrong person. A hospital that compares headline cure rates without stratifying by case severity rewards the wrong surgeon. A regulator who treats a string of losses as "due for a correction" misprices risk. The topic's puzzles are clinical exercises that train you to spot the structural error before it costs real money or freedom.
Two recurring mistakes
Almost every paradox in this topic collapses to one of two underlying errors. The first is choosing the wrong reference class — counting outcomes uniformly when the act of observation has already filtered them. The second is confusing conditional directions — answering "given the evidence, how likely is innocence?" by computing instead "given innocence, how likely is the evidence?". Both errors are invisible to common sense; both yield to careful enumeration.
Key takeaways
Mental model — the Monty Hall decision tree
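One way to walk the Monty Hall decision tree is to enumerate it branch by branch. The sketch below assumes the standard rules, which the text does not spell out: three doors, one car, a host who knows where the car is, always opens a goat door you did not pick, and always offers the switch. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def monty_hall_win_probabilities():
    """Enumerate (car position, first pick) and return (P(win | stay), P(win | switch)).

    Assumes the standard three-door rules described in the lead-in.
    """
    doors = range(3)
    weight = Fraction(1, 9)  # car position and first pick are independent and uniform
    stay = Fraction(0)
    switch = Fraction(0)
    for car, pick in product(doors, repeat=2):
        if pick == car:
            # First pick was right: staying wins, switching loses
            # whichever goat door the host opens.
            stay += weight
        else:
            # First pick was a goat: the host must open the other goat door,
            # so the one remaining closed door hides the car.
            switch += weight
    return stay, switch

stay, switch = monty_hall_win_probabilities()
print(stay, switch)  # 1/3 2/3
```

Under these assumptions the first pick is right in 3 of the 9 equally likely branches, so switching wins in the other 6.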
When intuition under-counts: the birthday paradox
How many people must be in a room before two share a birthday with probability above 1/2? The intuitive guess is somewhere near half of 365, perhaps 180. The actual answer is 23. The mistake is to imagine matching each person against a fixed target; the correct framing is to count pairs. With 23 people there are 23 × 22 / 2 = 253 distinct pairs, and each pair has a 1/365 chance of coinciding. Treating the pairs as roughly independent, the probability that all pairs miss is about (364/365)^253 ≈ 0.4995, which already falls below 1/2. The exact calculation, which multiplies 365/365 × 364/365 × ... × 343/365 for the chance that all 23 birthdays differ, gives a match probability of about 0.507.
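The pair-counting estimate can be checked against the exact product. This is a minimal sketch, assuming 365 equally likely birthdays, no leap years, and independent people; both function names are illustrative.

```python
from fractions import Fraction

def p_shared_birthday(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    p_all_distinct = Fraction(1)
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= Fraction(days - k, days)
    return 1 - p_all_distinct

def p_shared_pairs_approx(n, days=365):
    """Pair-counting approximation: treat every pair as an independent 1/days chance."""
    pairs = n * (n - 1) // 2
    return 1 - (1 - 1 / days) ** pairs

for n in (22, 23):
    print(n, float(p_shared_birthday(n)), p_shared_pairs_approx(n))
```

Both versions cross 1/2 between 22 and 23 people, so the threshold of 23 does not depend on the independence shortcut.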
Why the surprise is structural
Linear intuition tracks the count of people; the quadratic count of pairs is what actually drives matches. The same shape governs network collisions, hash-table collisions, birthday attacks on cryptographic hash functions, and the question of how many independent trials it takes before a coincidence becomes likely. Whenever a problem asks "what is the chance any two of these items collide?", expect the threshold to be far smaller than the linear estimate.
The gambler's fallacy and the hot hand
A fair coin lands heads ten times in a row. What's the chance of heads on the next toss? Still 1/2. The fallacy is to think the coin "owes" tails — that the universe runs a balancing mechanism. Independent trials have no memory. The long-run frequency converges to 1/2 not because deviations are corrected but because they are diluted by an ever-growing denominator.
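The dilution argument can be made concrete with expected values. The sketch below assumes a fair coin and independent tosses; the function name and the choice of follow-up toss counts are illustrative.

```python
from fractions import Fraction

def expected_state_after_streak(streak, further_tosses):
    """Expected heads surplus and heads proportion after an all-heads streak
    followed by further fair, independent tosses."""
    total = streak + further_tosses
    expected_heads = streak + Fraction(further_tosses, 2)
    surplus = expected_heads - Fraction(total, 2)   # heads in excess of half the tosses
    proportion = expected_heads / total
    return surplus, proportion

for n in (0, 90, 990, 9990):
    surplus, proportion = expected_state_after_streak(10, n)
    print(n, surplus, float(proportion))
```

The expected surplus stays at 5 heads however many tosses follow; only the proportion moves toward 1/2, because the same surplus is divided by a larger total.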
The hot hand — the same mistake inverted
The gambler's fallacy says streaks must end. The hot-hand fallacy says streaks must continue. Both impose narrative on a process that has none. If a basketball player's shots are independent with success rate 50%, a run of five makes in a row is exactly what one expects to see somewhere in a season; we just notice it more vividly than the long arithmetic of misses. The honest test is whether shot success rates after a streak differ from the player's baseline, and the empirical answer for most sports is: barely, if at all.
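How likely a run of five is under pure independence can be computed directly. This is a sketch using a small dynamic program over the current streak length; the 500-shot season is a hypothetical figure chosen for illustration, not a number from the text.

```python
def p_run_at_least(n, k, p=0.5):
    """Probability of at least one run of k consecutive successes
    in n independent trials with success probability p."""
    # state[j] = probability that the current streak has length j
    # and no run of length k has occurred so far.
    state = [1.0] + [0.0] * (k - 1)
    hit = 0.0  # probability that a run of length k has already occurred
    for _ in range(n):
        new_state = [0.0] * k
        for j, prob in enumerate(state):
            new_state[0] += prob * (1 - p)      # a miss resets the streak
            if j + 1 == k:
                hit += prob * p                  # the streak reaches k
            else:
                new_state[j + 1] += prob * p     # the streak grows by one
        state = new_state
    return hit

# Hypothetical season of 500 independent shots at a 50% success rate.
print(p_run_at_least(500, 5))
```

With these assumed numbers the result is very close to 1, which is the sense in which a five-shot streak is expected rather than remarkable.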
Simpson's paradox
Suppose 480 of 1,000 male applicants and 240 of 1,000 female applicants are admitted to a university. The headline rate looks damning — men are twice as likely to be admitted. Now stratify by department. In the English department, 10% of 50 men get in versus 20% of 950 women. In Business Studies, half of 950 men get in versus all 50 women. In both departments, women are admitted at twice the rate of men. The aggregate gap exists only because women applied disproportionately to departments with lower admit rates overall.
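The reversal can be verified by recomputing the rates from the counts. The sketch below uses only the figures given in the paragraph above; the dictionary layout and function names are illustrative.

```python
from fractions import Fraction

# (admitted, applicants) per department and sex, from the figures in the text.
data = {
    "English": {"men": (5, 50), "women": (190, 950)},
    "Business Studies": {"men": (475, 950), "women": (50, 50)},
}

def rate(admitted, applicants):
    return Fraction(admitted, applicants)

def aggregate_rate(sex):
    """Pool admissions across departments before dividing."""
    admitted = sum(by_sex[sex][0] for by_sex in data.values())
    applicants = sum(by_sex[sex][1] for by_sex in data.values())
    return rate(admitted, applicants)

for dept, by_sex in data.items():
    print(dept, float(rate(*by_sex["men"])), float(rate(*by_sex["women"])))
print("Overall", float(aggregate_rate("men")), float(aggregate_rate("women")))
```

Within each department the women's rate is double the men's, yet the pooled rates come out as 0.48 for men and 0.24 for women, because 950 of the 1,000 women applied to the department that admits far fewer applicants.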
The Berkeley case — the real example
This isn't hypothetical. In 1973, Berkeley's graduate school admitted 44% of male applicants but only 35% of female applicants. The university feared a discrimination suit. When statisticians broke admissions down by department, the per-department rates either showed no difference or slightly favoured women. The aggregate gap was an artefact of women applying more heavily to the most competitive departments.
The general lesson
Whenever you read a comparison of proportions, ask: is there a hidden category that varies systematically between the two groups? If yes, the headline proportion can point the opposite way from every stratified comparison. This is not statistical sleight of hand; it is what happens when rates are averaged over subgroups that the two populations fill in very different proportions.
The prosecutor's fallacy
A DNA sample from a crime scene matches a suspect; the probability of a random innocent person matching is 1 in a million. Therefore the suspect's probability of innocence is 1 in a million. No. That argument confuses two distinct conditional probabilities:
- P(match | innocent) — the false-positive rate of the test.
- P(innocent | match) — the quantity the court actually cares about.
How they diverge in practice
Suppose the suspect was identified by trawling a national DNA database of 60 million people. Roughly 60 of them will match by pure chance even if none is the criminal. On the match alone, the probability that any given matching person is the actual perpetrator is therefore not "1 minus 1 in a million"; it is closer to 1 in 60. A test that sounds devastating in the abstract becomes far less so once the population searched is correctly framed. The match still matters as evidence, but it does not by itself establish guilt to anything like the certainty the headline number suggests.
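The database-trawl arithmetic is an application of Bayes' rule. The sketch below rests on simplifying assumptions that go beyond the text: exactly one perpetrator is in the database, everyone in it is equally likely a priori, and the true perpetrator always matches. The function name is illustrative.

```python
from fractions import Fraction

def p_perpetrator_given_match(population, p_random_match):
    """P(person is the perpetrator | their DNA matches), under the
    assumptions stated in the lead-in."""
    prior = Fraction(1, population)                      # equal prior for everyone
    p_match = prior * 1 + (1 - prior) * p_random_match   # total probability of a match
    return prior / p_match                               # Bayes' rule

posterior = p_perpetrator_given_match(60_000_000, Fraction(1, 1_000_000))
print(posterior, float(posterior))
```

Under these assumptions the result is about 1 in 61: the one true match competes with roughly 60 chance matches. Any further evidence about the suspect would change the prior and therefore the answer.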
The same fallacy underlies headlines about rare-disease screens. When a "99% accurate" test is applied to a disease with prevalence 1 in 10,000, most of the people who test positive are false alarms. Bayes' rule, not the test accuracy alone, governs interpretation.
The boy-or-girl paradox
A friend says: "I have two children. At least one is a boy." What is the probability that both are boys?
The naive answer is 1/2 — once we know one is a boy, surely the other is independently equally likely to be either sex. The careful answer is 1/3. List the four equally likely families (older, younger): BB, BG, GB, GG. The condition "at least one is a boy" eliminates GG, leaving three equally likely cases of which only BB is two boys.
The wording is everything
Change the condition to "the elder child is a boy" and the answer flips back to 1/2 — BB and BG are now the only survivors. Change it again to "I have two children; one of them, Alex, is a boy" and you get yet another answer (close to 1/2), because naming creates a stronger filter than just "at least one". Each version conditions on a different event in the sample space, even though the English sentences sound interchangeable. The paradox is a linguistic trap, not a mathematical one.
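The first two wordings can be checked by filtering the four-family sample space. This is a minimal sketch, assuming each child is independently a boy or a girl with probability 1/2; the helper name is illustrative, and the named-child variant is not modelled here.

```python
from fractions import Fraction
from itertools import product

# (older, younger): BB, BG, GB, GG, assumed equally likely.
families = list(product("BG", repeat=2))

def p_both_boys(condition):
    """Apply the conditioning event once, then count favourable survivors."""
    survivors = [f for f in families if condition(f)]
    favourable = [f for f in survivors if f == ("B", "B")]
    return Fraction(len(favourable), len(survivors))

print(p_both_boys(lambda f: "B" in f))      # at least one boy: 1/3
print(p_both_boys(lambda f: f[0] == "B"))   # the elder is a boy: 1/2
```

The only thing that changes between the two calls is the filter, which is the point: the sentences sound alike but select different subsets of the same sample space.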
The two-envelopes problem
You are handed one of two sealed envelopes. One contains twice as much money as the other, but you don't know which. You open envelope A and see X dollars. Should you switch to envelope B? A seductive calculation says yes: B contains either 2X (with probability 1/2) or X/2 (with probability 1/2), giving an expected value of 1.25X — larger than X. By the same logic, after switching, you should switch back.
Where the argument fails
The calculation treats the two outcomes "B has 2X" and "B has X/2" as equally likely conditional on X, but that is not true in general. The probability that B is the larger envelope depends on what the underlying prior over amounts looks like. Without specifying that prior, the expected-value calculation is ill-defined — you cannot integrate over an "all possible amounts" distribution that is uniform on the positive reals (no such distribution exists).
The paradox is genuinely diagnostic: it shows that applying expected-value reasoning without a prior is not actually applying expected-value reasoning at all. When the prior is specified (for example, the smaller envelope is uniformly distributed between $1 and $100), the apparent symmetry breaks and the calculation gives a sensible answer that depends on the observed X.
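A concrete prior makes the dependence on X visible. The sketch below assumes a discrete version of the example prior: the smaller amount is a whole number from 1 to 100, each equally likely, and you are handed either envelope with probability 1/2. The whole-number restriction and the function name are assumptions made for illustration.

```python
from fractions import Fraction

def expected_other(x, max_small=100):
    """E[amount in the other envelope | you see x] under the assumed prior."""
    half = Fraction(1, 2 * max_small)  # P(smaller amount = s) * P(handed a given envelope)
    # Case 1: x is the smaller amount, so the other envelope holds 2x.
    w_smaller = half if 1 <= x <= max_small else Fraction(0)
    # Case 2: x is the larger amount, so the other envelope holds x/2.
    x_is_double = x % 2 == 0 and 1 <= x // 2 <= max_small
    w_larger = half if x_is_double else Fraction(0)
    total = w_smaller + w_larger
    if total == 0:
        raise ValueError("x cannot occur under this prior")
    return (w_smaller * 2 * x + w_larger * Fraction(x, 2)) / total

for x in (7, 10, 150):
    print(x, expected_other(x))
```

For an even X up to 100 both cases are equally likely and the 1.25X figure is correct; for an odd X you must be holding the smaller amount, so switching doubles it; for X above 100 you must be holding the larger amount, so switching halves it. The "always switch" argument survives only in the first case.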
Practical application
- Write the sample space. Before reasoning verbally, list the equally likely outcomes: families of two children, doors with cars, applicants by department. Verbal arguments that skip this step are where intuition leaks in.
- Identify the conditioning event. What information has the puzzle given you? "At least one boy" filters the sample space differently from "the elder is a boy". Make the filter explicit.
- Count survivors, not survivors-of-survivors. Apply the condition to the sample space once. Count outcomes that satisfy it. Within those, count the favourable cases. The ratio is the answer.
- Check the direction of conditioning. P(A | B) and P(B | A) are different numbers. The prosecutor's fallacy is the canonical case of swapping them. Write down which one the question asks for.
- Stratify before aggregating. When comparing proportions, ask whether a third variable correlates with both group membership and outcome. If yes, Simpson's paradox is on the table.
Example: a medical screening sanity check
A clinic offers a screening test for a rare condition with prevalence 1 in 10,000. The test has sensitivity 99% (it catches 99% of true cases) and specificity 99% (it correctly clears 99% of healthy people). You test positive. What is the probability you actually have the condition?
Naive intuition: about 99%. Careful counting: imagine 1,000,000 people. About 100 have the condition; 99 of them test positive. About 999,900 don't have it; 1% of those — about 9,999 — also test positive. Of the roughly 10,098 positive tests, only 99 are real — under 1%. The test is informative (your post-test odds rose from 1-in-10,000 to roughly 1-in-100), but the headline accuracy figure overstates what a single positive result tells you. This is the prosecutor's fallacy in a medical coat.
The lesson generalises. Whenever you see a sensitivity or specificity number quoted alone, mentally pair it with the base rate before interpreting any individual result. The base rate is what does most of the work.
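The counting argument in this example is Bayes' rule in disguise and can be written as a short function. This is a sketch using the prevalence, sensitivity, and specificity quoted in the example; the function name is illustrative.

```python
from fractions import Fraction

def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(condition | positive test), computed with Bayes' rule."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)

ppv = positive_predictive_value(
    prevalence=Fraction(1, 10_000),
    sensitivity=Fraction(99, 100),
    specificity=Fraction(99, 100),
)
print(ppv, float(ppv))  # 1/102, just under 1%
```

Rerunning the function with a higher prevalence while holding the test fixed is a quick way to see how much of the answer the base rate controls.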
Caveats
The topic's final note is ethical rather than mathematical. Once you can compute a probability for, say, the chance of developing a hereditary disease, you face the question of whether you want to know. Some Nobel laureates have refused to learn their own APOE genotype because no treatment exists for the disease the gene predicts. Probability gives you sharper questions; it does not always give you happier ones.
Related material
Related concepts
- Monty Hall Problem
- Birthday Paradox
- Gambler's Fallacy
- Simpson's Paradox
- Prosecutor's Fallacy