2 The workings of probability

Core idea

Probability is not a single number attached to an event in isolation. It is a number attached to an event in a context — and once the context shifts, the number shifts with it. Haigh's topic is a tour of the four working tools that let you keep track of those shifts: the Addition Law, the Multiplication Law, independence, and the idea of a conditional probability. Together they explain how probabilities compose as new information arrives.

The most important move in the topic is the realisation that "the probability of B" and "the probability of B given that A happened" are different objects. Treating them as the same number lies behind many famous probability puzzles and errors, from the Monty Hall problem to the false-positive medical test. Bayes' theorem is the bookkeeping device that converts one into the other.

Author's framing: Assuming two factors to be independent, when they are not, is one of the most common mistakes made in assessing probabilities.

Why it matters

Almost every consequential use of probability — medical screening, criminal forensics, spam filtering, insurance pricing, machine learning — turns on conditional probability rather than raw probability. A test that is "99% accurate" sounds decisive until you ask: 99% conditional on what? Conditional on the patient being sick, or conditional on the test reading positive? Those two numbers can differ by an order of magnitude, and the difference is the whole game.

From counting to updating

Topic 1 built probability out of counting equally likely outcomes. This topic shifts the frame: probability is what you have before you observe anything, and a conditional probability is what you have after. The Multiplication Law and Bayes' theorem are the rules that let you go from the first to the second without contradiction.

The cost of treating dependent events as independent

Multiplying two probabilities together is so easy that it gets done reflexively, even in cases where the events are clearly linked. The cost of that reflex — in courtrooms, in clinics, in policy — has been measured in wrongful convictions and bad decisions. Haigh's example of the female engineer makes the point bluntly: half the students are female, one in five study engineering, but female engineers are much rarer than the product 1/10 would suggest. The two attributes are not independent, and pretending otherwise hides the fact.

Key takeaways

Mental model — the base-rate problem as a flowchart

Conditional probability

Probability lives inside a context

Write P(A) for the probability of event A and P(A | B) for the probability of A given that B has occurred. The bar reads "given". P(A | B) is computed by restricting attention to the cases where B happens, then asking what fraction of those also feature A. It is a re-counting exercise on a smaller universe.

In Haigh's ten-ball example, the unconditional probability of drawing a Green ball is 5/10. If you are first told the ball is Low (numbers 0–4), the conditional probability of Green changes — four of the five Low balls are Green, so P(Green | Low) = 4/5. New information has narrowed the sample space.
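The re-counting can be sketched in a few lines of Python. The ball layout below is an assumption chosen only to reproduce the figures quoted above (five Green balls out of ten, four of them Low); the text does not say which balls are which colour, and the helper name `prob` is illustrative.

```python
from fractions import Fraction

# Hypothetical layout: balls numbered 0-9. Which balls are Green is an
# assumption picked to match the quoted figures (5 Green overall, 4 of them Low).
balls = set(range(10))
green = {0, 1, 2, 3, 7}
low = {0, 1, 2, 3, 4}


def prob(event, given=None):
    """Fraction of balls in `event`, counted inside `given` (default: all balls)."""
    universe = balls if given is None else set(given)
    return Fraction(len(set(event) & universe), len(universe))


p_green = prob(green)                 # counted over all ten balls
p_green_given_low = prob(green, low)  # counted over the five Low balls only

print(p_green, p_green_given_low)
```

The only difference between the two calls is the universe being counted over, which is the whole content of the "given" bar.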

The chain rule for probabilities

The Multiplication Law is the chain rule:

P(A and B) = P(A) × P(B | A)

In words: to get the chance that both A and B happen, multiply the chance that A happens by the chance that B happens after you know A has happened. The law extends to longer chains: P(A and B and C) = P(A) × P(B | A) × P(C | A and B), and so on for any number of events. This composition rule is how complex probabilities are built out of simpler conditional building blocks.
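As a small check of the law, the sketch below uses an example that is not from the text: drawing two aces in a row from a 52-card deck without replacement. It computes the probability with the chain rule and then again by brute-force counting of ordered two-card draws.

```python
from fractions import Fraction
from itertools import permutations

# Illustrative deck (an assumption for this sketch): 4 aces and 48 other cards.
deck = ["A"] * 4 + ["x"] * 48

# Chain rule: P(first is an ace) * P(second is an ace | first was an ace).
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_chain = p_first_ace * p_second_ace_given_first

# Brute-force check: enumerate every ordered pair of distinct card positions
# and count the pairs where both cards are aces.
draws = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in draws if deck[i] == "A" and deck[j] == "A")
p_counted = Fraction(both_aces, len(draws))

print(p_chain, p_counted)
```

The second factor is 3/51 rather than 4/52 because the first draw changes the deck; using 4/52 twice would silently assume independence.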

Independence

What independence actually says

Two events A and B are independent when learning one tells you nothing about the other — formally, P(B | A) = P(B). If that holds, the chain rule collapses to a simple product: P(A and B) = P(A) × P(B). This is the only case where you can multiply unconditional probabilities together without thinking further.

Independence is symmetric: if A says nothing about B, then B says nothing about A. So independence is a property of the pair, not a direction of inference.

Independence is not obvious

Some pairs of events are clearly independent, such as rain in Tunis today and the gender of the next birth in Paris. Some are clearly dependent, such as being a smoker and getting lung cancer. The dangerous middle ground is the pairs that look unrelated but aren't. Haigh's die example is instructive. On a fair die numbered from 1, "even" and "multiple of three" are independent with six sides (P(both) = 1/6 = 1/2 × 1/3) and with eight sides (P(both) = 1/8 = 1/2 × 1/4), but not with ten sides (P(both) = 1/10, while 1/2 × 3/10 = 3/20). Whether two events on a numbered die are independent depends on how many sides the die has: pure arithmetic decides, and intuition has nothing useful to add.
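The arithmetic is easy to automate. The sketch below assumes a fair die with faces numbered 1 to n (a different numbering, such as 0 to n-1, can change the answer) and tests the product rule exactly with fractions. The function name is illustrative.

```python
from fractions import Fraction


def even_and_triple_independent(sides):
    """True if 'even' and 'multiple of three' satisfy P(both) == P(even) * P(triple).

    Assumes a fair die with faces numbered 1..sides.
    """
    faces = range(1, sides + 1)
    even = {f for f in faces if f % 2 == 0}
    triple = {f for f in faces if f % 3 == 0}
    p_even = Fraction(len(even), sides)
    p_triple = Fraction(len(triple), sides)
    p_both = Fraction(len(even & triple), sides)
    return p_both == p_even * p_triple


for sides in (6, 8, 10, 12):
    print(sides, even_and_triple_independent(sides))
```

Exact fractions are used instead of floats so that the equality test is not affected by rounding.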

The female-engineer trap

One of the most common probability mistakes is multiplying when you should be conditioning. If 50% of grad students are female and 20% study engineering, the product 10% is the probability of "female and engineer" only when gender and field of study are independent. They are not. The actual fraction of female engineers in most grad schools is well below 10%: the two attributes are correlated, so the chain rule requires a conditional, either P(engineer) × P(female | engineer) or P(female) × P(engineer | female), not the product of the two marginals.
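To make the gap concrete, the sketch below compares the naive product with the chain rule. The marginals are the ones quoted in the text; the conditional P(engineer | female) = 0.08 is a made-up value used purely for illustration, not a figure from the source.

```python
# Marginals quoted in the text.
p_female = 0.50
p_engineer = 0.20

# Hypothetical conditional, chosen only for illustration (not from the source):
# suppose 8% of female students study engineering.
p_engineer_given_female = 0.08

naive_product = p_female * p_engineer            # correct only under independence
chain_rule = p_female * p_engineer_given_female  # correct whenever the conditional is right

print(f"independence assumption: {naive_product:.2f}")
print(f"chain rule:              {chain_rule:.2f}")
```

Under this assumed conditional, the reflexive product overstates the share of female engineers by a factor of 2.5.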

Bayes' theorem

Flipping the conditional

You usually know P(evidence | hypothesis) — for instance, the chance a test reads positive given that the patient is sick — but what you actually want is P(hypothesis | evidence): the chance the patient is sick given that the test read positive. Bayes' theorem is the algebra that swaps the two:

P(H | E) = P(E | H) × P(H) / P(E)

Three ingredients: the prior P(H) (what you believed before the evidence), the likelihood P(E | H) (how the evidence behaves under the hypothesis), and the marginal P(E) (how often the evidence shows up overall). The marginal is just the sum P(E | H) × P(H) + P(E | not H) × P(not H) — the total weight of the evidence across all hypotheses.
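The formula and its marginal fit in one short function. This is a minimal sketch for a single hypothesis and its complement; the function name and the example numbers are assumptions made for illustration.

```python
def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H | E) from the prior P(H) and the two likelihoods."""
    # Marginal P(E) via the law of total probability over H and not-H.
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    if p_e == 0:
        raise ValueError("P(E) is zero: the evidence cannot occur under these inputs")
    return p_e_given_h * prior / p_e


# Illustrative inputs (assumptions): a rare hypothesis and fairly strong evidence.
print(bayes_posterior(prior=0.02, p_e_given_h=0.90, p_e_given_not_h=0.05))
```

Note that the function needs P(E | not H) as a separate input: the likelihood under the hypothesis alone is not enough to compute the marginal.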

Why the flip is so disorienting

People treat P(E | H) and P(H | E) as interchangeable because the English sentences sound similar. They are almost never the same number, and they often differ by a factor of 10 or more. Recognising that they are different objects is the entire conceptual lift of Bayesian reasoning.

Base rates

The base rate is the prior

In Bayes' formula, the prior P(H) is the base rate — how common the hypothesis is before any evidence is gathered. Low base rates have a tyrannical effect: when a hypothesis is rare, even a strong piece of evidence cannot pull the posterior probability very far up. This is why screening healthy populations for rare diseases produces mostly false positives, and why "but the test is 99% accurate!" misses the point.

The 1% disease, the 99% test

The diagram above showed the arithmetic. Re-stating it in Bayes' notation: with prevalence P(Sick) = 0.01 and a test with sensitivity P(+ | Sick) = 0.99 and false-positive rate P(+ | Well) = 0.01,

  • P(+) = 0.99 × 0.01 + 0.01 × 0.99 = 0.0198
  • P(Sick | +) = (0.99 × 0.01) / 0.0198 = 0.50

A positive result from a "99% accurate" test means only a 50% chance of being sick. Drop the prevalence to 0.1% and the posterior falls to about 9%. The base rate, not the test accuracy, dominates the answer.
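The same arithmetic can be restated as head counts, which many readers find easier to follow. The sketch below uses the prevalence, sensitivity, and false-positive rate from the text; the population of 10,000 and the function name are arbitrary choices for illustration.

```python
from fractions import Fraction


def positives_breakdown(population, prevalence, sensitivity, false_positive_rate):
    """Split the positive tests in a population into true and false positives."""
    sick = population * prevalence
    well = population - sick
    true_pos = sick * sensitivity
    false_pos = well * false_positive_rate
    posterior = true_pos / (true_pos + false_pos)  # P(Sick | +)
    return true_pos, false_pos, posterior


# 10,000 people is an arbitrary population size; the rates come from the text.
for prevalence in (Fraction(1, 100), Fraction(1, 1000)):
    tp, fp, post = positives_breakdown(
        10_000, prevalence, Fraction(99, 100), Fraction(1, 100)
    )
    print(f"prevalence {float(prevalence):.3f}: "
          f"{float(tp):.1f} true vs {float(fp):.1f} false positives, "
          f"P(Sick | +) = {float(post):.3f}")
```

At 1% prevalence the true and false positives are equal in number, which is where the 50% comes from; at 0.1% the false positives outnumber the true ones roughly ten to one.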

Practical application

  1. Write down the question as a conditional probability. Identify the hypothesis H and the evidence E. Ask which direction the conditional runs — is it P(E | H) or P(H | E)?

  2. Find the base rate P(H). Before the evidence, how common is the hypothesis in the relevant population? If you can't pin this down, your final answer cannot be trusted.

  3. Identify the likelihoods. P(E | H) is usually given (test sensitivity). P(E | not H) is the false-positive rate — easy to overlook, but essential.

  4. Compute the marginal P(E). Use the law of total probability: P(E) = P(E | H) × P(H) + P(E | not H) × P(not H).

  5. Apply Bayes. P(H | E) = P(E | H) × P(H) / P(E). Sanity-check the order of magnitude against the base rate — the posterior should move in the right direction but rarely as far as instinct suggests.

  6. Check the independence assumption. If you multiplied any probabilities along the way, ask whether the events really are independent. If not, replace the product with the chain rule.
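The steps above can be sketched as a small worksheet that keeps each intermediate quantity visible. The class and method names are hypothetical, and the example reuses the 0.1% prevalence screening numbers from the previous section.

```python
from dataclasses import dataclass


@dataclass
class BayesWorksheet:
    # Step 1 happens on paper: decide what H and E are before filling these in.
    prior: float            # step 2: base rate P(H)
    p_e_given_h: float      # step 3: likelihood P(E | H)
    p_e_given_not_h: float  # step 3: false-positive rate P(E | not H)

    def __post_init__(self):
        for name in ("prior", "p_e_given_h", "p_e_given_not_h"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be a probability, got {value}")

    def marginal(self):
        # Step 4: law of total probability.
        return (self.p_e_given_h * self.prior
                + self.p_e_given_not_h * (1.0 - self.prior))

    def posterior(self):
        # Step 5: Bayes' theorem.
        return self.p_e_given_h * self.prior / self.marginal()

    def update_factor(self):
        # Sanity check for step 5: how many times larger than the base rate
        # the posterior has become.
        return self.posterior() / self.prior


# H = "patient is sick", E = "test reads positive", prevalence 0.1%.
sheet = BayesWorksheet(prior=0.001, p_e_given_h=0.99, p_e_given_not_h=0.01)
print(f"P(E) = {sheet.marginal():.5f}")
print(f"P(H | E) = {sheet.posterior():.4f}")
print(f"update factor = {sheet.update_factor():.1f}x")
```

Step 6 has no code counterpart here: whether two inputs may be multiplied as independent is a judgement about the situation, not something the worksheet can detect.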

Example: a courtroom DNA match

A defendant's DNA matches a sample from a crime scene. The forensic lab reports a "one in a million" match probability, meaning P(match | innocent) = 0.000001. The prosecutor concludes that there is therefore only a one-in-a-million chance the defendant is innocent. This is the prosecutor's fallacy: confusing P(match | innocent) with P(innocent | match).

Apply Bayes. Suppose the police drew the suspect from a pool of 10 million people who could plausibly have been at the scene. The prior P(guilty) is 1 / 10,000,000. The likelihood P(match | guilty) is essentially 1. The marginal P(match) = 1 × (1/10,000,000) + 0.000001 × (9,999,999/10,000,000) ≈ 1.1 × 10⁻⁶.

P(guilty | match) = (1 × 10⁻⁷) / (1.1 × 10⁻⁶) ≈ 9%.

A "one in a million" match, applied to a suspect pool this wide, leaves roughly a 91% chance that the defendant is innocent, nowhere near the one-in-a-million figure the prosecutor claimed. The base rate of guilt, not the match probability, drives the answer. Cases have been overturned on exactly this confusion.
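The courtroom numbers can be checked exactly. The sketch below uses the pool size and match probability from the example and, as a simplifying assumption, treats P(match | guilty) as exactly 1. It also reports the expected number of innocent people in the pool who would match by chance.

```python
from fractions import Fraction

pool_size = 10_000_000                            # plausible suspects, from the example
p_match_given_innocent = Fraction(1, 1_000_000)   # the lab's "one in a million"
p_match_given_guilty = Fraction(1)                # assumption: treated as exactly 1

prior_guilty = Fraction(1, pool_size)
p_match = (p_match_given_guilty * prior_guilty
           + p_match_given_innocent * (1 - prior_guilty))
p_guilty_given_match = p_match_given_guilty * prior_guilty / p_match

# Equivalent head count: innocent people in the pool expected to match by chance.
expected_innocent_matches = (pool_size - 1) * p_match_given_innocent

print(float(p_guilty_given_match))       # about 0.09
print(float(expected_innocent_matches))  # about 10
```

Seen as a head count, the one guilty person is hidden among roughly ten innocent people who also match, which is why the posterior lands near 1 in 11.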
