Conditional Probability

Definition

Conditional probability is the probability of an event A on the assumption that another event B has occurred, written P(A|B) and defined as P(A and B) divided by P(B), provided P(B) is greater than zero. It restricts the sample space to outcomes in which B holds and asks what fraction of those also satisfy A.
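The definition translates directly into counting over a finite, equally likely sample space. The sketch below assumes two fair dice as the example and uses hypothetical helper names (`prob`, `cond_prob`). It illustrates the formula and is not a general-purpose library.

```python
from fractions import Fraction
from itertools import product

# Assumed example: all 36 equally likely ordered rolls of two fair dice.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event, space=OMEGA):
    """P(event) by counting: favourable outcomes / all outcomes."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

def cond_prob(a, b, space=OMEGA):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b, space)
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return prob(lambda w: a(w) and b(w), space) / p_b

sum_is_8 = lambda w: w[0] + w[1] == 8
first_even = lambda w: w[0] % 2 == 0

print(prob(sum_is_8))                   # 5/36
print(cond_prob(sum_is_8, first_even))  # 1/6
```

Learning that the first die is even moves the probability of a total of 8 from 5/36 to 1/6: of the 18 outcomes that remain, 3 sum to 8.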

Conditioning is how probability updates in response to information. Every act of learning — observing a test result, hearing a witness, drawing a ball from an urn, watching a roulette wheel spin — corresponds mathematically to conditioning the probability distribution on what was learned. The probability "of A" is not really a single number attached to A in isolation; it is always a number attached to A relative to some background context, and changing that context changes the number.

How it works

From counting to conditioning

The starting picture of probability is counting: list the equally likely outcomes, count those in which A holds, divide. Conditional probability is a small but decisive shift of frame. P(A) is what you assign before observing anything; P(A|B) is what you assign after learning that B holds. Mechanically you simply restrict the sample space to B and recount: how many of the B-outcomes also satisfy A? Equivalently, divide the joint probability P(A and B) by the marginal P(B). Haigh's ten-ball urn makes this concrete — the unconditional probability of drawing a Green ball is 5/10, but if you are told the ball is Low (numbers 0–4) and four of the five Low balls happen to be Green, P(Green | Low) jumps to 4/5. The information has narrowed the sample space and the number moves with it.
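The two routes in the paragraph above, restricting and recounting versus dividing the joint by the marginal, can be checked against each other in a few lines of Python. The colouring of the balls is an assumption: the text gives only the counts (five Green overall, four of the five Low balls Green). The sketch therefore numbers the balls 0 to 9 and picks one hypothetical assignment consistent with those counts.

```python
from fractions import Fraction

# Hypothetical colouring of ten balls numbered 0-9, chosen only to match the
# counts in the text: 5 Green overall, 4 of the 5 Low balls (0-4) Green.
GREEN = {0, 1, 2, 3, 7}
BALLS = range(10)

low = [b for b in BALLS if b <= 4]

# Route 1: restrict the sample space to the Low balls and recount.
restricted = Fraction(sum(1 for b in low if b in GREEN), len(low))

# Route 2: divide the joint probability P(Green and Low) by the marginal P(Low).
p_green_and_low = Fraction(sum(1 for b in BALLS if b in GREEN and b <= 4), len(BALLS))
p_low = Fraction(len(low), len(BALLS))
ratio = p_green_and_low / p_low

p_green = Fraction(len(GREEN), len(BALLS))  # unconditional P(Green)
print(p_green, restricted, ratio)  # 1/2 4/5 4/5
```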

Independence as the special case of "no update"

Two events are independent when conditioning on one tells you nothing new about the other: P(A|B) = P(A), and equivalently P(A and B) = P(A) times P(B). This is why the Multiplication Law is so seductive — it is just one multiplication away — and why it is so frequently misapplied. Haigh's female-engineer example is blunt: half the students are female and one in five study engineering, so the naive product 1/10 suggests female engineers should be one in ten. They are far rarer than that, because being female and studying engineering are not independent in the real student population. Pretending they are erases a real correlation and produces numbers that can be off by an order of magnitude. The same reflex, multiplied across courtroom evidence or insurance pricing, has produced wrongful convictions and bad policy.
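Independence can be tested numerically by comparing P(A|B) with P(A), or equivalently P(A and B) with P(A) times P(B). The head counts below are hypothetical, invented only to show the mechanics. They are not Haigh's data and do not describe any real student body.

```python
from fractions import Fraction

# Hypothetical head counts for 1,000 students, invented for illustration only.
TOTAL = 1000
FEMALE = 500           # half the students
ENGINEERING = 200      # one in five
FEMALE_ENGINEERS = 30  # made-up joint count

p_female = Fraction(FEMALE, TOTAL)
p_eng = Fraction(ENGINEERING, TOTAL)
p_both = Fraction(FEMALE_ENGINEERS, TOTAL)

naive = p_female * p_eng                # what the Multiplication Law would predict
p_eng_given_female = p_both / p_female  # P(engineering | female)

print(naive, p_both)              # 1/10 3/100
print(p_eng_given_female, p_eng)  # 3/50 1/5
print("independent" if p_eng_given_female == p_eng else "dependent")  # dependent
```

With these made-up numbers the Multiplication Law predicts 100 female engineers out of 1,000 where there are 30, and conditioning on "female" drops the chance of studying engineering from 1/5 to 3/50.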

Inductive validity, defined in conditional terms

Logic textbooks usually distinguish deduction (the premisses guarantee the conclusion) from induction (the premisses make the conclusion likely). In Logic: A Very Short Introduction, Priest makes the second precise: an inductive argument is valid just in case the conditional probability of the conclusion given the premisses exceeds that of its negation given the premisses, that is, P(conclusion | premisses) is greater than P(not conclusion | premisses). Since those two probabilities sum to one, this is the same as requiring P(conclusion | premisses) to exceed one-half. Sherlock Holmes's "deductions" then come into focus as inductions of exactly this shape. A worn cuff does not guarantee that Wilson writes a lot, but among Londoners with that pattern of cuff-wear most are clerks, so P(writes a lot | cuff pattern) is comfortably above one-half. Conditional probability is, in this reading, the very thing that makes inductive reasoning measurable.

Inverting the conditional: Bayes' theorem

P(A|B) and P(B|A) are different numbers and often vastly different. P(Australia | wild kangaroo nearby) is close to certain; P(wild kangaroo nearby | Australia) is small. The two are linked by Bayes' theorem, derived in two lines from the definition by noting that "A and B" is the same event as "B and A": P(A|B) equals P(B|A) times P(A) divided by P(B). The theorem is the bookkeeping device that converts what a hypothesis predicts about evidence — P(evidence | hypothesis) — into what the evidence says about the hypothesis — P(hypothesis | evidence). The first is usually easy to write down; the second is what you actually want, and Bayes is how you get there.
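The two-line derivation, writing A ∩ B for the event "A and B" and assuming P(A) and P(B) are both positive so that both conditionals are defined:

```latex
\begin{align*}
P(A \cap B) &= P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}
\end{align*}
```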

Priors do the heavy lifting

A consequence of Bayes' theorem that surprises people on first contact: the prior probability of the hypothesis is not optional. Priest uses this to dismantle the Argument to Design. The argument is right that P(ordered cosmos | designer) far exceeds P(ordered cosmos | no designer), so the evidence does favour the designer hypothesis over its negation. But what the argument needs is P(designer | ordered cosmos) to exceed P(no designer | ordered cosmos), and Bayes' theorem shows that this holds exactly when P(ordered cosmos | designer) times P(designer) exceeds P(ordered cosmos | no designer) times P(no designer). The conclusion is therefore hostage to the prior P(designer): it goes through if that prior is at least as large as P(no designer), it fails if the prior is small enough to outweigh the likelihood ratio, and the argument supplies no independent reason to set the prior either way. The same lesson generalises: when priors can be defended (two roulette wheels, no reason to favour either as the biased one, prior of 1/2 each), Bayesian updating from evidence is powerful and uncontroversial; when priors are smuggled in, the appearance of evidence-driven conclusions is an illusion.
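The dependence on the prior is easy to see numerically. This sketch (hypothetical function name `posterior`; likelihoods invented for illustration, with the evidence made 100 times likelier under H than under not-H) applies Bayes' theorem to the same evidence under two different priors. The prior of 1/2 mirrors the two-roulette-wheel case; the prior of 1/1000 stands in for a hypothesis one has little reason to credit.

```python
from fractions import Fraction

def posterior(prior_h, lik_h, lik_not_h):
    """P(H | E) via Bayes' theorem, with P(E) expanded over H and not-H."""
    joint_h = lik_h * prior_h                # P(E | H) P(H)
    joint_not_h = lik_not_h * (1 - prior_h)  # P(E | not H) P(not H)
    return joint_h / (joint_h + joint_not_h)

# Hypothetical likelihoods: the evidence is 100 times likelier under H.
LIK_H, LIK_NOT_H = Fraction(1, 2), Fraction(1, 200)

for prior in (Fraction(1, 2), Fraction(1, 1000)):
    post = posterior(prior, LIK_H, LIK_NOT_H)
    verdict = "H favoured" if post > Fraction(1, 2) else "not-H favoured"
    print(prior, post, verdict)
# 1/2 100/101 H favoured
# 1/1000 100/1099 not-H favoured
```

A likelihood ratio of 100 tips the balance only when the prior odds on H exceed 1/100, that is, when P(H) > 1/101; below that, the same evidence leaves not-H ahead.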

Base rates and the prosecutor's fallacy

The most consequential application of the P(A|B) versus P(B|A) distinction is in test interpretation. A disease affects 1% of the population and a test is, in the standard caricature, "95% accurate", meaning it returns a positive result for 95% of people who have the disease and a negative result for 95% of people who do not. The probability that a person who tests positive actually has the disease is nowhere near 95%; it is about 16%, because P(positive | sick) is not the same as P(sick | positive) and the gap is governed by the base rate P(sick). In criminal courtrooms the same swap is the prosecutor's fallacy: the probability of the evidence given innocence is presented as if it were the probability of innocence given the evidence. Conditional probability is what makes these errors visible. Without it the numbers look authoritative; with it, the calculation is mechanical.
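The 16% figure can be reproduced with Bayes' theorem. The sketch assumes, as the caricature usually intends, that "95% accurate" means both 95% sensitivity, P(positive | sick), and 95% specificity, P(negative | healthy). The function name is illustrative.

```python
from fractions import Fraction

def p_sick_given_positive(base_rate, sensitivity, specificity):
    """P(sick | positive) by Bayes' theorem."""
    true_pos = sensitivity * base_rate               # P(pos | sick) P(sick)
    false_pos = (1 - specificity) * (1 - base_rate)  # P(pos | healthy) P(healthy)
    return true_pos / (true_pos + false_pos)

# Assumption: "95% accurate" read as 95% sensitivity AND 95% specificity.
p = p_sick_given_positive(Fraction(1, 100), Fraction(95, 100), Fraction(95, 100))
print(p, round(float(p), 3))  # 19/118 0.161
```

In natural frequencies: of 10,000 people, 100 are sick and 95 of them test positive, while 495 of the 9,900 healthy people also test positive, so only 95 of the 590 positives are sick.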

The reference-class problem

There is a deeper question hiding under every conditional probability: P(A|B) on what class of cases? Frequency probabilities live inside a reference class, and the same event can sit in many. The probability that Wilson writes a lot given his worn cuff is computed relative to "Londoners with that cuff pattern"; the same person also belongs to "lawyers", "Yorkshiremen", "former soldiers", and so on, and each class can deliver a different number. The mathematics of conditioning does not tell you which class to pick. This is the reference-class problem, and it is the soft underbelly of every confident-sounding conditional probability — including the ones that get cited in courtrooms and clinics.
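A toy calculation makes the problem concrete. The population below is entirely hypothetical. It exists only to show that one individual who has the cuff pattern and is also a lawyer can be assigned quite different frequencies depending on which reference class is used; `freq_writes` is an illustrative name.

```python
from fractions import Fraction

# Hypothetical population of 200 people, invented purely for illustration.
# Each row: (has_cuff_pattern, is_lawyer, writes_a_lot, head_count)
POPULATION = [
    (True,  True,  True,    8),
    (True,  False, True,   12),
    (True,  False, False,   5),
    (False, True,  True,   10),
    (False, True,  False,  40),
    (False, False, True,   10),
    (False, False, False, 115),
]

def freq_writes(in_class):
    """Relative frequency of writes_a_lot within the reference class in_class."""
    members = [row for row in POPULATION if in_class(row[0], row[1])]
    total = sum(row[3] for row in members)
    hits = sum(row[3] for row in members if row[2])
    return Fraction(hits, total)

print(freq_writes(lambda cuff, lawyer: cuff))             # 4/5
print(freq_writes(lambda cuff, lawyer: lawyer))           # 9/29
print(freq_writes(lambda cuff, lawyer: cuff and lawyer))  # 1
```

Against the cuff-pattern class the induction "Wilson writes a lot" clears one-half comfortably; against the lawyer class it fails; the narrowest class gives certainty but rests on only eight members. Nothing in the arithmetic says which number is the right one.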
