4 Chance experiments
Core idea
A chance experiment is any procedure whose outcome cannot be predicted in advance — buying a lottery ticket, timing a road accident, counting wickets in an over. To reason about such an experiment quantitatively, you replace the bare outcome with a random variable (a numerical summary of what happened) and pair it with a probability distribution (the full list, or curve, of values it can take and how likely each is). The art of applied probability is choosing a distribution whose assumptions match the actual mechanism producing the randomness.
Author's framing: A distribution is the central object — it specifies all possible outcomes and their probabilities. Whether you can solve a problem hinges on whether your assumptions about which distribution applies are sound.
Why it matters
Once you know the distribution of a random variable, every interesting question (how likely is more than k? what's the average? how wide is the spread?) becomes mechanical. You don't have to reinvent reasoning for each new problem; you reach for the family whose conditions match, plug in a parameter or two, and read off the answer. Whole industries (insurance, quality control, epidemiology, particle physics) run on a handful of named distributions because the underlying mechanisms repeat even when the surface details don't.
What changes when you think in distributions
A novice thinks "what will happen?" — a single guess. A probabilist thinks "what is the shape of the cloud of things that could happen, and where does its centre and width sit?" The shift from point prediction to distribution-as-object is the same conceptual move that separates folk reasoning from statistical reasoning.
Where it breaks
Distributions are abstractions. They work when their conditions are met. Treating a handful of coin flips as if they were normally distributed, or insisting a binomial fits when trials are not independent (the bridge hand), gives answers that are confidently wrong. The skill is recognising which idealisation is honest for the situation.
Key takeaways
Random variables and their distributions
A random variable is the bridge between a messy real-world outcome and a number you can compute with. Roll two dice and the outcome is a pair like (3, 5); the random variable "sum of pips" turns that into 8. Buy a lottery ticket and the outcome is a complicated configuration of balls; the random variable "prize won in pounds" maps it to 0 or 10 or a million. Once you have a numerical handle, the distribution of the random variable is the full inventory of possible values and the probability of each.
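The two-dice case is small enough to enumerate in full. The sketch below assumes two fair six-sided dice; the helper name `sum_of_pips` is introduced here purely for illustration. It builds the sample space, applies the random variable to every outcome, and tabulates the resulting distribution as exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered pairs from two fair dice (assumed).
outcomes = list(product(range(1, 7), repeat=2))


def sum_of_pips(outcome):
    """The random variable: maps an outcome such as (3, 5) to the number 8."""
    return outcome[0] + outcome[1]


# The distribution: every possible value paired with its exact probability.
counts = Counter(sum_of_pips(o) for o in outcomes)
distribution = {
    value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())
}

for value, prob in distribution.items():
    print(value, prob)
```

The dictionary `distribution` is the "full inventory" in miniature: its keys are the possible values and its entries sum to one.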
Discrete vs continuous
If the values can be listed — even an infinite list like {0, 1, 2, …} — the distribution is discrete, and each value carries an honest probability. If the values fill an interval (a waiting time, a length, a temperature), the distribution is continuous: individual points have probability zero, and probabilities attach instead to intervals via the area under a probability density curve. The mantra is "area represents probability." A density must stay non-negative and integrate to one over the whole real line.
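To make "area represents probability" concrete, here is a rough numerical sketch. It assumes an exponential density with rate 1 purely as an example of a continuous distribution (that choice is not from the text above), and approximates areas with a simple midpoint rule rather than any library integrator.

```python
import math


def density(x, rate=1.0):
    """Exponential density, chosen only as an illustrative continuous example."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def area(f, a, b, steps=100_000):
    """Midpoint-rule approximation to the area under f between a and b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Probability of landing in the interval [1, 2] is the area above that interval.
prob_interval = area(density, 1.0, 2.0)
exact = math.exp(-1.0) - math.exp(-2.0)  # closed form for this particular density

# The total area should be (very nearly) one; the tail beyond 50 is negligible.
total = area(density, 0.0, 50.0)

print(prob_interval, exact, total)
```

Shrinking the interval toward a single point shrinks the area toward zero, which is why individual points carry no probability in the continuous case.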
Why both kinds exist
Counting questions (how many sixes? how many emails this hour?) are naturally discrete. Measuring questions (how long until the bus? how tall is a randomly chosen adult?) are naturally continuous. The two flavours use slightly different machinery but share every key idea: an expected value, a spread, a name, and a set of conditions under which the model is honest.
The key distributions
Bernoulli and binomial — the success-counter
A Bernoulli trial is the simplest chance experiment: a single yes/no event with fixed success probability p. Flip a biased coin once. The binomial distribution generalises this: fix the number of trials n, keep p constant, insist that trials are independent, and count the total number of successes. The probability of getting exactly k successes is P(X = k) = C(n, k) p^k (1 − p)^(n − k), where C(n, k) counts the ways of choosing which k of the n trials succeed. The binomial fits the number of sixes in twenty dice rolls, or the number of correct answers when every question on a multiple-choice exam is a blind guess. It does not fit the number of clubs in a thirteen-card bridge hand, because each card drawn changes the deck and so changes the next probability. Independence is non-negotiable.
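A short sketch of the binomial formula, with the bridge hand as a contrast. The comparison model for the bridge hand is the hypergeometric distribution (sampling without replacement), which is not named in the text above and is brought in here only to show how far off the binomial can be when independence fails. Function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, population, successes, draws):
    """P(exactly k successes when drawing without replacement)."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


# Binomial is appropriate here: exactly three sixes in twenty fair dice rolls.
p_three_sixes = binomial_pmf(3, 20, 1 / 6)

# Bridge hand: probability that all 13 cards are clubs.
p_clubs_binomial = binomial_pmf(13, 13, 13 / 52)        # wrongly assumes independence
p_clubs_exact = hypergeometric_pmf(13, 52, 13, 13)      # accounts for the shrinking deck

print(p_three_sixes)
print(p_clubs_binomial, p_clubs_exact)
```

In the all-clubs extreme the two models disagree by several orders of magnitude, which is the practical cost of ignoring the independence condition.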
Poisson — the rare-event counter
When n is huge and p is tiny but their product np (the expected count) is moderate, the binomial collapses into a much simpler one-parameter family: the Poisson distribution. Use it when events arrive essentially at random, independently, at some average rate per unit of time or space — radioactive decays per second, typos per page, customers per minute, meteors per night. The single parameter is the average rate λ; that one number determines the whole distribution.
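The limiting claim can be checked numerically. In this sketch the values n = 10,000 and p = 0.0003 are arbitrary illustrative choices (giving np = 3), and the Poisson probabilities use the standard formula e^(−λ) λ^k / k!.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Binomial probability of exactly k successes."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the mean count is lam."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 10_000, 0.0003   # illustrative: many trials, tiny success probability
lam = n * p             # expected count, about 3

# Compare the two families value by value over the range that carries the mass.
for k in range(8):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))

max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(15))
print("largest pointwise difference:", max_gap)
```

The two columns should agree closely, even though the Poisson side never needed n or p separately, only their product.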
Geometric and exponential — the waiting-time twins
If you ask instead "how long until the first success?", you get the geometric distribution in discrete time (number of trials) or the exponential distribution in continuous time. Both have a memoryless property: given that the event hasn't happened yet, the remaining wait has the same distribution as a fresh wait. This is why a radioactive atom that has survived fifty years has the same expected remaining lifetime as a brand-new one.
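Memorylessness can be verified directly from the survival functions. The sketch below uses the standard forms P(X > k) = (1 − p)^k for the geometric and P(T > t) = e^(−rate·t) for the exponential; the parameter values are arbitrary assumptions chosen for illustration.

```python
import math


def geometric_survival(k, p):
    """P(the first success takes more than k trials)."""
    return (1 - p) ** k


def exponential_survival(t, rate):
    """P(the wait exceeds t) for an exponential with the given rate."""
    return math.exp(-rate * t)


p, rate = 0.2, 0.5        # illustrative parameters
waited, extra = 7, 3      # already waited 7 units; ask about 3 more

# Conditional probability of waiting `extra` more, given `waited` has passed.
geo_conditional = geometric_survival(waited + extra, p) / geometric_survival(waited, p)
exp_conditional = exponential_survival(waited + extra, rate) / exponential_survival(waited, rate)

# Probability that a completely fresh wait exceeds `extra`.
geo_fresh = geometric_survival(extra, p)
exp_fresh = exponential_survival(extra, rate)

print(geo_conditional, geo_fresh)
print(exp_conditional, exp_fresh)
```

In both cases the conditional and fresh probabilities match up to floating-point rounding: the time already spent waiting drops out of the calculation.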
Normal (Gaussian) — the everything-else default
The normal distribution is the symmetric bell curve. Two numbers identify any member of the family: the mean μ (where the peak sits) and the standard deviation σ (how wide the bell is). Standardise by subtracting μ and dividing by σ and you land on the standard normal, tabulated since de Moivre. The normal shows up everywhere because, by the central limit theorem (covered in Making sense of probabilities), sums and averages of many small independent contributions tend toward it almost regardless of the underlying distribution, provided each contribution has finite variance. Heights, measurement errors, exam scores, daily stock returns: all are commonly approximated by normals once enough small effects pile up.
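Standardisation is easy to see in code. This sketch uses Python's `statistics.NormalDist`; the mean of 170 and standard deviation of 8 are hypothetical numbers (think heights in centimetres), not values from the text.

```python
from statistics import NormalDist

mu, sigma = 170.0, 8.0    # hypothetical mean and standard deviation
x = 186.0                 # the value we want a probability for

# Standardise: how many standard deviations above the mean is x?
z = (x - mu) / sigma

# P(X <= x) computed on the original scale and on the standard normal scale.
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(z, p_direct, p_standard)
```

Both routes give the same probability, which is why a single table of the standard normal is enough for every member of the family.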
Mental model — picking a distribution
Expected value vs variance
Expected value — where the centre sits
The expected value E[X] of a random variable is the probability-weighted average of its possible outcomes. For a fair six-sided die, E[X] = (1+2+3+4+5+6)/6 = 3.5 — a value the die can never actually show, but the long-run average over many rolls. For a binomial(n, p), E[X] = np. For a Poisson(λ), E[X] = λ. Expectation is linear: E[X + Y] = E[X] + E[Y] whether or not X and Y are independent, which is why averages compose so cleanly.
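The figures in this paragraph can be reproduced exactly with fractions. In the sketch below, the binomial parameters (n = 20, p = 1/6) and the choice of Y = X² as a variable that depends completely on X are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def expectation(dist):
    """Probability-weighted average of a {value: probability} table."""
    return sum(value * prob for value, prob in dist.items())


# Fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
e_die = expectation(die)

# Binomial(n, p): the weighted average should come out to n * p.
n, p = 20, Fraction(1, 6)
binomial = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
e_binomial = expectation(binomial)

# Linearity with a dependent pair: Y = X squared is determined entirely by X.
e_joint = sum((x + x**2) * prob for x, prob in die.items())                    # E[X + Y]
e_separate = expectation(die) + sum(x**2 * prob for x, prob in die.items())    # E[X] + E[Y]

print(e_die, e_binomial, e_joint, e_separate)
```

The last two quantities agree even though X and Y are as far from independent as possible, which illustrates why linearity needs no independence assumption.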
Variance — how wide the cloud spreads
The variance Var(X) = E[(X − E[X])²] measures the average squared distance of outcomes from the mean. Its square root, the standard deviation, has the same units as X and is the natural ruler for "typical deviation." Two distributions can share an expected value and still differ wildly — a fair coin paying ±£1 and a fair coin paying ±£1,000 have the same mean (zero) but radically different variances. Risk lives in variance, not expectation.
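The two-coin comparison takes only a few lines. This sketch applies the definition Var(X) = E[(X − E[X])²] to the two bets described above; the helper names are illustrative.

```python
import math


def mean(dist):
    """Expected value of a {value: probability} table."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Average squared distance from the mean, weighted by probability."""
    m = mean(dist)
    return sum(prob * (value - m) ** 2 for value, prob in dist.items())


small_bet = {-1: 0.5, 1: 0.5}          # fair coin paying plus or minus 1 pound
large_bet = {-1000: 0.5, 1000: 0.5}    # fair coin paying plus or minus 1,000 pounds

for name, bet in (("small", small_bet), ("large", large_bet)):
    var = variance(bet)
    print(name, "mean:", mean(bet), "variance:", var, "std dev:", math.sqrt(var))
```

Both bets have mean zero, but the standard deviations differ by a factor of a thousand and the variances by a factor of a million.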
Why both matter
Expectation alone is reckless. Two investments with identical expected returns are not equivalent if one has ten times the variance — the higher-variance one will occasionally lose your shirt. The pair (mean, spread) is the minimum honest summary of any real-world random quantity.
Practical application
- Identify the random variable. Name the number you actually care about — "wickets in the over," "minutes until the next email," "height of a randomly chosen recruit."
- Classify the support. Is it a finite list, a countable list, or a continuous range? That alone halves the candidate distributions.
- Audit the mechanism. Walk through the conditions of the candidate family. If even one assumption is shaky, downgrade your confidence — or switch families.
- Pin down the parameters. Estimate the parameter(s) (p, λ, μ, σ) from data, theory, or stated rate.
- Sanity-check tails. Compute one or two outcome probabilities by hand. If the distribution predicts events that "feel" impossibly rare or impossibly common, your model is mis-specified.
Example — staffing a help-desk
You run a help-desk that receives, on average, 12 calls per hour, distributed irregularly through the day. You need to decide how many agents to staff for the 9–10 am slot so that the probability of being overwhelmed is acceptably small.
Each call is essentially independent of every other (different customers, different problems), each second of the hour offers a tiny chance of one arriving, and the average rate is moderate — textbook Poisson territory, with λ = 12.
A Poisson with mean 12 has standard deviation √12 ≈ 3.46. So in a typical hour you should expect roughly 12 ± 3.5 calls; a busy hour might bring 18, a quiet one 6. If each agent can handle 4 calls per hour, staffing 3 agents (capacity 12) leaves you over capacity in roughly 42% of hours, since P(X > 12) ≈ 0.42 for a Poisson with mean 12; staffing 5 (capacity 20) covers you in all but about 1% of hours, since P(X > 20) ≈ 0.012. The mean told you the centre; the variance told you how much padding to add.
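The staffing trade-off can be tabulated directly from the Poisson formula. This is a minimal sketch under the example's own assumptions (λ = 12 calls per hour, 4 calls per agent per hour); it treats an hour as "overwhelmed" whenever the call count exceeds capacity, which ignores queueing effects within the hour.

```python
from math import exp, factorial, sqrt


def poisson_pmf(k, lam):
    """P(exactly k calls in the hour) when the mean count is lam."""
    return exp(-lam) * lam**k / factorial(k)


def prob_more_than(capacity, lam):
    """P(call count exceeds capacity) = 1 - P(count <= capacity)."""
    return 1 - sum(poisson_pmf(k, lam) for k in range(capacity + 1))


lam = 12                # average calls per hour, from the example
calls_per_agent = 4     # hourly capacity of one agent, from the example

print("standard deviation:", round(sqrt(lam), 2))
for agents in (3, 4, 5):
    capacity = agents * calls_per_agent
    print(agents, "agents, capacity", capacity,
          "-> P(overwhelmed) =", round(prob_more_than(capacity, lam), 4))
```

Reading down the output shows how quickly the tail probability falls as capacity moves one or two standard deviations above the mean.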
If instead the question were "how long until the next call?" you'd switch to the exponential distribution with rate λ = 12 per hour, giving an average inter-arrival gap of 5 minutes — and, because of memorylessness, that same 5-minute expected wait applies even if you've already been twiddling your thumbs for 10 minutes.
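A quick simulation can illustrate both claims in that paragraph. The sketch below draws inter-arrival gaps with `random.expovariate`, working in minutes so the rate is 12/60 per minute; the seed and the sample size are arbitrary choices made only so the run is repeatable.

```python
import random

rate_per_minute = 12 / 60      # 12 calls per hour, expressed per minute
rng = random.Random(0)         # fixed, arbitrary seed for repeatability

# Simulate many gaps between consecutive calls.
gaps = [rng.expovariate(rate_per_minute) for _ in range(200_000)]
mean_gap = sum(gaps) / len(gaps)

# Keep only the gaps that lasted past 10 minutes and measure what was left.
already_waited = 10.0
remaining = [g - already_waited for g in gaps if g > already_waited]
mean_remaining = sum(remaining) / len(remaining)

print(f"mean gap: {mean_gap:.2f} minutes")
print(f"mean remaining wait after {already_waited:.0f} idle minutes: "
      f"{mean_remaining:.2f} minutes")
```

Both averages should land close to 5 minutes, up to simulation noise: the ten idle minutes buy no reduction in the expected remaining wait.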
Caveats
Related lessons
Related concepts
- Random Variable
- Probability Distribution
- Expected Value
- Variance
- Binomial Distribution
- Normal Distribution
- Poisson Distribution