Introducing statistics

Core idea

Statistics is not a textbook subject — it is a habit of mind for handling a world saturated in numbers, percentages, graphs, and risks. The discipline answers a small set of recurring questions: can the data be summarised, do two batches differ, are two things related? Every serious statistical claim travels through four stages — pose a question, collect data, analyse it, interpret it — and the techniques you reach for depend on which stage you're in.

Why it matters

Most adults consume statistics constantly without naming them — checking weather forecasts, comparing prices, listening to political claims, deciding whether to fly or drive. Without a working grasp of how summaries are built and how risk scales with very large or very small numbers, intuition systematically misleads you. The point of statistical literacy is not arithmetic — it is judgement: knowing when a number is doing real work and when it is being deployed as decoration or distortion.

Mental model

The three statistical question types

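Almost every chart, study, or claim you encounter is doing one of three jobs: summarising a batch of data, comparing two batches, or relating one thing to another. Recognising which job is being done tells you which technique is appropriate.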

The four stages of an investigation

The technique also depends on the stage: posing a question, collecting data, analysing it, or interpreting it. Sampling decisions belong upstream of any calculation, and interpretation lives downstream of every chart.

Practical application

When you encounter a statistical claim in the wild, run it through a short diagnostic before you accept or reject it.

  1. Identify the question. Is the claim summarising, comparing, or relating? "Average rainfall was 80 mm" is summarising; "treatment A beat treatment B" is comparing; "smokers live shorter lives" is relating.

  2. Find the data source. Was the data collected to answer this question, or repurposed from elsewhere? Sample size, sampling method, and definitions all matter — see the section on choosing a sample.

  3. Check the comparison base. Rates are only meaningful relative to a denominator. Road deaths per passenger-mile, not per accident. Crime per capita, not absolute counts across cities of different sizes.

  4. Probe the spread, not just the average. An average tells you the centre of a distribution; it tells you nothing about the variability. "Average income rose 8%" can hide that four people in five became worse off.

  5. Interpret carefully. Two things moving together does not mean that one causes the other, and an explanation that fits the past perfectly is not a guaranteed prediction of the future.
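Step 4 is easy to check with a few lines of arithmetic. The sketch below uses five made-up incomes (an assumption for illustration, not data from any real survey) chosen so that the average rises 8% while four people in five end up worse off. It compares the change in the mean with the change in the median and counts how many individuals lost ground.

```python
from statistics import mean, median

# Hypothetical incomes for five people, before and after (invented numbers).
before = [20_000, 20_000, 20_000, 20_000, 20_000]
after = [19_000, 19_000, 19_000, 19_000, 32_000]


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


mean_change = percent_change(mean(before), mean(after))
median_change = percent_change(median(before), median(after))

# Count the people whose own income fell.
worse_off = sum(a < b for b, a in zip(before, after))

print(f"Mean income change:   {mean_change:+.1f}%")    # +8.0%
print(f"Median income change: {median_change:+.1f}%")  # -5.0%
print(f"People worse off:     {worse_off} of {len(before)}")  # 4 of 5
```

The mean is pulled up by a single large gain, while the median moves with the typical person. Reporting both, or a simple count of winners and losers, is one way to probe the spread behind a headline average.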

Example

Suppose a tech blog runs the headline: "Apps from Indie Developers Crash 3× More Often Than Big-Studio Apps." Walk it through the diagnostic.

  • Question type: comparing — two groups (indie vs studio).
  • Data source: automated crash logs from a single device-monitoring service. What's the sampling frame? Only users who opt into telemetry, mostly on flagship phones. Already a bias.
  • Denominator: is "3 times more often" measured per app install, per session, per hour of use, or per active user? A studio shooter played four hours a day will rack up far more sessions, and perhaps more total crashes, yet possibly fewer crashes per session than a small utility opened weekly.
  • Spread: an "average" indie-app crash rate hides that one viral indie game with a known bug may be driving the whole gap.
  • Interpretation: even if the gap is real, "3×" doesn't tell you whether the base rates are 0.1% versus 0.3% (a gap of 0.2 percentage points that few users would notice) or 5% versus 15% (a serious quality gap).
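The denominator and interpretation points can be made concrete with a small calculation. This is a minimal sketch with invented crash counts, install counts, and session counts (all hypothetical). Only the two pairs of base rates in the second part come from the example above. The first part shows one set of crash counts under two different denominators. The second shows that the same "3×" ratio can sit on very different absolute gaps.

```python
# All app figures below are invented for illustration; none come from real crash logs.

def rate(events, exposure):
    """Events per unit of exposure (e.g. crashes per install or per session)."""
    return events / exposure


# Part 1: the same crash counts under two different denominators.
apps = {
    "studio shooter": {"crashes": 600, "installs": 1_000, "sessions": 120_000},
    "indie utility": {"crashes": 200, "installs": 1_000, "sessions": 4_000},
}

per_install = {name: rate(a["crashes"], a["installs"]) for name, a in apps.items()}
per_session = {name: rate(a["crashes"], a["sessions"]) for name, a in apps.items()}

for name in apps:
    print(f"{name}: {per_install[name]:.2f} crashes/install, "
          f"{per_session[name]:.4f} crashes/session")

# Part 2: the same "3x" ratio at the two pairs of base rates from the example.
base_rates = {"low": (0.001, 0.003), "high": (0.05, 0.15)}  # (studio, indie)

gaps = {}
for label, (studio, indie) in base_rates.items():
    ratio = indie / studio
    gap_points = (indie - studio) * 100  # absolute gap in percentage points
    gaps[label] = (ratio, gap_points)
    print(f"{label} base rate: ratio {ratio:.1f}x, gap {gap_points:.1f} percentage points")
```

With these made-up numbers the studio app looks three times worse per install, while the indie app looks ten times worse per session, so the choice of denominator alone reverses the conclusion. In the second part both scenarios report a 3× ratio, but the absolute gap is 0.2 percentage points in one and 10 in the other.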

The same headline could be true and meaningless, or false and persuasive, depending entirely on details the headline omits. Statistical literacy is the habit of asking the missing questions before you form an opinion.
