Distributions

Definition

A distribution describes how the possible values of an uncertain quantity are spread out — which outcomes are common, which are rare, and how far results tend to stray from the typical case. It replaces a single number with a full picture of variation.

Different phenomena follow different shapes. Heights and measurement errors cluster symmetrically around an average in a bell curve, where extreme values are vanishingly rare. Wealth, city sizes, and book sales follow a heavily skewed shape in which a few enormous values dominate the total and pull the average far above the typical case, so the average says little about what most individual cases look like. Knowing which shape applies changes what counts as normal and what counts as surprising.
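A quick way to see the difference is to compare the mean with the median (the middle value) for a symmetric sample and a skewed one. The sketch below assumes, purely for illustration, that heights are normal with mean 170 cm and standard deviation 10 cm, and that wealth follows a Pareto distribution with shape 1.16 (the exponent usually associated with an "80/20" split). Neither parameter is fitted to real data.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Illustrative assumptions, not fitted to real data:
# heights ~ normal(170 cm, 10 cm); wealth ~ Pareto with shape alpha = 1.16.
heights = [rng.gauss(170, 10) for _ in range(100_000)]
wealth = [rng.paretovariate(1.16) for _ in range(100_000)]

for name, data in [("heights", heights), ("wealth", wealth)]:
    mean = statistics.fmean(data)
    median = statistics.median(data)
    # Symmetric shape: mean and median nearly coincide (ratio near 1).
    # Skewed shape: a few huge values drag the mean well above the median.
    print(f"{name:8} mean={mean:10.2f}  median={median:8.2f}  mean/median={mean / median:.2f}")
```

For the normal sample the ratio should print as essentially 1; for the Pareto sample the mean should land well above the median, which is the sense in which the average misdescribes the typical case.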

How it works

To think in distributions is to ask not just what will happen but how the range of what could happen is shaped. A bell-shaped distribution lets you treat the average as representative and extremes as negligible. A heavy-tailed distribution does the opposite — the extremes carry the story, and the average is a poor guide to any single case.
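One way to make "the extremes carry the story" concrete is to ask what share of the total the largest 1% of values account for. The sketch below reuses the same illustrative assumptions (a normal bell curve and a Pareto tail with shape 1.16); the helper name `top_share` is hypothetical and introduced only for this example.

```python
import random

def top_share(values, fraction=0.01):
    """Fraction of the total contributed by the largest `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))  # always keep at least one value
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(1)  # fixed seed for reproducibility
# Illustrative assumptions: a bell curve vs. a Pareto (alpha = 1.16) tail.
bell = [rng.gauss(170, 10) for _ in range(100_000)]
heavy = [rng.paretovariate(1.16) for _ in range(100_000)]

print(f"bell curve: top 1% holds {top_share(bell):.1%} of the total")
print(f"heavy tail: top 1% holds {top_share(heavy):.1%} of the total")
```

Under the bell curve the top 1% holds only slightly more than 1% of the total, so ignoring the extremes costs little. Under the heavy tail the top 1% holds a large fraction of everything, so any summary that ignores them ignores most of what matters.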

Misjudging the shape is a common and expensive error. Planning for a bell curve in a world of heavy tails leaves you exposed to the rare events that dominate outcomes; treating a bell-curve world as wild leads to needless caution. The skill is matching the assumed shape to the evidence before drawing any conclusion.
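The cost of assuming the wrong shape can be sketched by fitting a bell curve (matching mean and standard deviation) to heavy-tailed data and comparing how often it predicts a four-standard-deviation event with how often one actually occurs. As a stand-in for "heavy tails" this example assumes a Student t distribution with 3 degrees of freedom, built from normal draws with Python's standard library; the choice of distribution and of the 4σ threshold are illustrative only.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(7)  # fixed seed for reproducibility

def student_t3(rng):
    """One draw from a Student t distribution with 3 degrees of freedom (assumed heavy tail)."""
    z = rng.gauss(0, 1)
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(3))  # chi-square with 3 df
    return z / math.sqrt(chi2 / 3)

data = [student_t3(rng) for _ in range(100_000)]

mu = statistics.fmean(data)
sigma = statistics.stdev(data)
threshold = mu + 4 * sigma

# Probability a bell curve with the same mean and sd assigns to exceeding the threshold.
predicted = 1 - NormalDist(mu, sigma).cdf(threshold)
# Fraction of the heavy-tailed sample that actually exceeds it.
observed = sum(x > threshold for x in data) / len(data)

print(f"bell-curve model: {predicted:.1e}   observed: {observed:.1e}")
```

The bell-curve model treats a 4σ excursion as roughly a 3-in-100,000 event regardless of the data; the heavy-tailed sample should exceed that threshold many times more often, which is the exposure the paragraph above describes.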

Seeing the shape: histogram vs. bar chart

To read a distribution you have to plot it, and the right plot depends on the kind of variable. A histogram visualizes a continuous variable such as age, price, time intervals, or physical measurements. Its bars touch to signal that the underlying values lie on an unbroken number line, and the silhouette they trace (symmetric, skewed, bimodal, or heavy-tailed) carries the analytical story that summary statistics hide. Two datasets with identical means and standard deviations can have dramatically different histograms, with entirely different practical implications.
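The claim about identical summary statistics can be checked with a small sketch: generate one bell-shaped sample and one two-cluster (bimodal) sample, rescale both to exactly the same mean and standard deviation, then draw a plain-text histogram of each. The cluster positions, the target mean of 50 and standard deviation of 10, and the helper names `rescale` and `text_histogram` are all hypothetical choices made for illustration.

```python
import random
import statistics

def rescale(data, mean=50.0, sd=10.0):
    """Shift and stretch data so it has exactly the given mean and sample sd."""
    m, s = statistics.fmean(data), statistics.stdev(data)
    return [mean + sd * (x - m) / s for x in data]

def text_histogram(data, lo=20.0, hi=80.0, bins=12, width=40):
    """Count values into adjacent equal-width bins over [lo, hi) and draw each as '#'s.

    Values outside [lo, hi) are left out of the drawing.
    """
    step = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x < hi:
            counts[min(int((x - lo) / step), bins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:5.1f}-{lo + (i + 1) * step:5.1f} | {'#' * round(width * c / peak)}")
    return counts

rng = random.Random(0)  # fixed seed for reproducibility
# One bell-shaped cloud vs. an illustrative 50/50 mix of two tight clusters.
unimodal = rescale([rng.gauss(0, 1) for _ in range(10_000)])
bimodal = rescale([rng.gauss(-1, 0.3) if rng.random() < 0.5 else rng.gauss(1, 0.3)
                   for _ in range(10_000)])

counts = {}
for name, data in [("unimodal", unimodal), ("bimodal", bimodal)]:
    print(f"\n{name}: mean={statistics.fmean(data):.2f}, sd={statistics.stdev(data):.2f}")
    counts[name] = text_histogram(data)
```

Both samples report the same mean and standard deviation, yet the first histogram peaks at the center while the second has a gap exactly where the first is tallest. A summary of "mean 50, sd 10" would describe the bimodal data's most common value, the middle, as typical when almost no observations fall there.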

A bar chart, by contrast, is for discrete categories, and its bars have gaps to signal that the categories are conceptually separate, with no continuous scale running between them. Using touching bars for categorical data implies a continuum that does not exist, a subtle but genuine misrepresentation. The chart choice is not decorative; it is a claim about whether the variable is continuous or discrete.
