Definition
Discrete data takes only specific, separated values — typically whole-number counts or distinct categories — with no possibility of meaningful intermediate values. The number of children in a household, the count of defects on a production line, the colour of a car, and the result of a die roll are all discrete: between any two adjacent valid values there is a gap rather than a continuum. Discrete data contrasts with continuous data, where values can in principle take any number along a real-valued scale (height, weight, time, temperature).
Why it matters
The distinction shapes everything that follows. The right visual displays, the right summary statistics, the right probability distributions, and the right statistical tests all depend on whether the underlying data is discrete or continuous. Mistaking one for the other is a common source of methodological error.
How it works
The first step in any analysis is to classify each variable. Discrete categorical variables, such as eye colour or product type, admit only counts, proportions, and the mode as legitimate summaries; computing an arithmetic mean of category codes is meaningless. Discrete ordinal variables, such as survey responses on a five-point scale, permit medians and ranges but raise honest debate about whether means and parametric tests are appropriate. Discrete count variables, such as the number of customer complaints per day, often look continuous from a distance but are bounded below at zero and constrained to integers; this limits which probability distributions can fit them well.
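The classification above can be sketched in Python using only the standard library. This is a minimal illustration, not a prescribed workflow: the function names (`summarize_categorical`, `summarize_ordinal`, `summarize_counts`) and the sample values are hypothetical, chosen only to show which summaries suit each kind of discrete variable.

```python
from collections import Counter
from statistics import mean, median


def summarize_categorical(values):
    """Counts, proportions and mode: summaries that do not assume any ordering."""
    counts = Counter(values)
    total = len(values)
    proportions = {category: n / total for category, n in counts.items()}
    mode = counts.most_common(1)[0][0]
    return {"counts": dict(counts), "proportions": proportions, "mode": mode}


def summarize_ordinal(codes):
    """Median and range: summaries that use order but not spacing between levels."""
    return {"median": median(codes), "min": min(codes), "max": max(codes)}


def summarize_counts(counts):
    """Mean and range for count data, after checking the integer and zero-bound constraints."""
    if any(not isinstance(c, int) or c < 0 for c in counts):
        raise ValueError("count data must be non-negative integers")
    return {"mean": mean(counts), "min": min(counts), "max": max(counts)}


# Hypothetical sample data, for illustration only.
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue"]
survey_scores = [1, 2, 2, 3, 4, 5, 5]       # five-point ordinal scale
complaints_per_day = [0, 2, 1, 0, 3, 1, 0]  # daily counts

categorical_summary = summarize_categorical(eye_colour)
ordinal_summary = summarize_ordinal(survey_scores)
count_summary = summarize_counts(complaints_per_day)

print(categorical_summary)
print(ordinal_summary)
print(count_summary)
```

Keeping one function per variable type makes the choice of summary explicit: nothing in `summarize_categorical` computes a mean, so a category code cannot be averaged by accident.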
Choosing the right summary and the right distribution preserves the structure of the data. The binomial distribution models the count of successes in a fixed number of independent trials, each with the same probability of success. The Poisson distribution models the count of events in a fixed window when events arrive independently at a constant average rate. Both are workhorses for discrete data. Continuous approximations can be convenient when counts are large, but no continuous distribution exactly preserves the counting property, that is, values restricted to non-negative integers. Respecting the discrete nature of the input helps keep downstream inference defensible.
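The two distributions named above have simple closed-form probability mass functions, which can be sketched with Python's standard `math` module. The function names `binomial_pmf` and `poisson_pmf` and the parameter values in the usage lines are illustrative assumptions, not taken from the text.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for the number of successes in n independent trials with success probability p."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for the number of events in a fixed window with average rate lam."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)


# Illustrative parameters only: 4 trials with p = 0.5, and an average of 2 events per window.
p_two_successes = binomial_pmf(2, 4, 0.5)
p_no_events = poisson_pmf(0, 2.0)

# Probability mass sits only on the integers, so the total is a sum rather than an integral.
binomial_total = sum(binomial_pmf(k, 4, 0.5) for k in range(5))
poisson_total = sum(poisson_pmf(k, 2.0) for k in range(50))

print(p_two_successes, p_no_events, binomial_total, poisson_total)
```

Both functions are defined only at integer values of `k`, which is the counting property in concrete form: probabilities are summed over integers rather than integrated over an interval.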