Definition
A result is statistically significant when the data is sufficiently inconsistent with the null hypothesis that the analyst is willing to reject it. Operationally, the test computes a p-value — the probability of observing a result at least as extreme as the one in hand, under the assumption that the null hypothesis is true — and compares it to a pre-chosen significance level α. If the p-value is smaller than α, the result is declared significant.
The phrase is widely used and widely abused. Statistical significance is a narrow technical claim about the compatibility of data with a specific null. It is not a claim about effect size, practical importance, or the probability that the alternative hypothesis is true. Each of those is a separate question requiring separate evidence.
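To see why significance is not the probability that either hypothesis is true, the following minimal sketch applies Bayes' theorem to a significant result. The function name and the three inputs (a prior probability that the null is true, a significance level, and the test's power) are illustrative assumptions, not values from any real study. It also conditions on the event "the test came out significant" rather than on the exact data, which is a simplification.

```python
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """P(null is true | result is significant), by Bayes' theorem.

    prior_null: assumed prior probability that the null is true
    alpha:      significance level, P(significant | null true)
    power:      P(significant | null false)
    """
    false_positive = alpha * prior_null          # null true, test rejects anyway
    true_positive = power * (1.0 - prior_null)   # null false, test rejects
    return false_positive / (false_positive + true_positive)


# Hypothetical inputs: 90% of tested nulls are true, alpha = 0.05, power = 0.8.
posterior = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.8)
print(f"P(null | significant) = {posterior:.2f}")  # about 0.36, not 0.05
```

Under these assumed inputs, roughly a third of significant results would come from true nulls, even though α is 0.05. The answer depends on the prior and the power, neither of which the significance verdict supplies.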
Why it matters
How it works
The test machinery starts from the null hypothesis, a default position usually stating that there is no effect, no difference, or no association. The analyst computes a test statistic from the data and asks: if the null were true, how often would I see a test statistic this extreme or more extreme? That probability is the p-value. If it falls below α (conventionally 0.05), the analyst rejects the null and reports the result as statistically significant. The threshold is not magic: it is the false-positive rate, the chance of rejecting a null that is actually true, that the analyst has decided to tolerate.
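The procedure above can be sketched with a toy example: testing whether a coin is fair. This is only an illustration under stated assumptions. The null is "probability of heads is 0.5", the test statistic is the number of heads, and "at least as extreme" is taken to mean "no more probable under the null than the observed count", which is one common convention for an exact two-sided binomial test. The function name and the flip counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value for k successes in n trials under H0: p = p0.

    Sums the null probabilities of every outcome that is no more likely
    than the observed one.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so floating-point ties are counted as ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


ALPHA = 0.05  # significance level, fixed before looking at the data

for heads in (60, 61):
    p_value = binomial_two_sided_p(heads, 100)
    verdict = "significant" if p_value < ALPHA else "not significant"
    print(f"{heads}/100 heads: p = {p_value:.4f} -> {verdict}")
```

With 100 flips, 60 heads falls just short of the 0.05 threshold and 61 heads just clears it, which shows how a single observation can flip the verdict even though the evidence has barely changed.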
A common misreading treats the p-value as the probability that the null is true given the data. It is the reverse conditional: the probability of data at least this extreme given the null. The two quantities are related by Bayes' theorem, and they need not be close. A significant p-value tells you that the data would be unusual under the null, which is the basis for rejecting it; it does not directly tell you how likely any particular alternative is.
The other persistent confusion is between statistical and practical significance. A medication that lowers blood pressure by half a point may show a statistically significant effect in a trial of fifty thousand patients and still be clinically meaningless. The discipline is to report effect sizes and confidence intervals alongside the significance verdict, so the reader can judge magnitude as well as detectability.
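The blood-pressure example can be made concrete with a rough sketch. Everything beyond "half a point" and "fifty thousand patients" is an assumption made for illustration: the unit is taken to be mmHg, the trial is split into two equal arms of 25,000, the standard deviation of individual readings is assumed to be a known 10 mmHg, and a simple two-sample z-test is used. The function name is hypothetical.

```python
from math import erfc, sqrt


def two_sample_z(diff: float, sd: float, n_per_arm: int, z_crit: float = 1.96):
    """Two-sided z-test and ~95% confidence interval for a difference in means.

    Assumes equal arm sizes and a known, common standard deviation.
    """
    se = sd * sqrt(2.0 / n_per_arm)        # standard error of the difference
    z = diff / se                          # test statistic under H0: diff = 0
    p_value = erfc(abs(z) / sqrt(2.0))     # two-sided normal tail probability
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z, p_value, ci


# Assumed scenario: 0.5 mmHg mean reduction, SD 10 mmHg, 25,000 patients per arm.
z, p_value, ci = two_sample_z(diff=0.5, sd=10.0, n_per_arm=25_000)
print(f"z = {z:.2f}, p = {p_value:.1e}")
print(f"effect = 0.5 mmHg, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}) mmHg")
```

Under these assumptions the p-value is far below 0.05, yet the whole confidence interval sits under 1 mmHg. Reporting the interval next to the verdict is what lets a reader see that the effect is detectable but small.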