Definition
A p-value is the probability, computed under the assumption that the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually observed. It is a measure of compatibility between the data and the null model: small p-values indicate that the observed sample would be surprising if the null were true, large ones indicate that it would not.
Conventionally, when the p-value falls below a pre-set significance level (often 0.05), the result is called statistically significant and the null is rejected. The threshold is a convention, not a law of nature: some disciplines have moved toward stricter cutoffs (0.01, 0.001) as awareness of replication failures has grown.
Why it matters
How it works
Computing a p-value follows a fixed recipe. The analyst formulates the null and alternative hypotheses, chooses an appropriate test statistic (a t statistic for means, a chi-square for categorical counts, an F for variance ratios), and computes its value from the sample. Each test statistic has a known distribution under the null, called the sampling distribution, which is derived from probability theory. The p-value is then the tail area of that distribution at or beyond the observed statistic: one tail for a one-sided alternative, both tails for a two-sided one. Modern software returns the number directly; historically, analysts looked it up in tables.
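The recipe above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it swaps in an exact binomial test (a coin flipped 20 times, 15 heads observed, null hypothesis that the coin is fair) because its sampling distribution can be written out with the standard library alone. The function name `binomial_p_value` and the coin-flip numbers are assumptions made for the example.

```python
from math import comb


def binomial_p_value(n_trials, n_successes, null_prob=0.5, two_sided=False):
    """Exact binomial p-value (hypothetical helper for illustration).

    One-sided: probability of n_successes or more under the null.
    Two-sided: total probability of every outcome that is no more
    likely under the null than the observed one.
    """
    def pmf(k):
        # Sampling distribution of the count under the null hypothesis.
        return comb(n_trials, k) * null_prob**k * (1 - null_prob) ** (n_trials - k)

    if not two_sided:
        # Upper tail: outcomes at least as extreme as the observed count.
        return sum(pmf(k) for k in range(n_successes, n_trials + 1))

    observed = pmf(n_successes)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        pmf(k) for k in range(n_trials + 1) if pmf(k) <= observed * (1 + 1e-9)
    )


# Assumed example data: 15 heads in 20 flips of a supposedly fair coin.
print(f"one-sided p = {binomial_p_value(20, 15):.4f}")                  # 0.0207
print(f"two-sided p = {binomial_p_value(20, 15, two_sided=True):.4f}")  # 0.0414
```

The same three steps (state the null, compute the statistic, take the tail area of its null distribution) apply to t, chi-square, and F tests; only the sampling distribution changes.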
The mechanism makes clear what the p-value is and is not. It is a conditional probability: how likely is data this extreme, given the null. It is not the probability that the null is true; that would require flipping the conditioning, which needs a prior probability and Bayes' rule. A p-value of 0.03 does not mean the null has a 3% chance of being true; it means that if the null were true, you would see data this extreme only 3% of the time. The shift in conditioning is subtle, but mishandling it is one way researchers end up with claims that do not survive replication. The other practical danger is p-hacking: running many tests until one falls under 0.05 by chance alone. A p-value carries its advertised meaning only when the test was specified in advance and either a single hypothesis was tested or the analysis corrects for the number of tests performed.
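The effect of running many tests can be checked with a small simulation. The sketch below assumes every null hypothesis is true (all data are drawn from a standard normal), runs a batch of two-sided z-tests with known standard deviation, and counts how often at least one of them falls under 0.05. The helper names, sample size, and number of repetitions are illustrative assumptions; the closed-form rate assumes the tests are independent.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, null_mean=0.0, sigma=1.0):
    """Two-sided z-test p-value, assuming the population sigma is known."""
    z = (fmean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def any_significant_rate(n_tests, n_experiments=1000, sample_size=20,
                         alpha=0.05, seed=0):
    """Fraction of experiments in which at least one of n_tests null-true
    tests comes out 'significant' purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        p_values = (
            z_test_p_value([rng.gauss(0, 1) for _ in range(sample_size)])
            for _ in range(n_tests)
        )
        if any(p < alpha for p in p_values):
            hits += 1
    return hits / n_experiments


def family_wise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# With 20 independent tests the closed-form rate is about 0.64, far above
# the nominal 0.05; the simulated rate should land close to it.
print(f"theory:    {family_wise_error_rate(20):.3f}")
print(f"simulated: {any_significant_rate(20):.3f}")
```

A single pre-specified test keeps the false-positive rate near the nominal 0.05, which is why the number of tests performed has to be reported or corrected for.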