Definition
Input validation is the practice of checking that data arriving at a program — from a user, a network request, a configuration file, or another process — matches the shape, type, range, and constraints the program expects before any code acts on it. It is the boundary discipline that separates the messy outside world from the internal logic that assumes well-formed data.
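Validation is not the same as sanitisation. Validation accepts or rejects; sanitisation modifies. Validating an email address answers "is this a valid email?" with yes or no; sanitising it might strip whitespace or lowercase the domain. Most secure code does both, and the order needs care. Any sanitising step that changes the value (trimming, lowercasing, re-encoding) should either run before validation or be followed by re-validation, so that the program only ever acts on a value that has passed the checks.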
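To make the distinction concrete, here is a minimal Python sketch. The function names `is_valid_email` and `sanitise_email` are hypothetical. The regular expression is a deliberately simplified assumption about email shape, not the full grammar of RFC 5322, so treat it as an illustration rather than a production validator.

```python
import re

# Simplified, assumed email shape: something, one "@", then a domain with a dot.
# No whitespace or extra "@" anywhere. Real email syntax is far more involved.
EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(value: str) -> bool:
    """Validation: answer yes or no, and never modify the input."""
    return EMAIL_SHAPE.fullmatch(value) is not None


def sanitise_email(value: str) -> str:
    """Sanitisation: return a modified copy (trimmed, domain lowercased)."""
    trimmed = value.strip()
    local, sep, domain = trimmed.rpartition("@")
    if not sep:
        return trimmed  # no "@", so there is no domain to lowercase
    # Only the domain is lowercased; the local part may be case-sensitive.
    return f"{local}{sep}{domain.lower()}"
```

Note the asymmetry: `is_valid_email("  alice@example.com ")` returns `False` because validation never repairs input, while `sanitise_email("  Alice@Example.COM ")` returns `"Alice@example.com"`, a changed value that is a different string from what the user sent.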
Why it matters
How it works
A validation routine runs four kinds of checks. Existence: is the field present at all? Type: is it the kind of data expected (string, integer, date)? Format: does it match the expected shape, such as a phone-number pattern, an ISO 8601 date layout, or valid UTF-8? Range and constraint: does it fall within the allowed bounds, such as an age between 0 and 150 or a file size under 10 MB, and is it free of characters that carry meaning in downstream contexts like SQL or HTML? Such character restrictions complement, but do not replace, parameterised queries and output encoding. Each check assumes the previous one passed and catches data it let through, and the program proceeds only when all four pass.
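The four checks can be sketched as a single routine. This is a minimal Python illustration under assumed inputs: the payload shape, the field names `age` and `joined`, the `validate_signup` function, and the `ValidationError` class are all hypothetical. The age bound of 0 to 150 comes from the text; the rule that `joined` must be a YYYY-MM-DD date not in the future is an invented example constraint.

```python
import re
from datetime import date

MAX_AGE = 150  # upper bound taken from the example in the text
ISO_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")  # YYYY-MM-DD


class ValidationError(ValueError):
    """Raised when a field fails any of the four checks (hypothetical)."""


def validate_signup(payload: dict) -> dict:
    # 1. Existence: every required field must be present.
    for field in ("age", "joined"):
        if field not in payload:
            raise ValidationError(f"missing field: {field}")
    age, joined = payload["age"], payload["joined"]

    # 2. Type: age must be an int (bool is a subclass of int, so exclude it)
    #    and joined must be a string before its format can be checked.
    if isinstance(age, bool) or not isinstance(age, int):
        raise ValidationError("age must be an integer")
    if not isinstance(joined, str):
        raise ValidationError("joined must be a string")

    # 3. Format: the right shape, and a real calendar date (rejects 2024-02-30).
    if not ISO_DATE_SHAPE.fullmatch(joined):
        raise ValidationError("joined must look like YYYY-MM-DD")
    try:
        joined_date = date.fromisoformat(joined)
    except ValueError:
        raise ValidationError("joined is not a real calendar date") from None

    # 4. Range and constraint: age within bounds, joined not in the future.
    if not 0 <= age <= MAX_AGE:
        raise ValidationError(f"age must be between 0 and {MAX_AGE}")
    if joined_date > date.today():
        raise ValidationError("joined cannot be in the future")

    return {"age": age, "joined": joined_date}
```

The routine stops at the first failing check, which is why the order matters: the format check can safely run a regular expression on `joined` only because the type check has already guaranteed it is a string. Returning the parsed `date` rather than the raw string means later code never has to parse it again.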
The architectural rule is to validate at the perimeter. Code deep inside a system should not have to re-check whether an integer is positive or a string is non-empty; those checks should have happened at the entry point. The "parse, don't validate" pattern formalises this idea: instead of repeatedly validating a raw string throughout the codebase, parse it once into a type that can only hold valid values (a NonEmptyString, an Email, a PositiveInt). After that point the type system enforces the invariant, provided the parsing function is the only way to construct a value of that type. Unix command-line argument parsers such as getopt do something analogous: once the parser has accepted the arguments, downstream code can assume that each option is recognised and that options requiring an argument received one, although the argument values themselves still need their own validation.
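A minimal Python sketch of the pattern follows, assuming a hypothetical `PositiveInt` wrapper, a `parse_positive_int` entry-point parser, and an `allocate_slots` interior function. One caveat specific to Python: type hints are not enforced at runtime, so the guarantee here comes from the check in `__post_init__` plus a static checker such as mypy. Languages with module-private constructors, such as Haskell or Rust, can make invalid values impossible to construct outside the parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositiveInt:
    """An int known to be > 0 (hypothetical wrapper type)."""
    value: int

    def __post_init__(self) -> None:
        # Enforce the invariant even if someone calls the constructor directly.
        if isinstance(self.value, bool) or not isinstance(self.value, int) or self.value <= 0:
            raise ValueError(f"not a positive integer: {self.value!r}")


def parse_positive_int(raw: str) -> PositiveInt:
    """Parse once at the perimeter: raw text in, PositiveInt (or an error) out."""
    return PositiveInt(int(raw.strip()))  # int() raises ValueError on non-numeric text


def allocate_slots(count: PositiveInt) -> list[int]:
    # Interior code accepts PositiveInt, so it never re-checks that count > 0.
    return list(range(count.value))
```

`allocate_slots(parse_positive_int("3"))` returns `[0, 1, 2]`, while `parse_positive_int("0")` or `parse_positive_int("abc")` raises `ValueError` at the perimeter, before any interior code runs. Making the wrapper frozen keeps the invariant from being broken by later mutation.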