Definition
An index is a precomputed auxiliary data structure that lets a system answer queries without scanning the entire underlying dataset. The trade is unambiguous: spend extra storage and pay an upkeep cost on every write so that reads become dramatically faster. A linear scan of N items costs O(N); the right index turns the same lookup into O(log N) with a balanced tree, O(1) with a hash table, or near-instant with a bitmap or inverted index.
Indexing is the engine under every fast lookup in computing. Databases, full-text search engines, filename caches like the Unix locate command, package managers, and even the filesystem's directory structures all rely on some form of index to make queries feasible at scale.
Why it matters
How it works
The simplest index is a sorted copy of a column. To find a value, binary-search the sorted list rather than scan the entire source. This is O(log N) but rebuilding the sorted list on every insert is O(N), so production systems use self-balancing trees — typically B-trees or B+ trees — that keep the data sorted with O(log N) insert and lookup. The B-tree's high branching factor keeps the tree shallow enough that a million-row table needs only three or four disk reads to find any value.
Different query shapes need different indexes. Equality lookups against a single column are best served by hash indexes, which compute a hash of the search key and jump directly to the bucket. Substring or full-text search needs an inverted index, which pre-tokenises every document and stores, for each token, the list of documents containing it. Geographic queries use spatial trees like R-trees or quad-trees. Column-oriented analytics warehouses use bitmap indexes that compress beautifully for low-cardinality columns. The universal lesson is that indexing is not one technique but a family of structures, each optimised for a particular query pattern — and the cost of choosing the wrong index can be greater than no index at all.