Definition
Information theory is the mathematical framework for understanding what information is, how much of it a message contains, and how it can be transmitted with minimal error. Founded by Claude Shannon in 1948, it gave precise, quantitative answers to questions that had seemed irreducibly fuzzy: How much can a message be compressed without losing content? At what rate can a noisy channel carry reliable data? What is the minimum number of yes/no questions needed to identify an unknown?
At the heart of the theory is entropy — a measure of uncertainty or surprise. A message from a source that always emits the same symbol carries zero information because it reduces no uncertainty. A message from a source that can produce any of many equally likely symbols carries maximum information. Shannon entropy formalizes this intuition as a function of a probability distribution: H = -Σ p(x) log p(x), where higher entropy signals more uncertainty (and thus more information capacity) per symbol.
The theory's deepest result — Shannon's channel capacity theorem — establishes that every noisy channel has a maximum rate at which information can be transmitted with arbitrarily small error. This limit, the channel capacity, depends only on the noise level and bandwidth, not on the specific encoding scheme. The theorem guarantees that error-free communication is possible up to the limit and impossible beyond it, establishing a fundamental boundary on communication independent of technology.
Why it matters
How it works
Entropy and the bit
Shannon defined the unit of information as the bit — the amount of information in the outcome of a fair coin flip (two equally likely outcomes). Any source can be described in bits: a fair die has log₂(6) ≈ 2.58 bits of entropy per roll. The bit is not just a storage unit; it is a measure of surprise. Knowing the outcome of a highly probable event conveys little information; knowing the outcome of a rare event conveys a lot. This matches intuition: a headline reporting that the sun rose carries less information than one reporting that it did not.
Huffman coding and arithmetic coding translate this insight into practice. Symbols that occur frequently receive shorter codes; rare symbols receive longer ones. The resulting compression approaches the entropy limit — no lossless scheme can do better. When a file "can't be compressed further," it has reached its entropy limit.
Channels, noise, and capacity
A communication channel has a capacity C bits per second. Shannon proved that any source with entropy rate below C can be transmitted with error approaching zero by using an appropriate error-correcting code — even through a channel that randomly corrupts some symbols. Conversely, no scheme can exceed C reliably. The proof is non-constructive: it shows good codes exist without specifying them. Finding practical codes that approach capacity efficiently has been a major engineering effort, culminating in turbo codes and LDPC codes that are within fractions of a decibel of the Shannon limit.