Concept

Compression

Definition

Compression is the process of re-encoding data into a smaller representation, either to reduce storage cost or to speed up transmission across a network. A compressor reads input bytes and emits a compressed stream that is usually shorter than the original; a decompressor reverses the operation, exactly or approximately. No lossless scheme can shrink every possible input, so data with no exploitable structure may come out the same size or slightly larger. The key division is between lossless schemes, which reconstruct the original bytes exactly, and lossy schemes, which discard information judged imperceptible or unimportant in exchange for much smaller output.

Lossless compression is mandatory for text, code, executables, and any data where one wrong bit would corrupt the meaning. Lossy compression is suited to images, audio, and video, where the human sensory system tolerates, and often cannot detect, careful approximation.
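
The lossless/lossy split can be made concrete with a minimal sketch. It assumes Python's standard `zlib` module for the lossless side; the `lossy_encode` and `lossy_decode` helpers are hypothetical toys written for this illustration (they keep only the top 4 bits of each 8-bit sample) and do not correspond to any real codec.

```python
import math
import zlib

# 1024 synthetic 8-bit samples tracing a slow sine wave (illustrative data).
samples = bytes(
    min(255, max(0, int(127.5 + 127.5 * math.sin(2 * math.pi * i / 64))))
    for i in range(1024)
)

# Lossless: decompress(compress(x)) returns x byte for byte.
packed = zlib.compress(samples, 9)
restored = zlib.decompress(packed)

def lossy_encode(data: bytes) -> bytes:
    """Toy lossy coder: keep the high 4 bits of each sample, two per byte."""
    nibbles = [b >> 4 for b in data]
    if len(nibbles) % 2:
        nibbles.append(0)  # pad to an even count
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

def lossy_decode(encoded: bytes, count: int) -> bytes:
    """Rebuild each sample as the midpoint of its 16-wide bucket."""
    out = bytearray()
    for byte in encoded:
        out.append(((byte >> 4) << 4) | 0x08)
        out.append(((byte & 0x0F) << 4) | 0x08)
    return bytes(out[:count])

halved = lossy_encode(samples)
approx = lossy_decode(halved, len(samples))

print("lossless exact:", restored == samples)
print("lossy exact:   ", approx == samples)
print("max lossy error:", max(abs(a - b) for a, b in zip(samples, approx)))
```

The lossy path always halves the size but can never return the original bytes; the lossless path returns them exactly, with a size that depends on how much structure the input has.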

Why it matters

How it works

Lossless compressors exploit redundancy. Run-length encoding replaces a run of repeated bytes with a count plus a value. Dictionary coders such as LZ77 and its descendants (the DEFLATE algorithm behind gzip, as well as zstd and brotli) scan for sequences that have already appeared and replace later occurrences with a back-reference to the earlier copy. Entropy coders such as Huffman and arithmetic coding then re-encode the resulting stream so that common symbols get short codes and rare symbols get long ones. The combined effect is dramatic on natural text and negligible on random or already-compressed data, which has no redundancy left to exploit and may even grow slightly because of format overhead.
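
As a rough sketch of two of these ideas, the following shows a byte-oriented run-length coder and a comparison of how `zlib` (DEFLATE, i.e. LZ77-style matching plus Huffman coding) behaves on repetitive text versus random bytes. The `(count, value)` pair layout with counts capped at 255 is an assumption chosen for simplicity, and `rle_encode`, `rle_decode`, and `ratio` are illustrative names rather than any standard API.

```python
import os
import zlib

def rle_encode(data: bytes) -> bytes:
    """Encode runs as (count, value) byte pairs, with count in 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * count
    return bytes(out)

def ratio(data: bytes) -> float:
    """Compressed size divided by original size under zlib level 9."""
    return len(zlib.compress(data, 9)) / len(data)

text = b"to be or not to be, that is the question. " * 100
noise = os.urandom(len(text))  # no redundancy to exploit

print("RLE of b'aaaabbbcc':", rle_encode(b"aaaabbbcc"))
print("zlib ratio, repetitive text:", round(ratio(text), 3))
print("zlib ratio, random bytes:   ", round(ratio(noise), 3))
```

Note that this run-length scheme doubles the size of input with no repeated bytes, which is one reason practical formats layer dictionary and entropy coding rather than relying on run lengths alone.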

Lossy compressors transform the data into a representation where humans tolerate inaccuracy, then discard the parts the eye or ear cannot detect. JPEG converts each 8×8 block of an image to the frequency domain via the discrete cosine transform and quantises high-frequency coefficients aggressively; MP3 and AAC apply a psychoacoustic model that exploits masking effects to drop or coarsely encode components that are inaudible or hidden by louder nearby sounds; modern video codecs combine spatial and temporal prediction so that, apart from periodic keyframes, mostly motion vectors and prediction residuals are stored rather than whole frames. The acceptable loss is a tuning knob: lower quality settings produce smaller files at the cost of visible or audible artifacts.
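
The transform-then-quantise idea can be sketched in one dimension. The example below applies an orthonormal DCT-II to a single row of 8 samples, divides each coefficient by a step that grows with frequency, rounds, and reconstructs. The step rule `base_step * (1 + k)` and the sample values are assumptions made for illustration; real JPEG works on 8×8 blocks with standardised quantisation tables and adds entropy coding afterwards.

```python
import math

N = 8  # samples per block

def _scale(k: int) -> float:
    # Orthonormal scaling so that idct(dct(x)) == x up to rounding error.
    return math.sqrt((1 if k == 0 else 2) / N)

def dct(block):
    """Orthonormal DCT-II of N samples."""
    return [
        _scale(k) * sum(x * math.cos(math.pi / N * (n + 0.5) * k)
                        for n, x in enumerate(block))
        for k in range(N)
    ]

def idct(coeffs):
    """Inverse of dct() (an orthonormal DCT-III)."""
    return [
        sum(_scale(k) * c * math.cos(math.pi / N * (n + 0.5) * k)
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]

def quantise(coeffs, base_step):
    # Coarser steps for higher frequencies (illustrative rule, not a JPEG table).
    return [round(c / (base_step * (1 + k))) for k, c in enumerate(coeffs)]

def dequantise(levels, base_step):
    return [q * base_step * (1 + k) for k, q in enumerate(levels)]

def lossy_round_trip(block, base_step):
    """Return the integer levels that would be stored and the reconstruction."""
    levels = quantise(dct(block), base_step)
    return levels, idct(dequantise(levels, base_step))

block = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up pixel row
for base_step in (1, 8, 32):
    levels, approx = lossy_round_trip(block, base_step)
    worst = max(abs(a - b) for a, b in zip(block, approx))
    print(base_step, levels, "max error:", round(worst, 2))
```

Raising `base_step` plays the role of lowering the quality setting: more coefficients round to zero, which a later entropy coder can store very cheaply, while the reconstruction error grows.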

Where it goes next
