Concept

Data Visualization

Definition

Data visualization is the practice of turning numerical or categorical data into graphical form — points, lines, bars, areas, colors, positions — so that human visual perception can do the work that scanning a table cannot. A well-chosen chart compresses thousands of values into a single image whose shape, density, and outliers are legible at a glance. A poorly chosen chart hides the signal or, worse, manufactures one that is not in the data.

Visualization sits at the intersection of statistics, perception research, and design. The statistics decide what is worth showing; perception research decides which encodings the eye can read accurately; design decides what the reader sees first, second, and not at all. A chart is a claim about the data, and like every claim, it can be honest, sloppy, or actively misleading.

Why it matters

How it works

Effective visualization begins with the question — what comparison, distribution, or relationship does the reader need to see — and then selects the encoding that makes that judgment easiest. Cleveland and McGill's perception research established a rough hierarchy: position along a common scale is read most accurately, followed by position along non-aligned scales, length, angle, area, color saturation, and color hue. A chart that asks the eye to compare areas (pie chart) or hues (rainbow heatmap) is asking it to do something it does poorly. A chart that asks the eye to compare positions on a shared axis is asking it to do what it does well.

The mechanics of building a chart are now cheap — matplotlib, seaborn, plotly, ggplot, ECharts, D3, and Observable Plot all turn a dataframe into a rendered figure in a handful of lines. The hard work is what surrounds the chart: choosing the right type for the question, scaling axes honestly, selecting a colormap that does not invent ordering where none exists, labeling clearly, and stripping every pixel that does not earn its place. Tufte's "data-ink ratio" — maximize the share of pixels devoted to representing data — remains the most useful single discipline, and the most commonly violated.

Where it goes next

Continue exploring

Tags