Formatting Output

Core idea

Once a pipeline has produced the right text, the next problem is presenting it. Unix's formatters divide along a sharp line. Simple formatters (nl, fold, fmt, pr, printf) accept a stream of plain lines and re-flow it into numbered, wrapped, columnar, paginated, or precisely positioned output, with no knowledge of fonts or graphics. Document formatters (historically troff/nroff, today groff) read a stream that has been marked up with formatting requests (.TH, .SH, .TS … .TE) and produce typeset pages, PostScript, or PDF. The split is the same idea as Markdown versus HTML, or text versus rendered output: separate composition from layout, then run a processor over the marked-up source to produce the final form.
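
As a rough illustration of the simple-formatter side, the sketch below wraps one long line with fold and then numbers the result with nl. The sample sentence and the function name wrap_demo are made up for this example.

```shell
# wrap_demo is a hypothetical helper; the sentence is sample input only.
wrap_demo() {
  # fold breaks lines at a fixed width; -s breaks at spaces, not mid-word.
  printf '%s\n' 'Simple formatters reflow plain lines without knowing anything about fonts.' \
    | fold -s -w 30
}

wrap_demo            # wrapped to at most 30 columns per line
wrap_demo | nl -ba   # the same lines, each prefixed with a line number
```

Neither tool knows what the text means; each one only re-flows the lines it is handed.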

Shotts's argument: Unix's text-formatting model isn't dated — it's an early, principled split between content and presentation. Document preparation was one of the reasons Unix exists at all, and the same two-stage pipeline (write in text, render with a processor) underlies every web page, every man page, and every academic paper produced with TeX.

Why it matters

Presentation is a separable concern

Output that looks ad-hoc — a tangle of tabs and spaces glued together by echo — is brittle. As soon as a value gets longer or a column is added, the alignment breaks. Treating presentation as its own pipeline stage (using printf, pr, fmt) lets you change the layout without touching the data extraction.

`printf` is everywhere

Of all the tools in this family, printf is the highest-leverage. It is a bash builtin, its format-string grammar carries over almost unchanged to C, awk, Go's fmt.Printf, and Python's % operator, and it gives you precise control over field width, alignment, padding, precision, and numeric base in a few characters. Scripts that "look professional" almost always reach for printf over echo once any tabular output is involved.
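
A small sketch of the difference, assuming two made-up rows of data (a name and a count). The helper name print_row is hypothetical.

```shell
# print_row is a hypothetical helper for this example.
print_row() {
  # %-12s left-aligns the name in 12 columns; %6d right-aligns the integer in 6.
  printf '%-12s %6d\n' "$1" "$2"
}

# With echo, the second column starts wherever the first value happens to end.
echo "alice 7"
echo "bartholomew 12345"

# With printf, both rows have the same width and the numbers line up on the right.
print_row alice 7
print_row bartholomew 12345
```

Changing the layout later means editing one format string, not every echo line.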

Markup is a portable contract

The troff lineage looks archaic until you notice that every page on the web is produced the same way: write in a markup language, hand it to a processor, look at the result. Understanding groff and man-page markup makes the idea of "source markup + processor = final document" concrete, which transfers directly to Markdown, AsciiDoc, LaTeX, and HTML/CSS.

Key takeaways

Mental model

Two-stage document pipelines

Every Unix document workflow has the same architecture: a source file (plain text with optional markup), a processor that interprets the markup, and an output medium (terminal, PostScript, PDF). The simple formatters operate in the shallow end (printf is "almost no markup"); groff and TeX sit in the deep end.
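
The deep end of that range can be sketched with a tiny man-page source handed to groff. The page name "demo", its date, and its text are invented for illustration, and the block assumes groff with the man macro package is installed (it skips rendering otherwise).

```shell
# render_demo is a hypothetical helper: source markup in, terminal text out.
render_demo() {
  groff -man -T utf8 <<'EOF'
.TH DEMO 1 "2024-01-01" "demo 1.0" "User Commands"
.SH NAME
demo \- hypothetical command used to illustrate man-page markup
.SH DESCRIPTION
The same source renders to other devices by changing only the device option.
EOF
}

# Render only when groff is available on this system.
if command -v groff >/dev/null 2>&1; then
  render_demo
fi
```

The source never says how wide the page is or which font to use; the processor and the chosen device decide that.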

`printf` conversion specifiers in one picture

printf is small enough to be memorized. A specifier is a %, then optional flags, optional minimum field width, optional .precision, then a one-letter type. That's the whole grammar.
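
The grammar can be walked through one piece at a time. The brackets in each format string are only there to make the padding visible; the values are arbitrary.

```shell
printf '[%5d]\n' 42          # width 5, right-aligned:     [   42]
printf '[%-5d]\n' 42         # '-' flag, left-aligned:     [42   ]
printf '[%05d]\n' 42         # '0' flag, zero-padded:      [00042]
printf '[%+d]\n' 42          # '+' flag, explicit sign:    [+42]
printf '[%.2f]\n' 3.14159    # precision 2:                [3.14]
printf '[%8.3f]\n' 3.14159   # width 8, precision 3:       [   3.142]
printf '[%x]\n' 255          # type x, hexadecimal:        [ff]
printf '[%o]\n' 8            # type o, octal:              [10]
printf '[%.3s]\n' abcdef     # precision truncates text:   [abc]
```

The floating-point lines assume a locale that uses a period as the decimal separator.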

Practical application

A few habits make these tools predictable. With printf, always pass the format string in single quotes so the shell does not interpret $ or backslashes: printf '%-15s %5d\n' "$name" "$count". With pr, -l (lines per page) and -w (page width in characters) must match your output medium; the defaults are 66 lines per page (US letter at 6 lines per inch) and 72 characters wide. With groff, pick the output device explicitly: -T utf8 for terminals, -T ps for PostScript, -T pdf for PDF directly when the version supports it. Without a device flag, groff falls back to its default device, normally PostScript, which is a common reason an invocation fills the terminal with raw PostScript instead of readable text.
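
To see how -l interacts with pr's own header and trailer, here is a minimal sketch that paginates 30 lines of sample input. The title "Sample Listing" and the function name paginate are made up, and seq is assumed to be available.

```shell
# paginate is a hypothetical helper for this example.
# pr reserves 5 lines for the header and 5 for the trailer on each page,
# so a 20-line page holds 10 lines of body text.
paginate() {
  pr -l 20 -h "Sample Listing"
}

# 30 input lines at 10 body lines per page come out as three pages,
# each starting with a header that carries the title and page number.
seq 1 30 | paginate
```

Pages of 10 lines or fewer leave no room for a body, so pr drops the header and trailer in that case.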

When prototyping a report, write the content pipeline first using the slicers and editors from the previous topic, then append one or two formatting stages at the end. That separation keeps debugging fast — if the data is wrong, you do not have to re-debug the formatting; if the formatting is wrong, the data already looks right.
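
One way to keep the two concerns apart while prototyping is to give each its own function, as in this sketch. The function names collect and layout and the sample rows are hypothetical stand-ins for a real data command.

```shell
# Stage 1 (content): emits "size path" rows. Sample data stands in for a
# real command such as du.
collect() {
  printf '%s %s\n' 4.7G /home/jay 1.2G /home/ana 310M /home/guest
}

# Stage 2 (presentation): reads rows on stdin and lays them out in columns.
layout() {
  while read -r size path; do
    printf '%-20s %8s\n' "$path" "$size"
  done
}

collect            # inspect the raw data on its own
collect | layout   # then add the layout stage
```

If a value is wrong, run collect alone; if a column is misaligned, only layout needs to change.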

Example

Imagine a script that summarizes disk usage by directory and prints a tidy report. The data step is one pipeline: du -sh /home/* 2>/dev/null | sort -hr | head -10. That produces lines like 4.7G /home/jay. To turn that into a polished report — fixed-width columns, a header, line numbering, and a footer — wrap the data step with simple formatters.

{
  printf '%-30s %10s\n' "DIRECTORY" "SIZE"
  printf '%-30s %10s\n' "---------" "----"
  du -sh /home/* 2>/dev/null \
    | sort -hr \
    | head -10 \
    | awk '{ printf "%-30s %10s\n", $2, $1 }'
  printf '%-30s %10s\n' "Generated:" "$(date +%F)"
} | nl -ba | pr -h "Disk Usage Report" -l 30 -w 60

What each stage contributes:

The outer { … } groups the printf header, the data pipeline, and the printf footer into one continuous stream.
The data line uses awk (a richer cousin of cut) to reorder columns — du puts size first, but we want directory first.
Three printf calls give the report a header, a separator, and a footer using the same format string so columns align across all rows.
nl -ba numbers every line in the stream, blank or not, so the header, separator, and footer lines get numbers along with the data rows.
pr -h "Disk Usage Report" -l 30 -w 60 paginates the result with a custom title, 30-line pages, and a 60-column width.

Send the same pipeline to a printer with one more pipe: | lpr. Or to a PDF: the report is plain text, so it has to be rendered to PostScript first (groff -T ps is one option) before ps2pdf can convert it. The point is the layering: content and presentation are separate stages, each independently editable.
