Formatting Output


Core idea

Once a pipeline has produced the right text, the next problem is presenting it. Unix's formatters divide along a sharp line. Simple formatters (nl, fold, fmt, pr, printf) work on plain text with no knowledge of fonts or graphics: nl, fold, fmt, and pr accept a stream of lines and turn it into numbered, wrapped, re-filled, columnar, or paginated output, while printf places individual values at precise positions. Document formatters (historically troff/nroff, today groff) read a stream that has been marked up with formatting requests (man macros such as .TH and .SH, or tbl's .TS … .TE table blocks) and produce typeset pages, PostScript, or PDF. The split is the same idea as Markdown versus the HTML it renders to, or source text versus rendered output: separate composition from layout, then run a processor over the marked-up source to produce the final form.
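To make the stream-in, stream-out behaviour concrete, here is a minimal bash sketch that feeds one short sample through nl, fold, and fmt. It assumes GNU coreutils (flag spellings and defaults can differ on BSD systems), and the sample text and variable names are purely illustrative.

```shell
#!/usr/bin/env bash
# Each simple formatter reads plain lines on stdin and writes plain lines.
text='the quick brown fox jumps over the lazy dog'

# nl: number the lines (default: non-empty lines, 6-wide field, tab separator).
numbered=$(printf '%s\n' alpha beta | nl)

# fold: hard-wrap at exactly 10 columns, even in the middle of a word.
folded=$(printf '%s\n' "$text" | fold -w 10)

# fmt: re-flow whole words into lines no wider than 20 columns.
filled=$(printf '%s\n' "$text" | fmt -w 20)

printf '%s\n\n%s\n\n%s\n' "$numbered" "$folded" "$filled"
```

fold cuts blindly at the column limit, while fmt only breaks between words, which is why fmt is the better choice for prose and fold for fixed-width records.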

Shotts's argument: Unix's text-formatting model isn't dated — it's an early, principled split between content and presentation. Document preparation was one of the reasons Unix exists at all, and the same two-stage pipeline (write in text, render with a processor) underlies every web page, every man page, and every academic paper produced with TeX.

Why it matters

Presentation is a separable concern

Output that looks ad-hoc — a tangle of tabs and spaces glued together by echo — is brittle. As soon as a value gets longer or a column is added, the alignment breaks. Treating presentation as its own pipeline stage (using printf, pr, fmt) lets you change the layout without touching the data extraction.
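A small bash sketch of the contrast, using made-up rows: the first loop glues fields together with echo and a tab, the second hands layout to printf with fixed field widths. It assumes tab stops every 8 columns, which is the usual terminal default.

```shell
#!/usr/bin/env bash
# Hypothetical rows: a name and a count, as a data stage might emit them.
rows=$'short 1\na-much-longer-name 22'

# Brittle: a tab jumps to the next 8-column stop, so the count column
# lands somewhere else once a name grows past 8 characters.
while read -r name count; do
  echo -e "$name\t$count"
done <<< "$rows"

# Robust: fixed-width fields keep every count in the same column.
aligned=$(while read -r name count; do
  printf '%-20s %5s\n' "$name" "$count"
done <<< "$rows")
printf '%s\n' "$aligned"
```

Widening the layout later means editing only the printf format string; the loop that reads the data stays untouched.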

printf is everywhere

Of all the tools in this family, printf has the highest leverage. It is a bash builtin (coreutils also ships a standalone printf command), its format-string syntax comes from C's printf(3) and reappears, with minor differences, in awk, Go's fmt.Printf, and Python's % operator, and it gives you precise control over field width, alignment, padding, precision, and numeric base in a few characters. Scripts that "look professional" almost always reach for printf over echo once any tabular output is involved.
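One way to see the shared format-string syntax is to hand the same specifiers to bash's builtin and to awk's printf statement and compare the results. This sketch assumes bash and a POSIX awk; LC_ALL=C is set so the decimal point does not depend on the user's locale.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # pin the locale so %f always uses '.' as the decimal point

# The same format string, interpreted by bash's printf builtin...
from_bash=$(printf '%-8s|%6.2f|%04d|%x\n' disk 3.14159 7 255)

# ...and by awk's printf statement.
from_awk=$(awk 'BEGIN { printf "%-8s|%6.2f|%04d|%x\n", "disk", 3.14159, 7, 255 }')

printf '%s\n%s\n' "$from_bash" "$from_awk"
```

Both lines should read identically, which is what lets a shell script and an embedded awk program share one column layout.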

Markup is a portable contract

The troff lineage looks archaic until you notice that every page on the web is produced the same way: write in a markup language, hand it to a processor, look at the result. Understanding groff and man-page markup makes the idea of "source markup + processor = final document" concrete, which transfers directly to Markdown, AsciiDoc, LaTeX, and HTML/CSS.

Key takeaways

Mental model

Two-stage document pipelines

Every Unix document workflow has the same architecture: a source file (plain text with optional markup), a processor that interprets the markup, and an output medium (terminal, PostScript, PDF). The simple formatters operate in the shallow end (printf is "almost no markup"); groff and TeX sit in the deep end.
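The two stages can be sketched with groff and the man macro package: a heredoc writes the marked-up source, and a single groff command renders it. The command name hello and the temporary file names are hypothetical, and the sketch assumes groff is installed with its man macros.

```shell
#!/usr/bin/env bash
# Stage 1, the source: plain text plus man(7) macros.
# "hello" is a made-up command used only for illustration.
workdir=$(mktemp -d)
cat > "$workdir/hello.1" <<'EOF'
.TH HELLO 1
.SH NAME
hello \- print a friendly greeting
.SH DESCRIPTION
Prints a greeting on standard output.
EOF

# Stage 2, the processor: -man loads the man macros and -T utf8 selects
# a terminal device. Changing only the -T argument (for example -T ps)
# retargets the same source to a different output medium.
groff -man -T utf8 "$workdir/hello.1" > "$workdir/hello.txt"
cat "$workdir/hello.txt"
```

The source file never mentions the output medium; that decision lives entirely in the processor's command line.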


printf conversion specifiers in one picture

printf is small enough to be memorized. A specifier is a %, then optional flags, optional minimum field width, optional .precision, then a one-letter type. That's the whole grammar.
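A few bash one-liners that exercise each part of the grammar; the comments show the output each line should print under the C locale. The values are arbitrary examples.

```shell
#!/usr/bin/env bash
export LC_ALL=C   # pin the locale so %f uses '.' as the decimal point

# %  [flags]  [width]  [.precision]  type
printf '[%-8.3f]\n' 3.14159   # -> [3.142   ]  "-" flag: left-justify in 8 columns, 3 decimals
printf '[%+d]\n' 5            # -> [+5]        "+" flag: always print the sign
printf '[%#x]\n' 255          # -> [0xff]      "#" flag: add the 0x prefix to hex
printf '[%05.1f]\n' 2.5       # -> [002.5]     "0" flag: pad with zeros instead of spaces
printf '[%.3s]\n' abcdef      # -> [abc]       precision on %s truncates the string
printf '[%5s]\n' ab           # -> [   ab]     width alone right-justifies
```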


Practical application

A few habits make these tools predictable. With printf, always pass the format string in single quotes so the shell does not interpret $ or backslashes: printf '%-15s %5d\n' "$name" "$count". With pr, set -l (lines per page) and -w (page width in columns) to match your output medium, and remember that the default page is 66 lines long, matching US-letter paper at 6 lines per inch. With groff, pick the output device explicitly: -T utf8 for terminals, -T ps for PostScript, or -T pdf for PDF directly on versions that support it. Without a -T flag, groff falls back to its default device (usually ps), so a command meant for the terminal dumps raw PostScript instead of readable text.
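The single-quote habit is easiest to appreciate by watching it fail. In this bash sketch the values and the literal text "limit: $5" are invented for the demonstration; the double-quoted version loses "$5" because the shell expands it as a positional parameter before printf runs.

```shell
#!/usr/bin/env bash
name="backups" count=42   # hypothetical values

# Single quotes: printf receives the format exactly as typed, so the
# literal "$5" survives into the output.
good=$(printf '%-15s %5d  limit: $5\n' "$name" "$count")

# Double quotes: the shell expands $5 (the script's fifth positional
# parameter, normally empty) before printf ever sees the format.
bad=$(printf "%-15s %5d  limit: $5\n" "$name" "$count")

printf '%s\n%s\n' "$good" "$bad"
```

Backslash sequences such as \n fare the same way only by luck inside double quotes, so single quotes are the safer default for every format string.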

When prototyping a report, write the content pipeline first using the slicers and editors from the previous topic, then append one or two formatting stages at the end. That separation keeps debugging fast: if the data is wrong, you do not have to re-debug the formatting, and if the formatting is wrong, you already know the data is right.
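A minimal bash sketch of that workflow, with the two stages as separate functions. collect_data and format_report are hypothetical names, and collect_data prints fixed sample records (including an invented /home/ana entry) so the sketch runs anywhere; in a real script it would wrap an extraction pipeline such as du -sh /home/* | sort -hr | head -10.

```shell
#!/usr/bin/env bash
# Stage 1 (content): emit raw "SIZE<TAB>PATH" records.
# Hypothetical fixed data stands in for a real extraction pipeline.
collect_data() {
  printf '%s\t%s\n' 4.7G /home/jay 1.2G /home/ana
}

# Stage 2 (presentation): all layout decisions live here.
format_report() {
  awk -F'\t' '{ printf "%-20s %8s\n", $2, $1 }'
}

# Debug the data on its own first...
collect_data
# ...then append the formatting stage once the data looks right.
collect_data | format_report
```

Because each function can be run alone, a bad number points at collect_data and a ragged column points at format_report.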

Example

Imagine a script that summarizes disk usage by directory and prints a tidy report. The data step is one pipeline: du -sh /home/* 2>/dev/null | sort -hr | head -10. That produces lines like 4.7G /home/jay. To turn that into a polished report — fixed-width columns, a header, line numbering, and a footer — wrap the data step with simple formatters.

{
  printf '%-30s %10s\n' "DIRECTORY" "SIZE"
  printf '%-30s %10s\n' "---------" "----"
  du -sh /home/* 2>/dev/null \
    | sort -hr \
    | head -10 \
    | awk -F'\t' '{ printf "%-30s %10s\n", $2, $1 }'
  printf '%-30s %10s\n' "Generated:" "$(date +%F)"
} | nl -ba | pr -h "Disk Usage Report" -l 30 -w 60

What each stage contributes:

  • The outer { … } groups the printf header, the data pipeline, and the printf footer into one continuous stream.
  • The data line uses awk (a richer cousin of cut) to reorder columns — du puts size first, but we want directory first.
  • Three printf calls add a header, a separator, and a footer. They share the %-30s %10s format with the awk printf in the data line, so the columns align across every row.
  • nl -ba numbers every line of the combined stream (header, separator, and footer included, plus any blank lines) so each row of the report gets an ID.
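  • pr -h "Disk Usage Report" -l 30 -w 60 paginates the result with a custom header title and 30-line pages, of which pr reserves 10 for its header and trailer. The -w 60 requests a 60-column page width; GNU pr applies -w only to multi-column output, while -W sets the width unconditionally.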
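Whether blank lines get numbers is the main difference between nl's default body style and -ba. This bash sketch, assuming GNU nl's default 6-wide number field and tab separator, shows the two side by side on a made-up three-line sample.

```shell
#!/usr/bin/env bash
sample=$'header\n\nrow'   # a blank line in the middle

# Default nl (-b t) numbers only non-empty lines: "row" becomes line 2.
default_nl=$(printf '%s\n' "$sample" | nl)

# -b a numbers every line, blank ones included: "row" becomes line 3.
all_nl=$(printf '%s\n' "$sample" | nl -ba)

printf '%s\n---\n%s\n' "$default_nl" "$all_nl"
```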

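Send the same pipeline to a printer with one more pipe: | lpr. For a PDF, redirect the output to a file, render it to PostScript with groff (-T ps), and convert the result with ps2pdf; keep in mind that groff fills and justifies plain text by default, so the report needs no-fill markup (.nf) and a fixed-width font to keep its columns. The point is the layering: content and presentation are separate stages, each independently editable.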
