Formatting Output
Core idea
Once a pipeline has produced the right text, the next problem is presenting it. Unix's formatters divide along a sharp line. Simple formatters — nl, fold, fmt, pr, printf — accept a stream of plain lines and re-flow it into numbered, wrapped, columnar, paginated, or precisely-positioned output, with no knowledge of fonts or graphics. Document formatters — historically troff/nroff, today groff — read a stream that has been marked up with formatting requests (.TH, .SH, .TS … .TE) and produce typeset pages, PostScript, or PDF. The split is the same idea as Markdown versus HTML, or text versus rendered output: separate composition from layout, then run a processor over the marked-up source to produce the final form.
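As a rough illustration of the simple formatters, the sketch below runs fold, nl, and fmt over a small stream of plain lines. The sample text and variable names are made up for the example; the widths are arbitrary.

```shell
# Hypothetical sample text; any stream of plain lines would do.
sample='The quick brown fox jumps over the lazy dog'

# fold breaks lines at a fixed width; -s breaks at spaces rather than mid-word.
folded=$(printf '%s\n' "$sample" | fold -s -w 20)

# nl -ba numbers every line, including blank ones.
numbered=$(printf '%s\n' alpha beta gamma | nl -ba)

# fmt re-flows a paragraph so that lines fit within the given width.
reflowed=$(printf '%s\n' "$sample" | fmt -w 30)

printf '%s\n\n%s\n\n%s\n' "$folded" "$numbered" "$reflowed"
```

None of these tools needs markup in its input: each one only rearranges the lines it is given.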
Shotts's argument: Unix's text-formatting model isn't dated — it's an early, principled split between content and presentation. Document preparation was one of the reasons Unix exists at all, and the same two-stage pipeline (write in text, render with a processor) underlies every web page, every man page, and every academic paper produced with TeX.
Why it matters
Presentation is a separable concern
Output that looks ad-hoc — a tangle of tabs and spaces glued together by echo — is brittle. As soon as a value gets longer or a column is added, the alignment breaks. Treating presentation as its own pipeline stage (using printf, pr, fmt) lets you change the layout without touching the data extraction.
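A minimal sketch of that separation, assuming a data step that emits tab-separated records. The function names emit_data and format_rows and the sample values are hypothetical, chosen only for illustration.

```shell
# Hypothetical data step: emits "name<TAB>count" records. In a real script
# this would be the extraction pipeline.
emit_data() {
    printf '%s\t%s\n' 'alice' 7 'bartholomew' 1234
}

# Presentation step: reads tab-separated records and lays them out in
# fixed-width columns. Changing the layout means editing only this function.
format_rows() {
    tab=$(printf '\t')
    while IFS="$tab" read -r name count; do
        printf '%-15s %6d\n' "$name" "$count"
    done
}

emit_data | format_rows
```

Because the widths live in one format string, a longer name or an extra column changes format_rows and nothing else.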
printf is everywhere
Of all the tools in this family, printf is the highest-leverage. It is a bash builtin, and its conversion syntax descends from C's printf, so the same ideas carry over, with small differences, to awk, Go's fmt package, and Python's % operator. It gives you precise control over field width, alignment, padding, precision, and numeric base in a few characters. Scripts that "look professional" almost always reach for printf over echo once any tabular output is involved.
Markup is a portable contract
The troff lineage looks archaic until you notice that every page on the web is produced the same way: write in a markup language, hand it to a processor, look at the result. Understanding groff and man-page markup makes the idea of "source markup + processor = final document" concrete, which transfers directly to Markdown, AsciiDoc, LaTeX, and HTML/CSS.
Key takeaways
Mental model
Two-stage document pipelines
Every Unix document workflow has the same architecture: a source file (plain text with optional markup), a processor that interprets the markup, and an output medium (terminal, PostScript, PDF). The simple formatters operate in the shallow end (printf is "almost no markup"); groff and TeX sit in the deep end.
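The deep end of that architecture can be sketched with a tiny marked-up source handed to groff. The man-page content below is invented for the example, and the sketch assumes groff is installed with its man macro package; it falls back to a message if groff is missing.

```shell
# Hypothetical source: a minimal man page using the .TH and .SH requests.
src='.TH HELLO 1
.SH NAME
hello \- print a friendly greeting
.SH SYNOPSIS
hello [name]'

# Processor stage: render the source for a terminal, if groff is available.
if command -v groff >/dev/null 2>&1; then
    printf '%s\n' "$src" | groff -man -T utf8
else
    echo 'groff not found; src holds the marked-up input only' >&2
fi
```

The same source could be sent to a different output medium by changing only the device argument, which is the point of keeping source and processor separate.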
printf conversion specifiers in one picture
printf is small enough to be memorized. A specifier is a %, then optional flags, optional minimum field width, optional .precision, then a one-letter type. That's the whole grammar.
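The grammar can be seen piece by piece in a short sketch like the following. The variable names are arbitrary, and the brackets are only there to make the padding visible.

```shell
# Shape of a specifier: %[flags][width][.precision]type
left=$(printf '[%-8s]' 'ab')       # '-' flag: left-justify in 8 columns
right=$(printf '[%8s]' 'ab')       # width only: right-justify in 8 columns
zero=$(printf '[%05d]' 42)         # '0' flag: pad with zeros to width 5
fixed=$(printf '[%8.2f]' 3.14159)  # width 8, precision 2, floating point
hex=$(printf '[%x]' 255)           # type x: hexadecimal
trunc=$(printf '[%.3s]' 'abcdef')  # precision on a string truncates it

printf '%s\n' "$left" "$right" "$zero" "$fixed" "$hex" "$trunc"
```

Each line varies one part of the specifier, so the effect of flags, width, precision, and type can be read off independently.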
Practical application
A few habits make these tools predictable. With printf, always pass the format string in single quotes so the shell does not interpret $ or backslashes: printf '%-15s %5d\n' "$name" "$count". With pr, the -l (lines per page) and -w (page width in characters) options must match your output medium; the default page length is 66 lines, which corresponds to US-letter paper at 6 lines per inch, and -w only takes effect for multi-column output. With groff, pick the output device explicitly: -T utf8 for terminals, -T ps for PostScript, -T pdf for PDF directly when the version supports it. When no device is given, groff typically defaults to PostScript, so forgetting the flag is a common reason an invocation fills the terminal with PostScript source instead of readable text.
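The printf and pr habits can be tried with a sketch like this one. The values of name and count are hypothetical, seq is assumed to be available, and the page length of 20 is chosen only to keep the output short.

```shell
name='report.txt'   # hypothetical values for illustration
count=42

# Single-quoted format: the shell passes %-15s, %5d, and \n through untouched,
# and printf itself interprets them. Data goes in double-quoted arguments.
row=$(printf '%-15s %5d\n' "$name" "$count")
printf '%s\n' "$row"

# pr with an explicit page length of 20 lines. pr reserves part of each page
# for its header and trailer, so fewer than 20 body lines fit per page.
# Counting header lines shows how many pages the 25 input lines needed.
pages=$(seq 1 25 | pr -l 20 -h 'Sample' | grep -c 'Sample')
printf 'pages: %s\n' "$pages"
```

Shrinking or growing the -l value changes the page count without any change to the data feeding pr.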
When prototyping a report, write the content pipeline first using the slicers and editors from the previous topic, then append one or two formatting stages at the end. That separation keeps debugging fast — if the data is wrong, you do not have to re-debug the formatting; if the formatting is wrong, the data already looks right.
Example
Imagine a script that summarizes disk usage by directory and prints a tidy report. The data step is one pipeline: du -sh /home/* 2>/dev/null | sort -hr | head -10. That produces lines like 4.7G /home/jay. To turn that into a polished report — fixed-width columns, a header, line numbering, and a footer — wrap the data step with simple formatters.
{
    printf '%-30s %10s\n' "DIRECTORY" "SIZE"
    printf '%-30s %10s\n' "---------" "----"
    du -sh /home/* 2>/dev/null \
        | sort -hr \
        | head -10 \
        | awk '{ printf "%-30s %10s\n", $2, $1 }'
    printf '%-30s %10s\n' "Generated:" "$(date +%F)"
} | nl -ba | pr -h "Disk Usage Report" -l 30 -w 60
What each stage contributes:
- The outer { … } groups the printf header, the data pipeline, and the printf footer into one continuous stream.
- The data line uses awk (a richer cousin of cut) to reorder columns: du puts size first, but we want directory first.
- Three printf calls give the report a header, a separator, and a footer using the same format string so columns align across all rows.
- nl -ba numbers every line of the stream, header and footer included, so each row of the report has an ID.
- pr -h "Disk Usage Report" -l 30 -w 60 paginates the result with a custom title and 30-line pages; the 60-character width from -w only takes effect if the report is later switched to multi-column output.
Send the same pipeline to a printer with one more pipe: | lpr. To get a PDF instead, the plain-text report must first be rendered to PostScript (for example with groff -T ps), and the result can then be converted with ps2pdf. The point is the layering: content and presentation are separate stages, each independently editable.
Related concepts
- Text Processing