Arrays


Core idea

bash supports two array flavours. Indexed arrays map non-negative integers to values — like a one-column spreadsheet. Associative arrays (bash 4.0+) map strings to values — a dictionary, a hash table, a lookup. Both are one-dimensional; both grow automatically; both are accessed with subscript syntax ${arr[key]}. Together they replace nearly every case where a less-experienced shell programmer would reach for parallel variables, awk, or a temporary file.

Why it matters

Without arrays, shell scripts accumulate scalar state in ugly ways: var1, var2, var3 declared in a loop; tab-separated strings split and re-split on every access; counts kept in one variable while names live in another. Arrays compress all of that into a single named structure addressed directly by subscript. Associative arrays in particular open up tabular reporting, frequency counting, and lookup-by-key patterns that would otherwise demand external tools. They are arguably bash's most underused feature.

Mental model

Indexed vs. associative — same syntax, different keys

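The two flavours share virtually all operators. The main differences are the declaration (-A for associative), what counts as a valid subscript, and ordering: indexed arrays expand in ascending index order, while associative arrays expand in no guaranteed order. Once you internalise the table below, everything else is composition.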

| Operation          | Indexed                   | Associative               |
| ------------------ | ------------------------- | ------------------------- |
| Declare            | declare -a arr (optional) | declare -A arr (required) |
| Assign one         | arr[5]=val                | arr[red]=val              |
| Assign many        | arr=(a b c)               | arr=([k1]=v1 [k2]=v2)     |
| Read one           | "${arr[5]}"               | "${arr[red]}"             |
| All values         | "${arr[@]}"               | "${arr[@]}"               |
| All keys / indexes | "${!arr[@]}"              | "${!arr[@]}"              |
| Length             | ${#arr[@]}                | ${#arr[@]}                |
| Element length     | ${#arr[5]}                | ${#arr[red]}              |
| Append             | arr+=(d e)                | arr+=([k3]=v3)            |
| Delete one         | unset 'arr[5]'            | unset 'arr[red]'          |
| Delete all         | unset arr                 | unset arr                 |

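The operations in the table can be exercised in a few lines. This is a minimal sketch, assuming bash 4.0 or later; the array names fruits and colour_of and their contents are made up for illustration.

```bash
#!/usr/bin/env bash
# Indexed array: integer subscripts, declaration optional.
fruits=(apple banana cherry)
fruits+=(date)                 # append after the highest existing index
unset 'fruits[1]'              # delete "banana"; index 1 becomes a hole

# Associative array: string subscripts, declare -A is required.
declare -A colour_of=([apple]=red [cherry]=dark-red)
colour_of[date]=brown          # assign one element
colour_of+=([fig]=purple)      # append a key/value pair

echo "fruits: ${#fruits[@]} elements at indexes ${!fruits[*]}"   # 3 elements at 0 2 3
echo "colours: ${#colour_of[@]} elements"                        # 4 elements
echo "length of fruits[0]: ${#fruits[0]}"                        # 5 ("apple")
echo "colour of cherry: ${colour_of[cherry]}"                    # dark-red
```

Note that deleting an element leaves the remaining indexes where they were; nothing shifts down to fill the hole.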

Iterating safely

Two parameter expansions dominate array work: "${arr[@]}" for values and "${!arr[@]}" for keys. Here the ! means "give me the keys, not the things at those keys"; it is a different use of ! from the indirect expansion ${!name} on a scalar variable. Quoting matters as much as it does for $@:

| Form         | Behaviour                                          |
| ------------ | -------------------------------------------------- |
| ${arr[@]}    | each element split on $IFS (usually wrong)         |
| "${arr[@]}"  | each element a separate word (almost always right) |
| "${arr[*]}"  | all elements joined into one string                |
| ${!arr[@]}   | unquoted list of keys (fine for integer keys)      |
| "${!arr[@]}" | quoted list of keys (required for string keys)     |
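The difference between the forms is easiest to see by counting the words each one produces. The sketch below assumes a hypothetical helper, count_args, that just prints how many arguments it received; the file names are invented.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints the number of words it was given.
count_args() { echo "$#"; }

files=("annual report.txt" "notes.md")

unquoted=$(count_args ${files[@]})     # elements are re-split on $IFS
quoted=$(count_args "${files[@]}")     # one word per element
joined=$(count_args "${files[*]}")     # everything as a single word

echo "unquoted: $unquoted"   # 3
echo "quoted:   $quoted"     # 2
echo "joined:   $joined"     # 1

# "${arr[*]}" joins on the first character of IFS, which makes it a handy join.
csv=$(IFS=,; echo "${files[*]}")
echo "csv:      $csv"        # annual report.txt,notes.md
```

Setting IFS inside the command substitution keeps the change confined to that subshell, so the rest of the script still splits on the default whitespace.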

Why arrays are sparse

bash indexed arrays are not contiguous: arr[100]=foo is legal, and ${#arr[@]} reports 1, not 101. This surprises people coming from C or Python. The reason is that subscripts are keys, not memory offsets; bash keeps only the elements that have actually been assigned. ${!arr[@]} is the direct way to learn which indexes are populated. It also means "append" doesn't mean "to slot ${#arr[@]}": use arr+=(val) to let bash pick the next slot above the highest existing index.
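A short sketch of the sparse behaviour described above; the array names slots and dense are illustrative only.

```bash
#!/usr/bin/env bash
slots=()
slots[100]=foo
slots[3]=bar

echo "count:   ${#slots[@]}"    # 2: populated elements, not highest index + 1
echo "indexes: ${!slots[*]}"    # 3 100: always in ascending index order

# Writing to slot ${#slots[@]} (2 here) would fill a gap, not extend the array.
# += appends after the highest existing index instead.
slots+=(baz)
echo "indexes after append: ${!slots[*]}"   # 3 100 101

# Copying the values into a new array re-packs them densely from 0.
dense=("${slots[@]}")
echo "dense indexes: ${!dense[*]}"          # 0 1 2
```

Re-packing is a reasonable step before any loop that walks indexes with a counter, since such loops assume there are no holes.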

Practical application

  1. Decide indexed or associative up front. Order matters? Indexed. Lookup by name? Associative. If you'd reach for a dict in Python or a Map in JS, you want associative — and you need declare -A before any assignment.

  2. Initialise explicitly when needed. If your script later increments ${counts[$key]} you may want to seed it with zero first; bash treats unset elements as empty, which (( … )) interprets as 0 — but explicit init keeps intent visible.

  3. Always use "${arr[@]}" — never bare ${arr[@]}. Same rule as "$@" — the quotes preserve element boundaries for values containing spaces.

  4. Use mapfile to load file contents. mapfile -t lines < file.txt is the canonical "read every line into an array" idiom. The -t trims the trailing newline from each entry.

  5. Sort by piping out and reading back. bash has no sort builtin. The pattern is mapfile -t sorted < <(printf '%s\n' "${arr[@]}" | sort), a clean round-trip that survives spaces in elements, though not elements that themselves contain newlines.
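The steps above combine naturally into a frequency count. This is a sketch under stated assumptions: the input lines are invented sample data fed through process substitution in place of a real file, and the names shells, counts, and sorted_keys are hypothetical. It requires bash 4.0 or later for mapfile and declare -A.

```bash
#!/usr/bin/env bash
# Step 4: load lines into an indexed array (sample data stands in for a file).
mapfile -t shells < <(printf '%s\n' /bin/bash /bin/zsh /bin/bash "/opt/my shell" /bin/bash)

# Steps 1 and 2: associative array, with a visible default of 0 for unseen keys.
declare -A counts=()
for s in "${shells[@]}"; do          # step 3: quoted expansion keeps "/opt/my shell" whole
  counts[$s]=$(( ${counts[$s]:-0} + 1 ))
done

# Step 5: sort the keys by piping out and reading back.
# LC_ALL=C is used here only to make the ordering independent of the locale.
mapfile -t sorted_keys < <(printf '%s\n' "${!counts[@]}" | LC_ALL=C sort)

for k in "${sorted_keys[@]}"; do
  printf '%-15s %d\n' "$k" "${counts[$k]}"
done
```

The increment is written as an assignment with ${counts[$s]:-0} rather than (( counts[$s] += 1 )) as a deliberate choice for this sketch: it makes the zero default explicit and keeps keys with spaces or other unusual characters out of arithmetic evaluation.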

Example

A real ops task: scan a directory of log files and produce a per-owner summary — how many files each user owns, and how many bytes those files total. Associative arrays make this trivially clean compared to the awk-and-sort version:

#!/usr/bin/env bash
shopt -s nullglob
declare -A file_count   # owner → count
declare -A byte_total   # owner → bytes

for f in /var/log/*.log; do
  # GNU stat -c prints "<owner> <size>" in one call (BSD/macOS stat uses -f instead).
  read -r owner size < <(stat -c '%U %s' "$f")
  (( file_count[$owner] += 1 ))
  (( byte_total[$owner] += size ))
done

# Render a sorted report. Note "${!file_count[@]}" gives the keys.
printf '%-12s %6s %12s\n' "OWNER" "FILES" "BYTES"
for owner in $(printf '%s\n' "${!file_count[@]}" | sort); do
  printf '%-12s %6d %12d\n' \
    "$owner" "${file_count[$owner]}" "${byte_total[$owner]}"
done

The whole script is the two declare -A lines, one accumulator loop, and one render loop. Without associative arrays you'd be juggling parallel arrays of names and counts, doing linear scans to check "have I seen this owner already?", and the script would balloon to twice the length and a third the clarity.
