Searching For Files
Core idea
A modern Linux filesystem holds millions of files. Two tools let you find what you need, and they pick opposite trade-offs. locate is fast because it queries a pre-built index of pathnames; it costs you freshness (the index is typically rebuilt once a day, so brand-new files are invisible) and expressiveness (it matches only on pathnames). find is general because it walks the live tree on every call; it costs you time on large trees but lets you filter by any combination of attributes (name, size, type, modification time, permissions, owner) and then act on the matches with -delete, -exec, or a pipe to xargs. Together they cover most everyday searches.
Shotts's argument:
locate is as simple as find is complicated. Both have their uses. Take the time to master find: it teaches you predicate logic, action chaining, and Unix composition, which transfers to every other shell tool you'll ever use.
Why it matters
Finding is the gateway to acting
A list of paths is rarely the goal. You want to do something to those paths: delete them, archive them, change their permissions, run a script on each one. find's great trick is that the action is part of the search expression, not a follow-up command. find ~ -name '*.bak' -delete is a single command that both selects and removes (though it is not atomic: files are deleted one at a time as they are found).
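To make the "action is part of the expression" point concrete, here is a minimal sketch that runs in a throwaway directory rather than in ~. The sandbox path and file names are made up for illustration, and -delete is assumed to be available (it is in GNU and BSD find).

```shell
# Scratch directory so nothing real is touched (hypothetical layout).
sandbox=$(mktemp -d)
mkdir "$sandbox/sub"
touch "$sandbox/notes.txt" "$sandbox/notes.bak" "$sandbox/sub/old.bak"

# Dry run: same tests, harmless action.
find "$sandbox" -type f -name '*.bak' -print

# Same tests, destructive action swapped in.
find "$sandbox" -type f -name '*.bak' -delete

# Count what survived: only notes.txt should be left.
remaining=$(find "$sandbox" -type f | wc -l)
echo "files remaining: $remaining"

rm -rf "$sandbox"
```

The only difference between the two find calls is the final action, which is why previewing a deletion costs almost nothing.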
Predicate logic is everywhere
Once you see find's expression syntax — tests combined with -and, -or, -not, grouped by \( \), with implicit -and between adjacent tests — you have learned the same logical structure that drives SQL WHERE clauses, Elasticsearch queries, firewall rules, and every other filter language in computing. find is a hands-on lab for predicate composition.
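A small sketch of why grouping matters, using a scratch directory with invented names. It uses -o, the POSIX spelling of -or. Because the implicit -and binds tighter than -o, leaving out the parentheses changes which entries match.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/a.log" "$sandbox/b.tmp" "$sandbox/c.txt"
mkdir "$sandbox/d.tmp"          # a directory whose name ends in .tmp

# Grouped: regular files named *.log OR *.tmp.
grouped=$(find "$sandbox" -type f \( -name '*.log' -o -name '*.tmp' \) | wc -l)

# Ungrouped: parsed as (-type f -a -name '*.log') -o (-name '*.tmp'),
# so the directory d.tmp slips through the second branch.
ungrouped=$(find "$sandbox" -type f -name '*.log' -o -name '*.tmp' | wc -l)

echo "grouped=$grouped ungrouped=$ungrouped"
rm -rf "$sandbox"
```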
Short-circuit evaluation is performance, not pedantry
find evaluates left to right and stops a chain the moment the result is decided. Putting cheap tests such as -type f before expensive ones such as -exec grep -q pattern {} \; means the costly command is never run on directories or other entries the cheap test has already rejected. (The {} + form of -exec always returns true, so it cannot act as a filter.) It is the same idea as ordering if conditions in any programming language.
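The sketch below counts how often an expensive test actually runs when cheap tests come first. The counter file and the sh -c wrapper are illustrative devices, not something find requires; the file names are made up.

```shell
sandbox=$(mktemp -d)
log=$(mktemp)                   # one line is appended per expensive run
mkdir "$sandbox/src" "$sandbox/docs"
printf 'TODO: fix\n'   > "$sandbox/src/a.c"
printf 'done\n'        > "$sandbox/src/b.c"
printf 'TODO: write\n' > "$sandbox/docs/readme.txt"

# -type f and -name reject directories and non-.c files first, so the
# grep wrapper is started only for the two .c files.
matches=$(find "$sandbox" -type f -name '*.c' \
    -exec sh -c 'echo run >> "$1"; grep -q TODO "$2"' sh "$log" {} \; -print)

runs=$(wc -l < "$log")
echo "expensive test ran $runs times; matched: $matches"

rm -rf "$sandbox"; rm -f "$log"
```

Here -exec ... \; works as a test: it is true only when grep exits 0, so -print fires for a.c alone.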
xargs handles the parts find can't
find -exec cmd {} \; runs cmd once per match, which is fine for ten files and slow for ten thousand. find -exec cmd {} + and the find ... | xargs cmd pattern pass many matches to a single cmd invocation. When piping to xargs, pair find -print0 with xargs -0: NUL-delimited names are the only safe way to send arbitrary filenames, including ones with spaces or newlines, through a pipe.
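One way to see the difference in invocation counts is to use echo as the command: each invocation prints exactly one line, so counting lines counts invocations. This is a sketch in a scratch directory with five invented files.

```shell
sandbox=$(mktemp -d)
for i in 1 2 3 4 5; do touch "$sandbox/file$i.txt"; done

# \; form: echo runs once per match, so five lines.
per_file=$(find "$sandbox" -type f -exec echo {} \; | wc -l)

# + form: all five names are appended to one echo, so one line.
batched=$(find "$sandbox" -type f -exec echo {} + | wc -l)

# xargs -0 batches the same way, reading NUL-delimited names.
via_xargs=$(find "$sandbox" -type f -print0 | xargs -0 echo | wc -l)

echo "per_file=$per_file batched=$batched via_xargs=$via_xargs"
rm -rf "$sandbox"
```

With many thousands of matches the batched forms may still split the work into several invocations to stay under the system's argument-length limit, but far fewer than one per file.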
Key takeaways
Mental model
Index versus walk
Two opposite ways to answer "where is the file named X?": maintain an index and query it, or walk the tree on demand. Each has costs.
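The trade-off can be imitated without locate installed. In this sketch a plain text file of pathnames stands in for the index; that is an illustrative simplification, not how the real locate database is stored.

```shell
sandbox=$(mktemp -d)
index=$(mktemp)
touch "$sandbox/old-report.txt"

# "Index" step: snapshot every pathname once (stand-in for updatedb).
find "$sandbox" > "$index"

# A file created after the snapshot was taken.
touch "$sandbox/new-report.txt"

# Index query: cheap, but blind to anything newer than the snapshot.
from_index=$(grep -c 'new-report' "$index" || true)

# Live walk: slower on big trees, always current.
from_walk=$(find "$sandbox" -name 'new-report*' | wc -l)

echo "index hits: $from_index, live hits: $from_walk"
rm -rf "$sandbox"; rm -f "$index"
```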
Tests, operators, actions — the find grammar
A find expression has three kinds of pieces, and they combine in one consistent way:
- Tests return true or false for each file. Examples: -type f, -name '*.log', -size +10M, -mtime -1, -perm /u+x.
- Operators combine tests: -and (implicit between adjacent tests), -or, -not (also !), and \( ... \) for grouping.
- Actions run on files where the combined expression is true: -print (default), -delete, -ls, -exec cmd {} \;, -exec cmd {} +.
The whole expression is evaluated for every file find visits, left to right with short-circuit semantics. The action is itself part of the expression and contributes its truth value to subsequent operators — that's why moving -print to the front changes behavior.
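A quick sketch of the -print placement effect, in a scratch directory holding two invented files:

```shell
sandbox=$(mktemp -d)
touch "$sandbox/a.log" "$sandbox/b.txt"

# -print after the test: it runs only where -name was true.
after=$(find "$sandbox" -name '*.log' -print | wc -l)

# -print before the test: -print is always true and runs for every
# visited path (the directory itself plus both files); the result of
# the trailing -name test is then simply unused.
before=$(find "$sandbox" -print -name '*.log' | wc -l)

echo "after=$after before=$before"
rm -rf "$sandbox"
```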
Short-circuit evaluation as a control flow tool
find borrows from C and shell: -and skips the right side if the left is false; -or skips the right side if the left is true. This is a feature, not just an optimization. You can write find . -name '*.tmp' -delete -or -print to delete .tmp files and print everything else. The -print runs only when -delete would have been skipped — i.e., for non-matching files.
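The delete-or-print pattern can be tried safely in a scratch directory. This sketch uses -o (the POSIX spelling of -or) and made-up file names; note that -delete also switches find to depth-first order, so the output is sorted before inspection.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/keep.txt" "$sandbox/scratch.tmp"

# .tmp files: -name is true, -delete succeeds, so -o skips -print.
# Everything else: -name is false, -delete is skipped, -print runs.
printed=$(find "$sandbox" -name '*.tmp' -delete -o -print | sort)

printed_count=$(printf '%s\n' "$printed" | wc -l)   # sandbox dir + keep.txt
tmp_left=$(find "$sandbox" -name '*.tmp' | wc -l)   # nothing should remain

echo "$printed"
rm -rf "$sandbox"
```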
Practical application
For one-off "where is that file" lookups, prefer locate. The first hit usually answers the question. If locate returns nothing and you suspect the file is new, run sudo updatedb to rebuild the index, then retry — or fall back to find.
For anything programmatic — scripts, cron jobs, deploy steps — prefer find. It is deterministic, never depends on the freshness of an external index, and its expression syntax makes intent self-documenting. Combine with xargs -0 (and find -print0) when filenames may contain whitespace.
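The following sketch shows what goes wrong without the NUL-delimited pairing. The file names are invented; one of them contains spaces on purpose.

```shell
sandbox=$(mktemp -d)
printf 'TODO\n' > "$sandbox/release notes v2.txt"   # name contains spaces
printf 'TODO\n' > "$sandbox/plain.txt"

# Newline-delimited pipe: xargs splits the spaced name into three
# words, so grep is handed paths that do not exist and finds one file.
unsafe=$(find "$sandbox" -type f | xargs grep -l TODO 2>/dev/null | wc -l)

# NUL-delimited pipe: every name arrives intact, so both files match.
safe=$(find "$sandbox" -type f -print0 | xargs -0 grep -l TODO | wc -l)

echo "unsafe=$unsafe safe=$safe"
rm -rf "$sandbox"
```

In a script the unsafe form fails quietly: the errors go to stderr and the result is simply incomplete.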
Useful idioms worth memorizing:
- find . -type f -name '*.log' -mtime +30 -delete: purge log files older than 30 days.
- find ~/code -type f -name '*.py' -newer pyproject.toml -print: list Python files modified more recently than the project config.
- find /var/www -type f \( -perm /o+w -or -nogroup \) -ls: audit world-writable or orphaned files.
- find . -type f -print0 | xargs -0 grep -l TODO: grep across files, safe for any filename.
- find . -type d -empty -delete: clean up empty directories left behind by failed operations.
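As a sketch of how the log-purge idiom behaves, the snippet below backdates one file and checks that only it is removed. It assumes GNU touch, whose -d option accepts relative dates such as '40 days ago'; the file names are made up.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/fresh.log"
touch -d '40 days ago' "$sandbox/stale.log"    # GNU touch date syntax

# Preview, then purge, with identical tests.
old=$(find "$sandbox" -type f -name '*.log' -mtime +30 -print)
find "$sandbox" -type f -name '*.log' -mtime +30 -delete

# Only the fresh log should be left.
left=$(find "$sandbox" -type f -name '*.log')
old_name=$(basename "$old")
left_name=$(basename "$left")

echo "purged: $old_name, kept: $left_name"
rm -rf "$sandbox"
```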
Example
A scenario familiar to anyone who has used a laptop for more than a year: your home directory has grown to ~80 GB. Disk space is tight. You suspect old screenshots, stale virtualenvs, and forgotten log files but you don't know where.
Step 1 — find the biggest offenders, not all of them. Run a size + type filter to surface large files:
find ~ -type f -size +100M -printf "%s\t%p\n" 2>/dev/null | sort -rn | head -20
The output lists up to twenty files over 100 MB in your home directory, largest first, with the size in bytes in the first column. Almost certainly there will be a 4 GB video you forgot, a couple of old VM images, and a few extracted tarballs. Inspect by hand; delete with rm only the ones you've confirmed.
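The same pipeline can be rehearsed on a small scale. This sketch assumes GNU find (for -printf) and GNU truncate, lowers the threshold to 1 MiB, and uses invented file names so it runs quickly in a scratch directory.

```shell
sandbox=$(mktemp -d)
# truncate sets the apparent size without writing any data.
truncate -s 3M  "$sandbox/big.img"
truncate -s 2M  "$sandbox/medium.img"
truncate -s 10K "$sandbox/small.txt"

# Size in bytes, a tab, then the path; largest first.
report=$(find "$sandbox" -type f -size +1M -printf '%s\t%p\n' | sort -rn | head -20)

count=$(printf '%s\n' "$report" | wc -l)
largest=$(printf '%s\n' "$report" | head -1 | cut -f1)

printf '%s\n' "$report"
rm -rf "$sandbox"
```

Sorting on the raw byte count keeps sort -rn reliable; a human-readable size column would not sort numerically.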
Step 2 — find stale caches. Many tools cache to ~/.cache/<app>. Find caches not touched in 90 days:
find ~/.cache -mindepth 1 -maxdepth 2 -type d -mtime +90 -print
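Read the list. A directory's modification time changes only when entries are added to or removed from it, so treat the result as a list of candidates rather than proof of disuse. Anything that's clearly safe (browser cache, IDE indexes, pip cache) is fair game. Then run the same command with the final action replaced by -exec rm -rf {} +.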
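One refinement worth considering before the destructive run is -prune, which stops find from descending into a directory it has already selected, so a nested stale directory is not listed (or removed) a second time. The sketch below assumes GNU touch for backdating and uses hypothetical app directory names.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/appA/inner" "$sandbox/appB"
# Backdate appA and its subdirectory; appB stays fresh.
touch -d '120 days ago' "$sandbox/appA/inner" "$sandbox/appA"

# Without -prune: both appA and appA/inner are reported.
plain=$(find "$sandbox" -mindepth 1 -maxdepth 2 -type d -mtime +90 -print | wc -l)

# With -prune: appA is reported and find does not look inside it.
stale=$(find "$sandbox" -mindepth 1 -maxdepth 2 -type d -mtime +90 -prune -print)
stale_name=$(basename "$stale")

echo "plain=$plain pruned=$stale_name"
rm -rf "$sandbox"
```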
Step 3 — clean up screenshots. If your screenshot tool dumps to ~/Pictures/Screenshots, prune anything older than six months:
find ~/Pictures/Screenshots -type f -name 'Screenshot*' -mtime +180 -print
Confirm the list looks right. Then re-run with -delete instead of -print. (Two-step ritual, every time.)
Step 4 — find orphaned files from deleted user accounts. Helpful on multi-user machines:
find / -xdev \( -nouser -or -nogroup \) -print 2>/dev/null
The -xdev keeps the search on one filesystem. The grouped -or test catches files whose owning UID or GID no longer resolves to a known user or group, which is usually a sign that an account was deleted without cleaning up.
With four find invocations you have located the space hogs, and after review removed them, without manually clicking through directories. The same predicates and the same two-step ritual scale from a personal laptop to terabyte-class servers.
Related concepts
- Filesystem
- Abstraction
- Composition
- Predicate Logic
- Indexing