2007-12-16

Sysadmin Sunday: Fun with find

Introduction
find is a ubiquitous utility found on every single UNIX variant known to man. It is fast, efficient, and extremely powerful. Its main purpose is to locate files on the system based on criteria passed to it in the command-line arguments.

This week in Sysadmin Sunday, I'll extol find in all its glory... or most of it, anyway.

Basics of find
find requires a path and nothing more. find . would show all files in the current directory, while recursing all the way through every subdirectory. There are a great many options to find, though. We'll work through some of them.

Find's syntax is usually as follows:
find [path] [options] [actions]
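To make the three parts concrete, here is a minimal sketch run against a throwaway directory. The directory layout and file names are made up for illustration, and sort is only there because find does not promise any particular output order.

```shell
# Build a scratch tree so the example has no side effects (all names are hypothetical).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/docs/old"
touch "$sandbox/notes.txt" "$sandbox/docs/readme.txt" "$sandbox/docs/old/draft.txt"

# path = ".", no options, action = -print: list everything, recursing into subdirectories.
all=$(cd "$sandbox" && find . -print | LC_ALL=C sort)
printf '%s\n' "$all"
# .
# ./docs
# ./docs/old
# ./docs/old/draft.txt
# ./docs/readme.txt
# ./notes.txt

rm -rf "$sandbox"
```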

Actions
-print
Simply prints the matching filenames to standard output. -print is implied if no action is specified. You could, in theory, avoid using -print, and on most modern implementations of find, it would be fine.

-print0
Separates each returned file with a null character instead of a newline. This is useful when the list of files returned may have spaces or other unusual characters in the filenames which may cause inconsistency in how files are handled. This works primarily with xargs, and I'll cover it at the end of the article.

-exec
I generally don't recommend using find's "exec" flag, simply because it's easier to make errors with it. You'll get more predictable results using xargs or shell escapes. See "Acting on output" later on in this article.

-prune
Stops recursion in the current directory if the option is matched.

-o (OR) allows you to perform some other option/action combination. This is particularly useful when you do a ! (NOT) operator for an option.
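Even if you steer clear of -exec yourself, you will run into it in other people's scripts, so it helps to recognize its two forms. The following is only a sketch on a scratch directory with made-up file names; wc and echo are stand-ins for whatever command you actually want to run.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/a.log" "$sandbox/b.log"

# {} is replaced by the matching path; the escaped semicolon ends the command.
# This form starts one "wc -c" process per file, so we get one output line per file.
per_file=$(find "$sandbox" -name '*.log' -exec wc -c {} \; | wc -l | tr -d ' ')

# Ending with + instead packs many paths into a single invocation, much like xargs.
# One echo call prints both paths on one line.
batched=$(find "$sandbox" -name '*.log' -exec echo {} + | wc -l | tr -d ' ')

echo "per-file runs: $per_file, batched runs: $batched"
rm -rf "$sandbox"
```

Forgetting the terminator, or forgetting to escape the semicolon from the shell, is the usual way this goes wrong.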

Options
-name
If you know the name of a file (or part of it) but you have no idea where in the hell it is on your system, find -name is here to save you. On my web server, I know there's this absolutely hilarious gif animation of a guy trying to get out of a room. The sign next to the door says "pull to open" in some Asian script. Or that's what someone told me, since I can't read it myself. I remember naming it "pulltoopen.gif" but I can't remember where I buried it -- and I have gigs of pictures and other random stuff on my web server.

$ find ~axon -name pulltoopen.gif -print
/home/axon/public_html/temp/pulltoopen.gif

There it is! Hey, you guys have to see this:

Okay, enough goofing around.
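When you only remember part of the name, -name also takes shell-style wildcards. A small sketch, using a scratch directory and invented file names; the one thing to watch is quoting the pattern so the shell hands it to find untouched instead of expanding it first.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/temp" "$sandbox/pics"
touch "$sandbox/temp/pulltoopen.gif" "$sandbox/pics/pushtoclose.gif" "$sandbox/pics/pulltab.png"

# Match any name that starts with "pull", whatever the extension.
matches=$(cd "$sandbox" && find . -name 'pull*' -print | LC_ALL=C sort)
printf '%s\n' "$matches"
# ./pics/pulltab.png
# ./temp/pulltoopen.gif

rm -rf "$sandbox"
```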

-follow
Instructs find to follow symbolic links when recursing through subdirectories. Usually, you won't need this functionality, but occasionally it comes in handy.

-mtime, -ctime, -atime [days ago]
mtime, atime, and ctime are the file's modification, access, and change times, respectively. What's the difference between modification and change? Well, modification means that the file's contents were altered. Change means that the file's metadata, such as its permissions or ownership, was updated; since writing to a file also touches that metadata, a modification bumps ctime as well. Access times are the obvious one: any time the file is read, atime gets altered (unless the filesystem is mounted with options that suppress atime updates). Find can tell you when a file was last looked at, modified or changed. To see what files I've modified (including newly created files) in the last day, I use -mtime 0. You can see the screenshots I uploaded for the Google Spreadsheets article, which were uploaded the same day I wrote this article:

$ find public_html -mtime 0 -print
public_html/hir
public_html/hir/gs-explore.html
public_html/hir/gs-explore.phps
public_html/hir/gse-1-login.png
public_html/hir/gse-2-sslist.png
public_html/hir/gse-3-wslist.png
public_html/hir/gse-4-cells.png
The key here is that these will only show things in 24-hour slices of time. -mtime 3 would only show stuff that was modified between 72 and 96 hours ago, that is, at least 3 but fewer than 4 full days. If you wish to see everything that's been modified in the last 3 days, you can use a minus in front of the argument. find [path] -mtime -3 would show everything modified from now back to 3 days (72 hours) ago. This comes in handy for use in making backup scripts, or finding something that you know you recently downloaded.
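The 24-hour slices are easier to see with files of known ages. This sketch assumes GNU touch, whose -d option accepts relative dates such as "80 hours ago"; other touch implementations may need an explicit timestamp instead. The file names are invented. It also shows the plus form, -mtime +3, which matches anything older than the slice.

```shell
# Assumption: GNU touch (relative dates via -d).
sandbox=$(mktemp -d)
touch "$sandbox/fresh.txt"                      # modified just now      -> 0 full days old
touch -d '80 hours ago' "$sandbox/mid.txt"      # 80/24 = 3.33, rounded down -> 3 full days old
touch -d '200 hours ago' "$sandbox/stale.txt"   # 200/24 = 8.33, rounded down -> 8 full days old

exactly3=$(cd "$sandbox" && find . -type f -mtime 3 -print)    # exactly 3 full days: 72 to 96 hours
within3=$(cd "$sandbox" && find . -type f -mtime -3 -print)    # fewer than 3 full days: under 72 hours
over3=$(cd "$sandbox" && find . -type f -mtime +3 -print)      # more than 3 full days: 96 hours and up

echo "-mtime 3:  $exactly3"
echo "-mtime -3: $within3"
echo "-mtime +3: $over3"
rm -rf "$sandbox"
```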

-user [username], -group [groupname], -nouser and -nogroup
-nouser and -nogroup are great for locating stray files on the filesystem that no longer have an owner or a group associated with them. This can happen when you extract a tar file as the root user, or if you remove a user from the system without cleaning up all of their files first. Similarly, -user and -group can be used to limit find to only files owned by a specific entity.
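A quick sketch of both ideas on a scratch directory. -user takes a login name or a numeric ID; the numeric ID from id -u is used here only so the example does not depend on any particular account existing. The file name is made up.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/mine.txt"

# Files owned by the user running this script.
owned=$(cd "$sandbox" && find . -type f -user "$(id -u)" -print)
echo "owned by me: $owned"

# Strays: anything whose owner or group no longer maps to an account.
# The escaped parentheses group the two tests so -print applies to either one.
strays=$(cd "$sandbox" && find . \( -nouser -o -nogroup \) -print)
echo "strays: ${strays:-none}"

rm -rf "$sandbox"
```

On a real system you would point the second command at / or /home as root rather than at a scratch directory.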

-type [x]
Type will locate only a specific type of file. Values for [x]:
b block special
c character special
d directory
f regular file
l symbolic link
p FIFO
s socket
For example, if you wanted to find all symbolic links in /etc:
$ find /etc/* -type l -print
/etc/aliases
/etc/localtime
/etc/resolv.conf
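The same test works for the other type letters. Here is a minimal sketch on a scratch directory holding one regular file, one directory and one symbolic link (all names invented):

```shell
sandbox=$(mktemp -d)
mkdir "$sandbox/conf.d"
touch "$sandbox/hosts"
ln -s hosts "$sandbox/hosts.link"

links=$(cd "$sandbox" && find . -type l -print)                  # symbolic links only
dirs=$(cd "$sandbox" && find . -type d -print | LC_ALL=C sort)   # directories, including "." itself
files=$(cd "$sandbox" && find . -type f -print)                  # regular files only

echo "links: $links"    # links: ./hosts.link
echo "files: $files"    # files: ./hosts
printf 'dir: %s\n' $dirs
rm -rf "$sandbox"
```

Note that the symbolic link is reported as type l, not as the type of the file it points to, as long as -follow is not in effect.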

\!
Any option can be preceded by \! (also called "NOT") to return the opposite of an option.

Acting on output
In the end, find merely creates a list of files. In and of itself, find is pretty useless unless -- like my examples -- you're just looking for a few files among gigabytes of content. By its very nature, UNIX relies on the flexibility of other utilities, command-line construction, piping and redirection of output to really get things done.

xargs method
Find and xargs go together like two peas in a pod. xargs is a little-known utility that takes piped input and builds a huge command-line argument out of the contents. When you have thousands and thousands (or just a few dozen) of files listed from find, and wish to do something with them all at once, xargs is a great utility to have around. For example, if I wanted to create a backup of all of my files including my crontab, mail, and temporary files, this would work nicely:
# find / -user axon -print | xargs tar czf axon.tgz
Command-line construction (quick 'n' dirty shell escape) method
You can directly use find's output by using backticks around your find command. All the output from find will be passed on the command line. This is a different method to do the same thing as the above example, without xargs:
# tar czf axon.tgz `find / -user axon -print`
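Both forms have a limit worth knowing about. With a very long file list, xargs splits the list across several tar runs, and each later run overwrites axon.tgz; the backtick form can simply exceed the maximum command-line length. One possible workaround, sketched here on a scratch directory with invented names, is to let tar read the list itself. This assumes GNU tar, since --null and -T are GNU options.

```shell
# Assumption: GNU tar (for --null and -T).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/mail"
touch "$sandbox/home/crontab" "$sandbox/home/mail/inbox"

# -type f keeps directories out of the list, so tar does not archive their contents twice.
# -T - reads the file list from standard input; --null says the names are null-separated.
(cd "$sandbox" && find home -type f -print0 | tar -czf backup.tgz --null -T -)

listing=$(tar -tzf "$sandbox/backup.tgz" | LC_ALL=C sort)
printf '%s\n' "$listing"
# home/crontab
# home/mail/inbox

rm -rf "$sandbox"
```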
Iterating through the list (for loop) method
If there's something a little more complicated that you want to do with the list of files, you can make a little miniature shell script on the command line, or a standalone script which takes action on each file one at a time. This is particularly useful if you need to do multiple actions on a file, or if you're using a utility that can only handle one file at a time, for example, converting all .gif files to PNG format with imagemagick's convert utility:
for file in `find . -name "*.gif" -print`
do
  # Swap only the trailing .gif extension for .png
  convert "$file" "${file%.gif}.png"
done
Using \!, -prune, and -o
Occasionally, you want to exclude certain files or directories from find. Maybe you don't want it to descend and scan networked drives or temporary files. \! returns only things that do not match the options. Alone, this might be a huge number of files. -o (or) allows you to continue processing with additional conditions. -prune is similar in that it will take whatever the options match and quit descending further.

Find all files owned by root in the /home directory and subdirectories, but prune (don't descend into) any directories named "tmp":
$ find /home -name tmp -prune -o -user root -print
Find everything in /dev that is not a symbolic link:
$ find /dev \! -type l -print
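To watch these operators work without touching /home or /dev, here is a sketch on a scratch tree with invented names. It reads left to right: if the name is tmp, prune it and stop; otherwise (-o) apply the remaining tests and print.

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/alice/tmp" "$sandbox/bob/docs"
touch "$sandbox/alice/tmp/scratch.txt" "$sandbox/alice/todo.txt" "$sandbox/bob/docs/plan.txt"

# Skip anything under a directory named tmp; print the regular files everywhere else.
kept=$(cd "$sandbox" && find . -name tmp -prune -o -type f -print | LC_ALL=C sort)
printf '%s\n' "$kept"
# ./alice/todo.txt
# ./bob/docs/plan.txt

# Everything that is NOT a regular file; in this tree that leaves only the directories.
not_files=$(cd "$sandbox" && find . \! -type f -print | LC_ALL=C sort)
printf '%s\n' "$not_files"

rm -rf "$sandbox"
```

The explicit -print matters in the first command. If you leave it off, the implied -print applies to the whole expression and the pruned tmp directories themselves show up in the output.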

Complications
If the list that find returns contains any spaces (or occasionally other bizarre characters) in the filenames, all of the above methods of dealing with the list will fail, because both spaces and newlines are treated as separators between filenames. A file named "My file.txt" will be handled as two files, "My" and "file.txt", neither of which is correct.

Here are 3 files in a directory, 2 with spaces in the name:
$ ls
My File.txt My File2.txt MyFile3.txt
Using find and xargs to do a long listing on them fails because of the spaces:
$ find . -type f -print | xargs ls -l
ls: ./My: No such file or directory
ls: ./My: No such file or directory
ls: File.txt: No such file or directory
ls: File2.txt: No such file or directory
-rw-r--r-- 1 axon staff 1800 Dec 15 00:04 ./MyFile3.txt
By replacing "-print" with "-print0" (that's a Zero, not an "o") and using xargs with the -0 (Zero) option, find and xargs will once again work together in harmony. Filenames will be separated by a null character from find, and xargs will handle all input literally, acting on filenames separated by a null.
$ find . -type f -print0 | xargs -0 ls -l
-rw-r--r-- 1 axon staff 4847 Dec 15 00:04 ./My File.txt
-rw-r--r-- 1 axon staff 13444 Dec 15 00:04 ./My File2.txt
-rw-r--r-- 1 axon staff 1800 Dec 15 00:04 ./MyFile3.txt
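The for-loop method needs a different fix, since backticks cannot carry null separators. One option, sketched here with plain POSIX sh, is to have find hand the names to a small inline script as separate arguments, so the shell never splits them on spaces. The file names are invented, and cp stands in for ImageMagick's convert so the sketch runs without it.

```shell
sandbox=$(mktemp -d)
touch "$sandbox/My File.gif" "$sandbox/My File2.gif" "$sandbox/MyFile3.gif"

# "for file do" loops over the script's arguments, which find fills in via {} +.
# The lone "sh" after the script text becomes $0, so the first file is not swallowed.
find "$sandbox" -type f -name '*.gif' -exec sh -c '
  for file do
    cp "$file" "${file%.gif}.png"   # stand-in for: convert "$file" "${file%.gif}.png"
  done
' sh {} +

converted=$(cd "$sandbox" && find . -name '*.png' -print | LC_ALL=C sort)
printf '%s\n' "$converted"
# ./My File.png
# ./My File2.png
# ./MyFile3.png

rm -rf "$sandbox"
```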
