Chapter 25...

Working With Files

This is the last of three chapters devoted to Unix files. In Chapter 23, we discussed the Unix filesystem in detail. At the time, I explained that there are three types of files: directories, ordinary files, and pseudo files. For day-to-day work, directories and ordinary files are the most important, so I want to make sure you master all the basic skills related to these two types of files. In Chapter 24, we talked about how to use directories. In this chapter, we will discuss the details of working with ordinary files.

Throughout the chapter, when I use the term "file", I am referring to ordinary files. Thus, to be precise, the title of the chapter should actually be "Working With Ordinary Files".

Our plan for the chapter is as follows: first, I will show you how to create, copy, move and rename ordinary files. We will then discuss permissions, the attributes that allow users to share files. From there, we will talk about what goes on behind the scenes, and you will see that manipulating files actually involves working with "links". Finally, I will explain how to search for files and how to process files that have been found in a search. It sounds like a lot — and it is — but I promise, by the time you finish, it will all make sense.


Creating a File: touch

How do you create a file? Strangely enough, you don't. Unix creates files for you as the need arises; you rarely need to create a new file for yourself.

There are three common situations in which a file will be created for you automatically. First, many programs will create a file when they need one. For example, let's say you start the vi editor (Chapter 22) by using the command:

vi essay

This command specifies that you want to edit a file named essay. If essay does not exist, vi will create it for you the first time you save your work. In our example, I used vi, but the same principle applies to many other programs as well.

Second, when you redirect output to a file (see Chapter 15), the shell will create the file if it does not already exist. For example, say that you want to save the output of the ls command to a file named listing. You enter the ls command, redirecting the output:

ls > listing

If listing does not already exist, the shell creates it for you.

Finally, when you copy a file, the copy program creates a new file. For example, say that you want to copy the file data to a file named extra. You enter the following command:

cp data extra

If the file extra does not exist, it will be created automatically. (The cp command is explained later in the chapter.)

However, let's say that for some reason you want to create a brand new, empty file. What is the easiest way to do it? In Chapter 24, I explained how to use the mkdir command to make a new directory. Is there an analogous command to make an ordinary file? The answer is no, but there is a command that has the side effect of creating an empty file. This command is called touch and here is how it works.

In Chapter 24, I explained how to display the modification time (ls -l) or access time (ls -lu) of a file. The modification time is the last time the file was changed; the access time is the last time the file was read. The main purpose of touch is to change the modification time and the access time of a file without changing the file. Imagine yourself reaching out and carefully touching the file (hence the name). The syntax is:

touch [-acm] [-t time] file...

where time is a time and date in the form [[YY]YY]MMDDhhmm[.ss].

By default, touch sets both the modification and the access times to the current time and date. For example, let's say that a file named essay was last modified on July 8 at 2:30 PM. You enter:

ls -l essay

The output is:

-rw-------  1 harley staff  4883 Jul  8 14:30 essay

It is now 10:30 AM, December 21. You enter:

touch essay

Now when you enter the same ls command you see:

-rw-------  1 harley staff  4883 Dec 21 10:30 essay

When might you use touch? Let's say you are preparing to distribute a set of files — music, software, whatever — and you want them to all have the same time and date. Change to the directory that holds the files and enter:

touch *

All the files matched by the * wildcard (see Chapter 24) now have the same modification time and access time.

If you want to change the modification time only, use the -m option. If you want to change the access time only, use -a. To use a specific time and date instead of the current time, use -t followed by a time in the format [[YY]YY]MMDDhhmm[.ss]. Here are two examples. Let's say today is August 31. The first command changes the modification time (only) to 5:29 PM on the current day. The second command changes the access time (only) to December 21, 2008, 10:30 AM:

touch -m -t 08311729 file1
touch -a -t 200812211030 file2

Realistically, you will rarely have a need to change the modification time or the access time for a file. However, touch has one very important side effect: if the file you specify does not exist, touch will create it for you. Thus, you can use touch to create brand new, empty files whenever you want. For example, to create a file named newfile, just enter:

touch newfile

If you want, you can create more than one new file at a time:

touch data1 data2 data3 temp extra

When you use touch to create a new file, the modification time and access time will be the current time and date. If this doesn't suit you, you can use the options we discussed above to set a specific time as you create the file.

One last option: If you are updating the modification time or access time for a number of files, and you don't want touch to create any new files, use the -c (no create) option. For example, the following command will update the times for the specified files. However, if a file does not exist it will not be created:

touch -c backup1 backup2 backup3 backup4
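If you would like to see these behaviors for yourself, here is a minimal sketch that runs in a scratch directory, so none of your own files are affected. The scratch directory (created with mktemp, which I am assuming is available on your system) and the filenames newfile and missing are made up for the illustration.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# touch creates the file if it does not already exist.
touch newfile
[ -e newfile ] && created=yes || created=no

# A file created by touch is empty (-s is true only for a non-empty file).
[ -s newfile ] && empty=no || empty=yes

# With -c, touch will not create a file that is missing.
touch -c missing
[ -e missing ] && made_missing=yes || made_missing=no

# Set only the modification time to December 21, 2008, 10:30 AM.
touch -m -t 200812211030 newfile
ls -l newfile

echo "created=$created empty=$empty made_missing=$made_missing"

# Clean up the scratch directory.
cd / && rm -r "$scratch"
```

The ls -l output should show the 2008 date for newfile, although the exact layout of the listing varies from one system to another.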

— hint —

Most of the time, there is no need to use touch to create new files, because — as we have discussed — new files are almost always created for you automatically as the need arises.

Where touch does come in handy is when you need some temporary files quickly, say, to experiment with file commands. When this happens, using touch is the fastest way to create a set of brand new, empty files, for example:

touch test1 test2 test3


Naming a File

Unix is liberal with respect to naming files. There are only two basic rules:

1. Filenames can be up to 255 characters long.(*)

2. A filename can contain any character except / (slash) or the null character.

This only makes sense. As you know from Chapter 24, the / character is used as a separator within pathnames so, of course, you can't use it within a filename. The null character is the character consisting of all zero bits (see Chapter 23). This character is used as a string terminator in the C programming language, and you would normally never use it within a filename.

* Footnote

Technically, the maximum size of a filename is set by the filesystem, not by Unix or Linux. Most modern filesystems default to a maximum filename length of 255 characters. However, some filesystems are more flexible. For example, if you are a filesystem nerd, it is easy to modify the ext2, ext3 or ext4 filesystems to allow up to 1012 characters in a filename.

To these two rules, I am going to add a third one of my own.

3. Create filenames that are meaningful to you.

As an example, the name data doesn't mean much compared to the more descriptive chemlab-experiment-2008-12-21. True, the long name is complicated and takes longer to type, but once you know how to use filename completion (see Chapter 13), you will rarely have to type a complete name. For example, for the long name above, you might be able to type ch<Tab> and let the shell complete the name for you.

My advice is to choose meaningful names for all your files at the moment you create them. Otherwise, you will eventually accumulate many files that may or may not contain valuable data. If you are like everyone else, I am sure you think that, one day, you will go through all your files and delete the ones you don't need. However, if you are like everyone else, you will probably never actually do so.(*)

* Footnote

If you need convincing, just ask yourself, "Right now, how many photos do I have on my computer that, one day, I will sort through?"

— hint —

The best way to keep junk from accumulating in your directories is to give meaningful names to your files when you create them.

Unix allows you to create filenames that contain all sorts of outlandish characters: backspaces, punctuation, control characters, even spaces and tabs. Obviously, such filenames will cause trouble. For example, what if you use the ls -l command to list information about a file named info;date:

ls -l info;date

Unix would interpret the semicolon as separating two commands:

ls -l info
date

Here is another example. Say that you have a file named -jokes. It would be a lot of trouble using the name in a command, for example:

ls -jokes

Unix would interpret the - (hyphen) character as indicating an option.

Generally speaking, you will run into trouble with any name that contains a character with a special meaning (<, >, |, ! and so on). The best idea is to confine yourself to characters that cannot be misinterpreted. These are shown in Figure 25-1. Hyphens are okay to use, as long as you don't put them at the beginning of the name.

Figure 25-1
Characters that are safe to use in filenames

Unix allows you to use any characters you want in a filename except a / (slash) or a null. However, your life will be a lot easier if you stick to letters, numbers, the dot, the hyphen (but not at the beginning of a name), and the underscore.

a, b, c...    Lowercase letters
A, B, C...    Uppercase letters
0, 1, 2...    Numbers
.             Dot
-             Hyphen
_             Underscore

If you ever do end up with a name that contains spaces or other strange characters, you can sometimes make it work by quoting the name. (Quoting is explained in Chapter 13.) Here is a particularly meretricious example:

ls -l 'this is a bad filename, but it does work'
ls -l this\ is\ a\ bad\ filename\,\ but\ it\ does\ work
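Quoting will not rescue a name like -jokes, because it is the command, not the shell, that treats the leading hyphen as an option. Here is a small sketch of two common workarounds, run in a scratch directory with made-up files. The first (putting ./ in front of the name) works with any command. The second (using -- to mark the end of the options) is a convention supported by most commands, but I am assuming your versions of ls and rm follow it.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Create the two troublesome files. The ./ keeps touch from seeing an option.
touch ./-jokes 'info;date'

# Use a pathname, so the argument no longer starts with a hyphen.
ls -l ./-jokes

# Or use -- to tell the command that no more options follow.
ls -l -- -jokes

# Quote the semicolon so the shell does not treat it as a command separator.
ls -l 'info;date'

# Remove both files the same way, then count what is left.
rm -- -jokes 'info;date'
remaining=$(ls -A | wc -l)
echo "files remaining: $remaining"

# Clean up the scratch directory.
cd / && rm -r "$scratch"
```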

To finish this section, I am going to explain three important file naming conventions. First, as we discussed in Chapters 14 and 24, files whose names begin with a . (dot) character are called dotfiles or hidden files. When you use ls, such files are listed only if you specify the -a (all) option. By convention, we use names that start with a dot only for files that contain configuration data or initialization commands. (Figure 24-5 contains a list of the common dotfiles.)

Second, we often use filenames that end with a dot followed by one or more letters to indicate the type of the file. For example, C source files have names that end in .c, such as myprog.c; MP3 music files have names that end in .mp3; files that have been compressed by the gzip program have names that end in .gz; and so on. In such cases, the suffix is referred to as an EXTENSION. There are literally hundreds of different extensions. Such extensions are convenient as they allow you to use wildcards (see Chapter 24) to refer to a group of files. For example, you can list the names of all the C source files in a directory by using:

ls *.c

Finally, as you will remember, Unix distinguishes between upper- and lowercase. Thus, the names info, Info and INFO are completely different. A Unix person would simply use info. Now, consider the following names that you might use for a directory that contains programs or shell scripts:

Program Files
ProgramFiles
programfiles
program-files
program_files
bin

The first name is what Windows uses. As Unix people, we reject the name immediately because it contains a space. The next name has two uppercase letters, which makes it difficult to type, so we reject it as well. But what about the next three names? They contain no spaces or uppercase letters or strange characters. However, for important directories, it's handy to have short, easy names, which is why bin gets our vote.

In the world of Unix, we have a convention that names beginning with uppercase letters are reserved for files that are important in some special way. For example, when you download software that comes in the form of a set of files, you will often find a file named README. Because uppercase comes before lowercase in the ASCII code (Chapters 19 and 20), such names will come first in the directory listing and will stand out(*). For this reason, as a general rule, I recommend that you use only lowercase letters when you name files and directories.

* Footnote

If you are using the C locale. This will not be the case if you are using the en_US locale (see Chapter 19).

— hint —

If you are a programmer, you will be tempted, from time to time, to use the filename test for a program or shell script you are developing. Don't do it.

It happens that the shell has a builtin command named test, which is used to compare values within a shell script. If you name one of your programs test, whenever you try to run the program by typing its name, you will get the shell builtin instead. Nothing will seem to happen, and you will end up wasting a lot of your time trying to figure out the problem.

(If you are interested in knowing what test does, look it up in the online manual.)
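You can check this for yourself. The following short sketch asks the shell how it interprets the name test, and then runs the builtin with no arguments. The exact wording printed by type varies from one shell to another.

```shell
# Ask the shell how it interprets the name "test".
type test

# Run with no arguments, the builtin prints nothing at all.
# It simply sets a non-zero exit status, which is easy to overlook.
test
status=$?
echo "exit status of test: $status"
```

If you do end up with a program named test, you can still run it by using a pathname, such as ./test, instead of the bare name.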


Copying a File: cp

To make a copy of a file, use the cp command. The syntax is:

cp [-ip] file1 file2

where file1 is the name of an existing file, and file2 is the name of the destination file.

Using this command is straightforward. For example, if you have a file named data and you want to make a copy named extra, use:

cp data extra

Here is another example. You want to make a copy of the system password file (see Chapter 11). The copy should be called pword and should be in your home directory. As we discussed in Chapter 24, the ~ character represents your home directory, so you can use:

cp /etc/passwd ~/pword

If the destination file does not exist, cp will create it. If the destination file already exists, cp will replace it. When this happens, there is no way to get back the data that has been replaced. Consider the first example:

cp data extra

If the file extra does not exist, it will be created. However, if the file extra did exist, it would be replaced. When this happens, the data in the original file is lost forever; there is no way to get it back. (Read the last sentence again.)

To append data to the end of a file, you do not use cp. Rather, you use the cat program and redirect the output (see Chapter 16). For example, the following command appends the contents of data to the end of extra. In this case, the original contents of extra are preserved.

cat data >> extra

Since cp can easily wipe out the contents of a file, if you want to be extra careful, you can use the -i (interactive) option:

cp -i data extra

The -i option tells cp to ask your permission before replacing a file that already exists. For example, you might see:

cp: overwrite extra (yes/no)?

If you respond with anything that starts with "y" or "Y" (for "yes"), cp will replace the file. If you type any other answer — such as pressing the <Return> key — cp will not make the replacement.

The only other option I want you to know about is -p (preserve). This option gives the destination file the same modification time, access time, and permissions as the source file. (We will discuss permissions later in the chapter.)
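To see the difference between replacing, appending and preserving, here is a minimal sketch you can run in a scratch directory. The filenames plain and preserved are made up for the illustration. I am also assuming your shell supports the -nt ("newer than") test, which is the case for Bash, the Korn shell, and most modern versions of sh.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

echo "new data" > data
echo "old contents" > extra

# cat with >> appends: the old contents of extra are preserved.
cat data >> extra
appended=$(cat extra)

# cp replaces the destination: the old contents of extra are gone.
cp data extra
replaced=$(cat extra)

# Give data an old modification time, then copy it with and without -p.
touch -m -t 200812211030 data
cp data plain
cp -p data preserved

# plain gets the current time, so it is newer than data; preserved is not.
ls -l data plain preserved
[ plain -nt data ] && plain_newer=yes || plain_newer=no
[ preserved -nt data ] && preserved_newer=yes || preserved_newer=no
echo "plain_newer=$plain_newer preserved_newer=$preserved_newer"

# Clean up the scratch directory.
cd / && rm -r "$scratch"
```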


Copying Files to a Different Directory: cp

The cp command can be used to copy one or more files to a different directory. The syntax is:

cp [-ip] file... directory

where file is the name of an existing file, and directory is the name of an existing directory. The -i (interactive) and -p (preserve) options work as described in the previous section.

Here is an example. To copy the file data to a directory named backups, use:

cp data backups

To copy the three files data1, data2 and data3 to the backups directory, use:

cp data1 data2 data3 backups

Here is one more example, a little more complicated. Your working directory is /home/harley/work/bin. You want to copy the file adventure from the directory /home/harley/bin to the working directory. To refer to the source directory, we use ../../bin; to refer to the working directory, we use a . by itself. The command is:

cp ../../bin/adventure .

— hint —

You can often use wildcards to specify more than one filename (see Chapter 24). For example, to copy the three files data1, data2 and data3 to the backups directory, you can use:

cp data[123] backups

If there are no other files whose names begin with data, you can use:

cp data* backups

If there are no other files whose names begin with d, you can use:

cp d* backups


Copying a Directory to Another Directory: cp -r

You can use cp to copy a directory and all of its files to another directory by using the -r option. The syntax is:

cp -r [-ip] directory1... directory2

where directory1 is the name of an existing directory, and directory2 is the name of the destination directory. The -i (interactive) and -p (preserve) options work as described earlier in the chapter. The -r (recursive) option tells cp to copy an entire subtree.

Here is an example. Say that, within your working directory, you have two subdirectories: essays and backups. Within the essays directory, you have many files and subdirectories. You enter:

cp -r essays backups

A copy of essays, including all its files and subdirectories, is now in backups. When you use -r, the cp command creates new directories automatically as needed.

— hint —

To copy all the files in a directory use cp with the * wildcard (see Chapter 24), for example:

cp documents/* backups

To copy the directory itself including all its files and subdirectories, use cp with the -r option, for example:

cp -r documents backups
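The difference between these two commands is easy to see with a small experiment. The following sketch builds a tiny directory tree in a scratch directory and copies it both ways. All the names (report, drafts, notes, backups1, backups2) are made up for the illustration.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Build a small tree: documents holds one file and one subdirectory.
mkdir -p documents/drafts backups1 backups2
touch documents/report documents/drafts/notes

# Without -r, cp copies the files but skips the subdirectory drafts.
# (cp complains about the skipped directory; the message is discarded here.)
cp documents/* backups1 2>/dev/null

# With -r, cp copies the directory itself and everything inside it.
cp -r documents backups2

ls -R backups1 backups2

[ -f backups1/report ] && flat=yes || flat=no
[ -e backups1/drafts ] && flat_subdir=yes || flat_subdir=no
[ -f backups2/documents/drafts/notes ] && tree=yes || tree=no
echo "flat=$flat flat_subdir=$flat_subdir tree=$tree"

# Clean up the scratch directory.
cd / && rm -r "$scratch"
```

Notice that, in the second case, the copy is named backups2/documents: the directory itself is copied, not just its contents.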


Moving a File: mv

To move a file to a different directory, use the mv (move) command. The syntax is:

mv [-if] file... directory

where file is the name of an existing file, and directory is the name of the target directory.

The mv command will move one or more files to an existing directory. (To create a directory, use the mkdir command, explained in Chapter 24.) Here are two examples. The first command moves a file named data to a directory named archive:

mv data archive

You must be careful. If a directory named archive does not exist, mv will think you want to rename the file (see below). The next example moves three files, data1, data2 and data3, to the archive directory:

mv data1 data2 data3 archive

As with most file commands, you can use a wildcard specification. For example, the last command can be abbreviated to:

mv data[123] archive

If the target to which you move a file already exists, the source file will replace the target file. In such cases, the original contents of the target file will be lost and there is no way to get it back, so be careful. If you want to be cautious about losing data, use the -i (interactive) option. For example:

mv -i data archive

This tells mv to ask your permission before replacing a file that already exists. If you type an answer that begins with the letter y or Y (for "yes"), mv will replace the file. If you type any other answer — such as pressing the <Return> key — mv will not make the replacement. In this example, mv would ask your permission before replacing a file named archive/data.

The opposite option is -f (force). This forces mv to replace a file without checking with you. The -f option will override the -i option as well as restrictions imposed by file permissions (explained later in the chapter). Use -f with care and only when you know exactly what you are doing.


Renaming a File or Directory: mv

To rename a file or directory, use the mv (move) command. The syntax is:

mv [-if] oldname newname

where oldname is the name of an existing file or directory, and newname is the new name. The -i (interactive) and -f (force) options work as described in the last section.

Renaming a file or directory is straightforward. For example, to rename a file from unimportant to important, use:

mv unimportant important

If the target (in this case, important) already exists, it will be replaced. All the data in the original target will be lost, and there is no way to get it back, so be careful. You can use the -i and -f options, described in the last section, to control the replacement: -i tells mv to ask you before replacing a file; -f forces the replacement no matter what.

As you might expect, you can use mv to rename and move at the same time. For example, say that incomplete is a file and archive is a directory. The following command moves incomplete to the directory archive (which must already exist). As part of the move, the file will be renamed to complete:

mv incomplete archive/complete

Finally, consider what happens if you use mv with a directory named old as follows:

mv old new

If there is no directory named new, the old directory will be renamed new. However, if there is a directory named new, the old directory will be moved to become a subdirectory of new. (Take a moment to think about this.)
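If you would rather experiment than think, here is a minimal sketch that tries both cases in a scratch directory. The directories case1 and case2 are made up simply to keep the two experiments apart.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Case 1: new does not exist, so old is renamed to new.
mkdir case1 && cd case1
mkdir old
touch old/file
mv old new
[ -f new/file ] && renamed=yes || renamed=no
cd ..

# Case 2: new already exists, so old becomes a subdirectory of new.
mkdir case2 && cd case2
mkdir old new
touch old/file
mv old new
[ -f new/old/file ] && moved_inside=yes || moved_inside=no
cd ..

echo "renamed=$renamed moved_inside=$moved_inside"

# Clean up the scratch directory.
cd / && rm -r "$scratch"
```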


Deleting a File: rm

To delete a file, use the rm (remove) command. The syntax is:

rm [-fir] file...

where file is the name of a file you want to delete.

(Notice that the name of this command is "remove", not "delete". This will make sense when we talk about links later in the chapter.)

To delete a file, just specify its name. Here are some examples. The first command deletes a file named data in your working directory. The second command deletes a file named essay in your home directory. The third command deletes a file named spacewar in the directory named bin, which lies in your working directory.

rm data
rm ~/essay
rm bin/spacewar

As with all file commands, you can use wildcard specifications (see Chapter 24). Here are two examples:

rm data[123]
rm *

The first command deletes the files data1, data2 and data3 in the working directory. The second command is a powerful one: it deletes all the files in your working directory, except dotfiles. (Think very carefully before you use this command, and do not experiment with it.)

Once you delete a file, it is gone for good. There is no way to get back an erased file, so be careful.

When you use rm with a wildcard specification, it is a good idea to test it first with an ls command to see what files are matched. Here is an example. You want to delete the files data.backup, data.old and data.extra. You are thinking about using the wildcard specification data* which would match all files whose names begin with data. However, to be prudent, you check this specification by entering:

ls data*

The output is:

data.backup  data.extra  data.important  data.old

You see that you had forgotten about the file data.important. If you had used rm with data* you would have lost this file. Instead, you can use:

rm data.[beo]*

This will match only those files you really want to delete.
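Here is the same check-then-delete sequence as a small sketch you can run safely. It creates the four files from the example in a scratch directory, previews the pattern with ls, and then uses exactly the same pattern with rm.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch data.backup data.extra data.important data.old

# Preview what the pattern matches before deleting anything.
ls data.[beo]*

# The preview looks right, so use exactly the same pattern with rm.
rm data.[beo]*

# Only data.important should be left.
left=$(ls)
echo "left: $left"

# Clean up the scratch directory.
cd / && rm -r "$scratch"
```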

Before you use rm with a wildcard specification, test it first with ls to confirm which files will be matched.


How to Keep From Deleting the Wrong Files: rm -if

As I mentioned in the previous section, it is a good idea to check a wildcard pattern with ls before you use it with an rm command. However, even if you check the pattern with ls, you might still type it incorrectly when you enter the rm command. Here is a foolproof way to solve the problem.

In Chapter 13, in our discussion of aliases, I showed you how to define an alias named del, which runs the rm command using the same arguments as the preceding ls command. For the Bourne shell family (Bash, Korn shell), the command to define this alias is:

alias del='fc -s ls=rm'

For the C-Shell family (Tcsh, C-Shell), the command is:

alias del 'rm \!ls:*'

(The details are explained in Chapter 13.) To define the del alias permanently, just put the appropriate command in your environment file (Chapter 14). Once the alias is defined, it is easy to use. To start, enter an ls command with the wildcard specification that describes the files you want to delete. For example:

ls data.[beo]*

Take a look at the list of files. If they are really the ones you want to delete, enter:

del

This will execute the rm command using the filenames from the previous ls command. If the list of files is not what you want, try changing the pattern.

A handy alternative is to use the -i (interactive) option. This tells rm to ask your permission before deleting each file. For example, you can enter:

rm -i data*

The rm program will display a message for each file, asking your permission to proceed, for example:

rm: remove regular file `data.backup'?

If you type a response that begins with "y" or "Y" (for "yes"), rm will delete the file. If you type any other answer — such as pressing the <Return> key — rm will leave the file alone.
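Normally, you type your answer at the keyboard. However, for the purpose of a quick experiment, you can feed the answer to rm through a pipe. The following sketch, which uses a scratch directory, shows what happens with each answer. Supplying the answer with echo is my own shortcut for the demonstration, not the way you would normally use -i.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch data.backup

# Answer "n" to the question: rm leaves the file alone.
echo n | rm -i data.backup
[ -e data.backup ] && after_no=kept || after_no=deleted

# Answer "y" to the question: rm deletes the file.
echo y | rm -i data.backup
[ -e data.backup ] && after_yes=kept || after_yes=deleted

echo "after n: $after_no, after y: $after_yes"

# Clean up the scratch directory.
cd / && rm -r "$scratch"
```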

It is common for people to create an alias that automatically inserts the -i option every time they use the rm command. Here are the aliases. The first one is for the Bourne Shell family; the second one is for the C-Shell family:

alias rm='rm -i'
alias rm 'rm -i'

Some system administrators put such an alias in the system-wide environment file, thinking they are doing their users a favor.

This practice is to be deplored for two reasons. First, Unix was designed to be terse and exact. Having to type "y" each time you want to delete a file slows down your thought processes. Using an automatic -i option makes for sloppy thinking because users come to depend on it.

If you feel like arguing the point, think about this: it is true that, during the first week, a new user who is not used to the rm command may accidentally delete one or two files, and he won't get them back. However, the experience is an important one, and it won't be long before he will learn to use the command carefully. I believe that, in the long run, developing your skills is always the better alternative to being coddled. The truth is, in spite of the potential power of rm, experienced Unix users rarely delete files by accident, because they have formed good habits.

The second reason I don't want you to use the -i option automatically is that, eventually, you will use more than one Unix or Linux system. If you become used to a slow, awkward, ask-me-before-you-delete-each-file rm command, you will forget that most Unix systems do not work that way. One day, you will find yourself on a different system, and it will be all too easy to make a catastrophic mistake. Your fingers have a memory, and once you get used to typing rm instead of rm -i, it is a difficult habit to unlearn.

For this reason, if you really must create an alias for rm -i, give it a different name, for example:

alias erase='rm -i'
alias erase 'rm -i'

One final point. Later in the chapter, we will discuss file permissions. At that time, you will see that there are three types of permissions: read, write and execute. I won't go into the details now except to say that without write permission, you are not allowed to delete a file. If you try to delete a file for which you do not have write permission, rm will ask your permission to override the protection mechanism.

For example, say that the file data.important has file permissions of 400. (The "400" will make sense later. Basically, it means that you have read permission, but not write or execute permission.) You enter:

rm data.important

You will see the question:

rm: remove write-protected regular file `data.important'?

If you type a response that begins with "y" or "Y" (for "yes"), rm will delete the file. If you type any other answer — such as pressing the <Return> key — rm will leave the file alone. If you are careful, you can tell rm to perform the deletion without asking permission — regardless of file permissions — by using the -f (force) option:

rm -f data.important

On some systems, -f will also override the -i option.
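Here is a short sketch of the same situation, run in a scratch directory. The chmod command, which sets the permissions to 400, is used only to set up the example.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

# Create a file and make it read-only for its owner (permissions 400).
touch data.important
chmod 400 data.important
ls -l data.important

# With -f, rm deletes the write-protected file without asking.
rm -f data.important
[ -e data.important ] && result=kept || result=deleted
echo "data.important was $result"

# Clean up the scratch directory.
cd / && rm -r "$scratch"
```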

— hint —

When you delete files with rm, the -f (force) option will override file permissions and (on some systems) the -i option. For this reason, only use -f when you are sure you know what you are doing.


Deleting an Entire Directory Tree: rm -r

To delete an entire directory tree, use the rm command with the -r (recursive) option and specify the name of a directory. This tells rm to delete not only the directory, but all the files and subdirectories that lie within the directory. For example, let's say you have a directory named extra. Within this directory are a number of files and subdirectories. Within each subdirectory are still more files and subdirectories. To delete everything all at once, enter:

rm -r extra

Here is another example, deceptively simple, yet powerful. To delete everything under your working directory, use:

rm -r *

Obviously, rm -r can be a dangerous command, so if you have the tiniest doubt that you know what you are doing, -r is a good option to forget about. At the very least, think about using the -i (interactive) option at the same time. This tells rm to ask permission before deleting each file and directory, for example:

rm -ir extra

To delete an entire directory tree quickly and quietly, regardless of file permissions, you can use the -f option:

rm -fr extra

Remember, on some systems, -f will also override the -i option, so please be careful.

— hint —

Before using rm with the -r option to delete an entire directory tree, always take a moment and use pwd to display your working directory. Imagine what the following command would do if you were in the wrong directory:

rm -rf *

To conclude the discussion of the rm command, let's take a quick look at how easy it is to wipe out all your files. Say that your home directory contains many subdirectories, the result of months of hard work. You want to delete all the files and directories under the extra directory.

As it happens, you are not in your home directory. What you should do is change to your home directory and then enter the rm command:

cd
rm -fr extra

However, you think to yourself, "There is no point in typing two commands. I can do the whole thing in a single command." You intend to enter:

rm -fr ~/extra

(Remember, as we discussed in Chapter 24, the ~ (tilde) character represents your home directory.) It happens, however, that you are in a hurry, and you accidentally type a space before the slash:

rm -fr ~ /extra

In effect, you have entered a command to delete all the files in two directory trees: ~ (your home directory) and /extra.

Once you press <Return>, don't even bother trying to hit ^C or <Delete> (whichever is your intr key) to abort the command. The computer is so much faster than you, there is no way to catch a runaway rm command. By the time you realize what has happened, all your files are gone, including your dotfiles. (I tested this command so you don't have to: just believe me.)

As we discussed in Chapter 4, when you log in as root, you become superuser. As superuser, you are able to do just about anything, including deleting any file or directory in the entire system. What do you think would happen if you logged in as superuser and entered the following command?

rm -fr /

(Don't try this at home unless you have a note from your mother.)

— technical hint —

You must be especially careful when you use rm -fr with variables. For example, let's say you have a shell script that uses the variables $HOME and $FILE (see Chapter 12) as follows:

rm -fr $HOME/$FILE

If for some reason, neither of the variables is defined, the command becomes:

rm -fr /

At best, you will delete all your own files, including all your (hidden) dotfiles. If you run the script as superuser, you will cause a catastrophe. Adding the -i option won't help because, as I explained earlier, on many systems -f overrides -i.
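
One defensive habit is to check that each variable is non-empty before rm ever sees it. The following is a minimal sketch, assuming a POSIX-style shell such as sh or bash. The function name safe_remove and the scratch directory are hypothetical, and the mktemp program is assumed to be available on your system.

```shell
#!/bin/sh
# safe_remove BASE NAME  (hypothetical helper, not a standard command)
# Delete BASE/NAME, but refuse if either part is empty, so the
# command can never collapse to "rm -fr /".
safe_remove() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "safe_remove: empty argument, nothing deleted" >&2
        return 1
    fi
    rm -fr -- "$1/$2"
}

# Try it in a scratch directory rather than on real files.
scratch=$(mktemp -d) || exit 1
mkdir -p "$scratch/extra/sub"

FILE=""          # simulate a variable that was never set
if safe_remove "$scratch" "$FILE"; then first=deleted; else first=refused; fi

FILE="extra"     # now the variable has a value
if safe_remove "$scratch" "$FILE"; then second=deleted; else second=refused; fi

echo "empty name: $first, real name: $second"

# The scratch directory itself should survive; only extra should be gone.
if [ -d "$scratch" ] && [ ! -e "$scratch/extra" ]; then outcome=ok; else outcome=unexpected; fi
rmdir "$scratch"
```

The check costs two lines, and it turns a potential catastrophe into an error message.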

Is It Possible to Restore a File
That Has Been Deleted?

No.

File Permissions

Unix maintains a set of FILE PERMISSIONS (often called PERMISSIONS) for each file. These permissions control which userids can access the file and in what way. There are three types of permissions: READ PERMISSION, WRITE PERMISSION and EXECUTE PERMISSION. The three permissions are independent of one another. For example, your userid might have read and write permission for a particular file, but not execute permission for the file. It is important to understand that permissions are associated with userids, not users. For example, if someone were to log in with your userid, he or she would have the same access to your files as you do.

The exact meaning of file permissions depends on the type of file. For ordinary files, the meaning of the permissions is straightforward: read permission enables a userid to read the file. Write permission enables a userid to write to the file. Execute permission enables a userid to execute the file. Of course, it makes no sense to try to execute a file unless it is executable. As a general rule, a file is executable if it is a program or a script of some type. A shell script, for example, contains commands to be executed by the shell.

The three types of permissions are distinct, but they do work together. For example, in order to edit a file, you need both read and write permission. In order to run a shell script, you need both read and execute permission.

As you will see later in the chapter, you are able to set and change the permissions for your own files. You do so for two reasons:

• To restrict access by other users

Restricting which userids may access your files provides security for your data in a straightforward manner.

• To guard against your own errors

If you want to protect a file from being deleted accidentally, you can make sure that there is no write permission for the file. Many commands that replace or delete data will ask for confirmation before changing a file that does not have write permission. (This is the case for the rm and mv commands we discussed earlier in the chapter.)

With directories, permissions have somewhat different meanings than with ordinary files. Read permission enables a userid to read the names in the directory. Write permission enables a userid to make changes to the directory (create, move, copy, delete). Execute permission enables a userid to search the directory.

If you have read permission only, you can list the names in a directory, but that is all. Unless you have execute permission, you cannot, for example, check the size of a file, look in a subdirectory, or use the cd command to change to the directory.

Consider the following unusual combination. What would it mean if you had write and execute permission for a directory, but not read permission? You would be able to access and modify the directory without being able to read it. Thus, you could not list the contents but, if you knew the name of a file, you could delete it.

For reference, Figure 25-2 contains a summary of file permissions as they apply to ordinary files and directories.

Figure 25-2
Summary of file permissions

File permissions control which userids can access a file. Every file has three sets of permissions: for the owner, for the group, and for everyone else. Each set of permissions has three components: read permission, write permission, and execute permission. The meanings of these permissions are somewhat different for ordinary files than for directories.

Ordinary Files
Read       Read from the file
Write      Write to the file
Execute    Execute the file

Directories
Read       Read the directory
Write      Create, move, copy or delete entries
Execute    Search the directory

— hint —

When you first learn about the directory permissions, they may seem a bit confusing. Later in this chapter, you will learn that a directory entry contains only a filename and a pointer to the file, not the actual file itself.

Once you understand this, the directory permissions will make perfect sense. Read permission means you can read directory entries. Write permission means you can change directory entries. Execute permission means you can use directory entries.

Setuid

Within your Unix system, you do not exist. You log in as a particular userid, and you run programs to do your bidding. As a person who lives in the outside world, your role is limited to furnishing input and reading output. It is your programs that do the real work. For example, to sort data, you use the sort program; to rename a file, you use the mv program; to display a file, you use less; and so on.

As a general rule (with one exception, which we'll discuss in a moment) whenever you run a program, that program runs under the auspices of your userid. This means that your programs have the exact same privileges as your userid. For example, let's say your userid does not have read permission for a file named secrets. You want to see what's in the file, so you enter:

less secrets

Since your userid cannot read the file, the programs you call upon to do your work cannot read the file either. As a result, you see the message:

secrets: Permission denied

If you really want to see inside the secrets file, you have three choices. First, you can change its file permissions (I'll explain how to do so later in the chapter). Second, you can log in with a userid that already has read permission for that file. Third, if you know the root password, you can log in as superuser, which allows you to bypass virtually all restrictions.

In other words — unless you are superuser — your programs are bound by the restrictions of your userid, with one exception. There are times when it is necessary for a regular userid to run a program with special privileges. To make this possible, there is a special file permission setting that allows other userids to access a file as if they were the owner (creator) of the file. This special permission is called SETUID (pronounced "set U-I-D") or SUID. The name stands for "set userid".

In most cases, setuid is used to allow regular userids to run selected programs that are owned by root. This means that no matter which userid runs the program, it executes with root privileges. This enables the program to perform tasks that could normally be done only by the superuser. For example, to change your password, you use the passwd program. However, to change your password, the program must modify the password file and the shadow file (see Chapter 11) which requires superuser privileges. For this reason, the passwd program itself is stored in a file that is owned by root and has setuid turned on.

How can you tell that a file has the setuid file permission? When you display a long listing, you will see the letter "s" instead of "x" as one of the file permissions. For example, you enter:

ls -l /usr/bin/passwd

This displays a long listing for the passwd program. The output is:

-r-s--x--x  1 root root  21944 Feb 12 2009 /usr/bin/passwd

The "s" in the file permissions (4th character from the left) indicates the setuid permission. The "s" replaces what would otherwise be an "x".
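
To check for setuid from within a script, the shell's test command offers the -u operator, which is true when the bit is set. The sketch below tries it on a scratch file. It relies on chmod, which is covered later in the chapter, and on the u+s argument, a symbolic notation this chapter does not cover. The file name demo is made up, and mktemp is assumed to be available.

```shell
#!/bin/sh
scratch=$(mktemp -d) || exit 1
prog="$scratch/demo"
: > "$prog"              # create an empty file
chmod 755 "$prog"        # rwxr-xr-x, no setuid yet

if [ -u "$prog" ]; then before=setuid; else before=plain; fi

chmod u+s "$prog"        # turn on setuid for the owner
if [ -u "$prog" ]; then after=setuid; else after=plain; fi

# The first ten characters of ls -l are the file type plus the permissions.
listing=$(ls -l "$prog" | cut -c1-10)
echo "before: $before, after: $after, listing: $listing"

rm -f "$prog"
rmdir "$scratch"
```

The setuid bit on a file you own grants nothing beyond your own privileges, so this experiment is harmless.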

Obviously, such permissions can be a security risk. After all, a program running amok with superuser privileges can be used to hack into a system or cause damage. Thus, the use of setuid is strictly controlled. As a general rule, setuid is used only to allow regular userids to run a program with temporary privileges in order to perform a specific task.

How Unix Maintains
File Permissions: id, groups

The programmers at Bell Labs who created the first Unix system (see Chapter 1) organized file permissions in a way that is still in use today. At the time Unix was developed, people at Bell Labs worked in small groups that shared programs and documents. For this reason, the Unix developers created three categories: the user, the user's work group, and everyone on the system. They then designed Unix so as to maintain three sets of permissions for each file. Here is how it works.

The userid that creates the file becomes the OWNER of the file. The owner is the only userid that can change the permissions for a file(*). The first set of file permissions describes how the owner may access the file. Each userid belongs to a group (explained below). The second set of permissions applies to all other userids that are in the same group as the owner. The third set of permissions applies to the rest of the userids on the system. This means that, for each of your files and directories, you can assign separate read, write and execute permissions for yourself, for the people in your work group, and for everyone else.

* Footnote

The only exception is that the superuser, who can do virtually anything, can change the permissions for any file. If necessary, the superuser can also change the owner and group of a file by using the chown and chgrp commands respectively.

Here is an example. You are working with a group of people developing a program. The file that contains the program resides in one of your personal directories. You might set up the file permissions so that both you and your group have read, write and execute permission, while all the other users on the system have only read and execute permission. This means that, while anyone can run the program, only you or members of your group can change it.

Here is another example. You have a document that you don't want anyone else to see. Just give yourself read and write permission, and give no permissions to your group or to everyone else.(*)

* Footnote

Remember, however, you can't hide anything from the superuser.

It is important to understand that the permissions for "everyone" do not include you or the members of your group. Imagine a strange situation in which you give read permission for a file to everyone, but no permissions to your group. Members of your group will not be able to read the file. However, everyone else will. In addition, if there are users on your network who have access to your filesystem, they too fall in the category of "everyone", even if they don't have an account on your particular system.

So who is in your group? When your system administrator created your account, he or she also assigned you to a GROUP. Just as each user has a name called a userid, each group has a name called a GROUPID (pronounced "group-I-D"). The list of all the groupids in your system is kept in the file /etc/group, which you are free to examine at any time:

less /etc/group

The name of your group is kept in the password file /etc/passwd (described in Chapter 11), along with your userid, the name of your home directory, and other information. The easiest way to display your userid and groupid is to use the id command. (Just type the name by itself with no options.)

The id command is particularly handy in one very specific circumstance. You are doing some system administration work and, from time to time, you need to change from your own userid to root (superuser) and back again. If you become confused and you can't remember which userid you are using, you can always enter the id command. (This happens to me several times a month.)

Question: Suppose a system administrator is walking through a computer lab, and he sees a machine someone has left logged into a Unix system. What does he do?

Answer: The first thing he does is enter the id command to see who is logged in. He makes a note of the userid — so he can talk to the user — and then logs out by typing exit.

In the early days of Unix, each userid could belong only to one group. Modern Unix systems, however, allow users to belong to multiple groups at the same time. For each userid, the group that is listed in the password file is called the PRIMARY GROUP. If the userid belongs to any other groups, they are called SUPPLEMENTARY GROUPS. There are two ways in which you can display a list of all the groups to which your userid belongs. First, you can use the id program (with Solaris, you must use id -a):

id

Here is some sample output:

uid=500(harley) gid=500(staff) groups=500(staff),502(admin)

In this case, userid harley belongs to two groups: the primary group staff, and one supplementary group admin.
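
When a script needs just one piece of this information, id can print each part separately. The sketch below uses the -u, -g and -G options, which are standard on modern systems, although it is worth checking the manual page on your own system. The variable names are only for illustration.

```shell
#!/bin/sh
uid=$(id -u)        # numeric userid
gid=$(id -g)        # numeric id of the current (normally primary) group
all_gids=$(id -G)   # numeric ids of every group, separated by spaces

echo "userid number:  $uid"
echo "group number:   $gid"
echo "all groups:     $all_gids"

# Adding -n prints names instead of numbers: id -un, id -gn, id -Gn.

# The current group should be one of the groups in the full list.
case " $all_gids " in
    *" $gid "*) group_listed=yes ;;
    *)          group_listed=no ;;
esac
echo "current group appears in the full list: $group_listed"
```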

Another way to display all your groups is to use the groups program. The syntax is:

groups [userid...]

where userid is a userid.

By default, groups displays the names of the groups to which the current userid belongs. If you specify one or more userids, groups will show you to which groups they belong. Try the following two commands on your system. The first one displays all of your groups; the second displays the groups to which the superuser (userid root) belongs:

groups
groups root

How important are groups? In the 1970s, groups were very important. Most Unix users were researchers working in a trustworthy environment that was not connected to an outside network. Placing each userid in a group allowed the researchers to share work and collaborate with their colleagues. Today, however, for regular users, groups are sometimes ignored for two reasons.

First, most people have their own Unix or Linux computer, and when you are the only one using a system, there is no one to share with. Second, even on a shared system or a large network, system administrators often do not find it worthwhile to maintain groups that are small enough to be useful. For example, if you are an undergraduate student at a university, your userid might be part of a large group (such as all social science students) with whom sharing would be a meaningless experience.

Having said that, you should know that some organizations do take the trouble to maintain groups in order to share data files or executable programs. For example, at a university, students taking a particular course may be given userids that belong to a group that was set up just for that course. In this way, the teacher can create files that can be accessed only by those students.

— hint —

Unless you have an actual need to share files with the other userids in your group, it's better to ignore the group idea altogether. When you set file permissions (explained later in the chapter), just give the "group" the same permissions you give to "everyone".

Displaying File Permissions: ls -l

To display the file permissions for a file, use the ls command with the -l (long listing) option. The permissions are shown on the left-hand side of the output. To display the permissions for a directory, set the -d option along with -l. (The ls command, along with these options, is explained in Chapter 24.)

Here is an example. You enter the following command to look at the files in your working directory:

ls -l

The output is:

total 109
-rwxrwxrwx 1 harley staff 28672 Sep 5 16:37 program.allusers
-rwxrwx--- 1 harley staff  6864 Sep 5 16:38 program.group
-rwx------ 1 harley staff  4576 Sep 5 16:32 program.owner
-rw-rw-rw- 1 harley staff  7376 Sep 5 16:34 text.allusers
-rw-rw---- 1 harley staff  5532 Sep 5 16:34 text.group
-rw------- 1 harley staff  6454 Sep 5 16:34 text.owner

We discussed most of this output in Chapter 24. Briefly, the filename is on the far right. Moving to the left, we see the time and date of the last modification, the size (in bytes), and the group and userid of the owner. In this case, the owner of the files is userid harley and the group is staff. To the left of the owner is the number of links (which I will discuss later in this chapter). At the far left, the first character of each line is the file type indicator. An ordinary file is marked by -, a hyphen; a directory (there are none in this example) is marked by a d.

What we want to focus on here are the 9 characters to the right of the file type indicator. Their meaning is as follows:

r  =  read permission
w  =  write permission
x  =  execute permission
-  =  permission not granted

To analyze the permissions for a file, simply divide the 9 characters into three sets of 3. From left to right, these sets show the permissions for the owner of the file, the group, and for all other userids on the system. Let's do this for all the files in the example:

Owner  Group  Other  File
rwx    rwx    rwx    program.allusers
rwx    rwx    ---    program.group
rwx    ---    ---    program.owner
rw-    rw-    rw-    text.allusers
rw-    rw-    ---    text.group
rw-    ---    ---    text.owner

We can now see exactly how each permission is assigned. For instance, the file text.owner has read and write permissions for the owner, and no permissions for the group or for anyone else.
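
The same division can be done mechanically. As a small sketch, the commands below use cut to pull the three sets out of one line of ls -l output. The sample line is the text.group entry from the listing earlier in this section, and the variable names are only for illustration.

```shell
#!/bin/sh
line='-rw-rw---- 1 harley staff  5532 Sep 5 16:34 text.group'

type=$(printf '%s\n' "$line" | cut -c1)       # file type indicator
owner=$(printf '%s\n' "$line" | cut -c2-4)    # first set of 3: owner
group=$(printf '%s\n' "$line" | cut -c5-7)    # second set of 3: group
other=$(printf '%s\n' "$line" | cut -c8-10)   # third set of 3: everyone else

echo "type=$type owner=$owner group=$group other=$other"
# prints: type=- owner=rw- group=rw- other=---
```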

File Modes

Unix uses a compact, three-number code to represent the full set of file permissions. This code is called a FILE MODE or, more simply, a MODE. As an example, the mode for the file text.owner in the last example is 600.

Within a mode, each number stands for one set of permissions. The first number represents the permissions for the userid that owns the file; the second number represents the permissions for the userids in the group; the third number represents the permissions for all the other userids on the system. Using the example I just mentioned, we get:

6  =  permissions for owner
0  =  permissions for group
0  =  permissions for all other userids

Here's how the code works. We start with the following numeric values for the various permissions:

4  =  read permission
2  =  write permission
1  =  execute permission
0  =  no permission

For each set of permissions, simply add the appropriate numbers. For example, to indicate read and write permission, add 4 and 2. Figure 25-3 shows each possible combination along with its numeric value.

Figure 25-3: Numeric values for file permission combinations

There are three types of file permissions: read permission, write permission, and execute permission. The values of these permissions are represented by 3 different numbers which are added together as shown in the table. See text for details.

Read  Write  Execute  Components  Total
-     -      -        0 + 0 + 0   0
-     -      yes      0 + 0 + 1   1
-     yes    -        0 + 2 + 0   2
-     yes    yes      0 + 2 + 1   3
yes   -      -        4 + 0 + 0   4
yes   -      yes      4 + 0 + 1   5
yes   yes    -        4 + 2 + 0   6
yes   yes    yes      4 + 2 + 1   7

Let's do an example. What is the mode for a file in which:

• The owner has read, write and execute permissions?
• The group has read and write permissions?
• All other userids have read permission only?

Owner: read + write + execute = 4 + 2 + 1 = 7
Group: read + write           = 4 + 2 + 0 = 6
Other: read                   = 4 + 0 + 0 = 4

Thus, the mode is 764. Now let's take a look at the examples from the previous section:

Owner    Group    Other    Mode  File
rwx = 7  rwx = 7  rwx = 7  777   program.allusers
rwx = 7  rwx = 7  --- = 0  770   program.group
rwx = 7  --- = 0  --- = 0  700   program.owner
rw- = 6  rw- = 6  rw- = 6  666   text.allusers
rw- = 6  rw- = 6  --- = 0  660   text.group
rw- = 6  --- = 0  --- = 0  600   text.owner

Now, let's do an example going backwards. What does a file mode of 540 mean? Using Figure 25-3, we see:

Owner: 5 = read + execute
Group: 4 = read
Other: 0 = nothing

Thus, the owner can read and execute the file. The group can only read the file. Everyone else has no permissions.
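
If you would like to check your arithmetic, the addition is easy to automate. The following sketch, written for a POSIX-style shell, converts the 9 permission characters into a mode. The function names set_value and mode_of are hypothetical helpers invented for this example.

```shell
#!/bin/sh
# set_value SET: print the number for one set of 3 characters such as rw-
# (4 for read, 2 for write, 1 for execute, added together).
set_value() {
    total=0
    case $1 in r??) total=$((total + 4)) ;; esac
    case $1 in ?w?) total=$((total + 2)) ;; esac
    case $1 in ??x) total=$((total + 1)) ;; esac
    echo "$total"
}

# mode_of PERMS: print the 3-digit mode for 9 characters such as rwxrw-r--
mode_of() {
    owner=$(printf '%s\n' "$1" | cut -c1-3)
    group=$(printf '%s\n' "$1" | cut -c4-6)
    other=$(printf '%s\n' "$1" | cut -c7-9)
    echo "$(set_value "$owner")$(set_value "$group")$(set_value "$other")"
}

mode_of rwxrw-r--    # prints 764
mode_of rw-------    # prints 600
mode_of r-xr-----    # prints 540
```

The three calls reproduce the examples worked out by hand: 764, the mode of text.owner, and the mode 540 we decoded going backwards.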

Changing File Permissions: chmod

To change the permissions for a file, use the chmod (change file mode) command. The syntax is:

chmod mode file...

where mode is the new file mode, and file is the name of a file or directory.

Only the owner or the superuser can change the file mode for a file. As I mentioned earlier, your userid is automatically the owner of every file you create.

Here are some examples of how you might use chmod. The first command changes the mode for the specified files to give read and write permission to the owner, and read permission to the group and to everyone else. These permissions are suitable for a file you want to let anyone read, but not modify.

chmod 644 essay1 essay2 document

The next command gives the owner read, write, and execute permissions, with read and execute permissions for the group and for everyone else. These permissions are suitable for a file that contains a program that you want to let other people execute, but not modify.

chmod 755 spacewar

In general, it is prudent to restrict permissions unless you have a reason to do otherwise. The following commands show how to set permissions only for the owner, with no permissions for the group or everyone else. First, to set read and write permissions only:

chmod 600 homework.text

Next, to set read, write, and execute permissions:

chmod 700 homework.program

When you create a shell script or a program, it will, by default, have only read and write permissions. In order to execute the script, you will have to add execute permission. Use chmod 700 (or chmod 755 if you want to share).
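
You can watch chmod at work by changing the mode of a scratch file and looking at ls -l after each change. This is only a sketch: the file name script.demo and the helper show_perms are made up for the example, and mktemp is assumed to be available.

```shell
#!/bin/sh
scratch=$(mktemp -d) || exit 1
file="$scratch/script.demo"
: > "$file"                      # create an empty file

# show_perms FILE: print the file type indicator and the 9 permission characters
show_perms() {
    ls -l "$1" | cut -c1-10
}

chmod 644 "$file"; p644=$(show_perms "$file")
chmod 755 "$file"; p755=$(show_perms "$file")
chmod 600 "$file"; p600=$(show_perms "$file")
chmod 700 "$file"; p700=$(show_perms "$file")

echo "644 -> $p644"    # -rw-r--r--
echo "755 -> $p755"    # -rwxr-xr-x
echo "600 -> $p600"    # -rw-------
echo "700 -> $p700"    # -rwx------

rm -f "$file"
rmdir "$scratch"
```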

— hint —

To avoid problems, do not give execute permission to a file that is not executable.

How Unix Assigns Permissions
to a New File: umask

When Unix creates a new file, it starts with a file mode of:

666: for non-executable ordinary files
777: for executable ordinary files
777: for directories

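From this initial mode, Unix removes the permissions specified by the USER MASK. (This is often described as subtracting the mask. Strictly speaking, each permission named in the mask is simply turned off, which gives the same result for the masks used in this section.) The user mask is a mode, set by you, showing which permissions you want to restrict. To set the user mask, use the umask command. The syntax is: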

umask [mode]

where mode specifies which permissions you want to restrict.

It is a good idea to put a umask command in your login file, so that your user mask will be set automatically each time you log in. Indeed, you will see a umask command in the sample login files I showed you in Chapter 14.

What should your user mask be? Let's consider some examples. To start, let's say you want write permission to be withheld from your group and from everyone else. Use a mode of 022:

umask 022

This user mask shares your files without letting anyone change them. In most cases, it is prudent to be as private as possible. To do so, you can withhold all permissions — read, write, and execute — from your group and from anyone else. Use a mode of 077:

umask 077

To display the current value of your user mask, you can enter the umask command without a parameter:

umask

Note: umask is a builtin command, which means its exact behavior depends on which shell you are using. Some shells do not display leading zeros. For example, if your user mask is 022, you may see 22; if your user mask is 002, you may see 2. If this is the case with your shell, just pretend the zeros are there.
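
To see the effect of a user mask, you can create a file under several masks and compare the results. The sketch below does this in a scratch directory. The helper new_file_perms is hypothetical, mktemp is assumed to be available, and the results assume an ordinary filesystem. The mask 033 is included to show that each permission in the mask is turned off individually: a new file ends up with mode 644, not 633.

```shell
#!/bin/sh
scratch=$(mktemp -d) || exit 1

# new_file_perms MASK: create a file under the given user mask and print
# its permissions. The parentheses run the commands in a subshell, so the
# change to the user mask does not affect the rest of the script.
new_file_perms() {
    (
        umask "$1"
        : > "$scratch/file.$1"
        ls -l "$scratch/file.$1" | cut -c1-10
    )
}

m022=$(new_file_perms 022)
m077=$(new_file_perms 077)
m033=$(new_file_perms 033)

echo "umask 022 -> $m022"    # -rw-r--r--  (mode 644)
echo "umask 077 -> $m077"    # -rw-------  (mode 600)
echo "umask 033 -> $m033"    # -rw-r--r--  (mode 644)

# The same calculation for mask 033 using shell arithmetic:
# keep every permission in 666 that is not named in the mask.
calc=$(printf '%o' $(( 0666 & ~0033 )))
echo "666 with mask 033 -> $calc"    # 644

rm -f "$scratch"/file.*
rmdir "$scratch"
```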

— hint —

Unless you have a good reason to do otherwise, make your files completely private by using umask 077 in your login file. If you want to share, you can use chmod to do so on a file-by-file basis.

Wiping Out the Contents
of a File: shred

As we discussed earlier in the chapter, once you delete a file, there is no way to get it back. However, the actual disk space used by the file is not wiped clean. Rather, it is marked as being available for reuse by the filesystem. Eventually, the disk space will be reused and the old data will be overwritten by new data. On a large, busy Unix system, this can happen within seconds. However, there is no guarantee when this will happen, and sometimes old data can stay hidden in the unused part of a disk for some time. Indeed, there are special "undelete" tools that are able to look at the unused portion of a disk and recover old data.

Moreover, even if data is overwritten, in extreme cases it is possible for data to be recovered, as long as the data has not been overwritten more than once. If you can take a hard disk to a lab with very expensive data recovery equipment, it may be possible to sense traces of the old data on the magnetic surface of the disk.

For the truly paranoid, then, the best way to delete data forever is to destroy the storage media, say, by melting it. This is relatively easy with a CD or floppy disk, but not so easy with a hard disk, especially if you want to wipe out a few files and not the entire disk. So for those rare occasions in which simple file deletion is not enough, the GNU utilities (see Chapter 2) provide a program called shred. Although shred is not universally available, you will find it on most Linux systems. The syntax is:

shred -fvuz [file...]

where file is the name of a file.

The goal of shred is to overwrite existing data so many times that even the most expensive data recovery equipment in the world will feel foolish trying to read the magnetic traces. All you need to do is specify the names of one or more files and shred does the work automatically. If you add the -v (verbose) option, as I like to do, shred will display messages as it progresses:

shred -v datafile

By default, shred will overwrite the data many times and will leave the file with random data. Random data, of course, is a tipoff that the file has been "shredded". To hide this, you can use the -z option, which tells shred to finish the job by filling the file with all zeros. Going further, if you want to delete the file after processing, use the -u option. Finally, to override restrictive file permissions, you can use the -f (force) option.

Here then is the ultimate shred command. It will override existing file permissions, wipe out all the data by overwriting it many times, fill the file with all zeros, and delete the remains:

shred -fuvz datafile

The shred program, of course, can only do so much. If a file has been backed up automatically to another system or copied to a mirror site, all the shredding in the world won't get rid of the remote copies. Moreover, shred won't work on all filesystems. For example, when you update a file with the ZFS filesystem (developed by Sun Microsystems), the new data is written to a different location on the disk. The old data is not replaced until that particular part of the disk is reused by another file.

The Idea of a Link: stat, ls -i

When Unix creates a file, it does two things. First, it sets aside space on the storage device to store data. Second, it creates a structure called an INDEX NODE or INODE ("I-node") to hold the basic information about the file. The inode contains all the information the filesystem needs to make use of the file. Figure 25-4 contains a summary of what you would find in an inode in a typical Unix filesystem. Ordinary users don't have to know what is in an inode, because the filesystem handles the details automatically. On Linux systems, it is easy to look inside the inode for a particular file by using the stat command. Just type stat followed by the name of a file:

stat filename

Figure 25-4
Contents of an inode (index node)

An index node or inode contains information about a file. Here is a list of the information typically stored in an inode in a Unix filesystem. The exact contents can vary slightly from one filesystem to another.

• Length of file in bytes
• Name of device that contains the file
• Userid of owner
• Groupid
• File permissions
• Last modification time
• Last access time
• Last time inode was changed
• Number of links pointing to the file
• Type of file (ordinary, directory, special, symbolic link...)
• Number of blocks allocated to the file

The filesystem keeps all the inodes in a large table called the INODE TABLE. Within the inode table, each inode is known by a number called the INDEX NUMBER or INUMBER ("I-number"). For example, say that a particular file is described by inode #478515. We say that the file has an inumber of 478515. To display the inumber for a file, use ls with the -i option. For example, the following command displays the inumber for the two files named xyzzy and plugh(*):

ls -i xyzzy plugh

* Footnote

When you have a moment, look up these two names on the Internet.

The following two commands display inumbers for all the files in the current directory:

ls -i
ls -il

When we work with directories, we talk as if they actually contain files. For example, you might hear someone say that his bin directory contains a file named spacewar. However, the directory does not really contain the file. Actually, the directory only contains the name of the file and its inumber. Thus, the contents of a directory are quite small: just a list of names and, for each name, an inumber.

Let's look at an example. What happens when you create a file named spacewar in your bin directory? First, Unix sets aside storage space on the disk to hold the file. Next, Unix looks in the inode table and finds a free inode. Let's say that it is inode #478515. Unix fills in the information in the inode that pertains to the new file. Finally, Unix places an entry in the bin directory. This entry contains the name spacewar along with an inumber of 478515. Whenever a program needs to use the file, it is a simple matter to look up the name in the directory, use the corresponding inumber to find the inode, and then use the information in the inode to access the file.

The connection between a filename and its inode is called a LINK. Conceptually, a link connects a filename with the file itself. This is why — as you can see from Figure 25-4 — the inode does not contain a filename. Indeed, as you will see in a moment, an inode can be referenced by more than one filename.

Multiple Links to the Same File

One of the most elegant features of the Unix filesystem is that it allows multiple links to the same file. In other words, a file can be known by more than one name. How can this be? The unique identifier of a file is its inumber, not its name. Thus, there is no reason why more than one filename cannot reference the same inumber. Here is an example.

Let's say that your home directory is /home/harley. Within your home directory, you have a subdirectory called bin. You have created a file in the bin directory by the name of spacewar. It happens that this file has an inumber of 478515. Using the ln command (described later in the chapter), you create another file named funky in the same directory, such that it has the same inumber as spacewar. Since both spacewar and funky have the same inumber, they are, essentially, different names for the same file.

Now, let's say you move to your home directory and create another file named extra, also with the same inumber. Then you move to the home directory of a friend, /home/weedly, and create a fourth file named myfile, also with the same inumber. At this point, you still have only one file — the one identified by inumber 478515 — but it has four different names:

/home/harley/bin/spacewar
/home/harley/bin/funky
/home/harley/extra
/home/weedly/myfile

Why would you want to do this? Once you get used to the idea of links, you will find ample opportunities to use them. The basic idea — which will make more sense as you become more experienced and sophisticated — is that the same file can have different meanings, depending upon the context in which it is being used. For example, it is often handy to allow different users to access the same file under different names.

However, there is a lot more to the ideas behind links. The reason I want you to understand how they work is that it is the links that underlie the operation of the basic file commands: cp (copy), mv (move), rm (remove), and ln (link). If all you do is memorize how to use the commands, you will never really understand what is happening, and the rules for using the filesystem will never really make sense.

In a moment we will consider the implications of this statement. Before we do, I want you to consider a question. Let's say that a file has more than one link; that is, the file can be accessed by more than one name. Which of the names is the most important one? Does the original name have any special significance? The answer is that Unix treats all links as equal. It doesn't matter what the original name of the file was. A new link is considered to be just as important as the old one.

Within Unix, files are not controlled by their names or locations. Files are controlled by ownership and permissions.

Creating a New Link: ln

Whenever you create a file, the filesystem creates a link between the filename and the file automatically. However, there will be times you want to make a new link to an existing file. To do so, you use the ln (link) command. There are two forms of this command. First, to make a new link to a single file, use the syntax:

ln file newname

where file is the name of an existing ordinary file, and newname is the name you want to give the link.

For example, let's say you have a file named spacewar, and you want to make a new link with the name funky, use:

ln spacewar funky

You will end up with two filenames, each of which refers to the same file (that is, to the same inumber). Once a new link is created, it is functionally the same as the original directory entry.
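If you would like to see this for yourself, here is a small sketch you can try. It works in a scratch directory made by mktemp, and the variable names (demo, inum_spacewar, and so on) are my own, chosen only for illustration. The ls -i command displays the inumber in front of each filename.

```shell
# Work in a scratch directory so nothing in your real files is touched.
demo=$(mktemp -d)
cd "$demo" || exit 1

echo "blast off" > spacewar     # create the original file
ln spacewar funky               # make a second link to the same file

# ls -i shows the inumber in front of each name; the two should match.
ls -i spacewar funky

# Capture each inumber (the first field of the ls -i output).
inum_spacewar=$(ls -i spacewar | awk '{print $1}')
inum_funky=$(ls -i funky | awk '{print $1}')

# Appending through one name changes what you see through the other.
echo "second line" >> funky
lines_in_spacewar=$(wc -l < spacewar | tr -d ' ')
echo "spacewar now has $lines_in_spacewar lines"
```

Because both names lead to the same inumber, anything written through funky is visible through spacewar.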

The second way to use ln is to make new links for one or more ordinary files and place them in a specified directory. The syntax is:

ln file... directory

where file is the name of an existing ordinary file, and directory is the name of the directory in which you want to place the new links.

Here is an example. Your home directory is /home/harley. In this directory, you have two files, data1 and data2. Your friend uses the home directory /home/weedly. In this directory, he has a subdirectory named work. The file permissions for this directory are such that your userid is allowed to create files. You want to make links to your two files and place them in your friend's directory. Use the command:

ln /home/harley/data1 /home/harley/data2 /home/weedly/work

To simplify the command, you can use wildcards (see Chapter 24):

ln /home/harley/data[12] /home/weedly/work

Another way to simplify this command is to change to your home directory before entering the ln command:

cd; ln data[12] /home/weedly/work

Once you create these new links, both files have names in two different directories at the same time. To see the number of links for a file, use the ls -l command. The number of links is displayed between the file permissions and the userid of the owner. For example, let's say you enter the command:

ls -l music videos

The output is:

-rw------- 1 harley staff  4070 Oct 14 09:50 music
-rwx------ 2 harley staff 81920 Oct 14 09:49 videos

You can see that music has only one link, while videos has two links.
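Here is a sketch that puts the second form of ln together with the link count. The directories harley and work inside the scratch directory are stand-ins of my own for the two home directories in the example. The link count is taken from the second field of the ls -l output.

```shell
demo=$(mktemp -d)
mkdir "$demo/harley" "$demo/work"
echo one > "$demo/harley/data1"
echo two > "$demo/harley/data2"

# Second form of ln: one or more files, then the target directory.
ln "$demo/harley/data1" "$demo/harley/data2" "$demo/work"

# The link count is the second field of the long listing.
ls -l "$demo/harley/data1" "$demo/work/data1"
links=$(ls -l "$demo/harley/data1" | awk '{print $2}')
echo "data1 has $links links"
```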

How the Basic File Commands Work

It is important that you be able to understand the basic file commands in terms of filenames and links. Here are the basic operations:

1. CREATE A FILE; CREATE A DIRECTORY [mkdir]

To create a new file or directory, Unix sets aside storage space and builds an inode. Within the appropriate directory, Unix places a new entry using the filename or directory name you specified, along with the inumber of the new inode.

2. COPY A FILE [cp]

To copy to an existing file, Unix replaces the contents of the target file with the contents of the source file. No inumbers change. To copy to a file that does not exist, Unix first creates a brand new file with its own inumber. (Remember, the inumber is really what identifies a file.) The contents of the old file are then copied to the new file. After the copy, there are two distinct files. The old filename corresponds to the old inumber; the new filename corresponds to the new inumber.

3. RENAME A FILE or MOVE A FILE [mv]

To rename or move a file, Unix changes the filename, or moves the directory entry, or both, but keeps the same inumber. This is why the same command (mv) is used to rename and move.

4. CREATE A LINK [ln]

To create a new link to an existing file, Unix makes a new directory entry using the filename you specify, pointing to the same inumber as the original file. There is now one file and two filenames, and both filenames point to the same inumber.

5. REMOVE A LINK [rm, rmdir]

When you remove a link, Unix eliminates the connection between the filename and the inumber by removing the directory entry. If there are no more links, Unix deletes the file.

It is important to understand that removing a link is not the same as deleting a file. If there is more than one link to a file, Unix will not delete the file until the last link is removed. In most cases, however, there is only one link to a file, which is why, most of the time, rm and rmdir act as delete commands.

Here is a simple example to illustrate the ideas we have just discussed. You have a file named spacewar. You decide to make a new link to this file and call it funky:

ln spacewar funky

Now you remove spacewar:

rm spacewar

Even though the first filename is gone, the original file still exists. The file itself will not be deleted until the last link (funky) is removed.
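To tie the operations together, here is a sketch that follows one file through cp, mv, ln and rm, and records the inumber at each step. The helper function inum and all the filenames are my own inventions. I am assuming everything happens within a single directory, so mv does not have to cross into another filesystem.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Helper (my own name): print the inumber of a file.
inum() { ls -i "$1" | awk '{print $1}'; }

echo "data" > original
orig_inum=$(inum original)

cp original copy                 # cp: a brand new file with a new inumber
copy_inum=$(inum copy)

mv original renamed              # mv: new name, same inumber
renamed_inum=$(inum renamed)

ln renamed funky                 # ln: second name, same inumber
funky_inum=$(inum funky)

rm renamed                       # rm: removes one link only
survivor=$(cat funky)            # the file is still reachable via funky

echo "original=$orig_inum copy=$copy_inum renamed=$renamed_inum funky=$funky_inum"
echo "contents via funky: $survivor"
```

Only cp produces a different inumber. The other commands simply rearrange directory entries that point to the same file.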

Symbolic Links: ln -s

The type of links we have discussed enable us to have more than one name refer to the same file. However, such links have two limitations. First, you cannot create a link to a directory. Second, you cannot create a link to a file in a different filesystem.

To create a link to a directory or to a file in a different filesystem, you need to create what is called a SYMBOLIC LINK or a SYMLINK. To do so, you use the ln program with the -s option. A symbolic link does not contain the inumber of a file. Rather, it contains the pathname of the original file. Whenever you access a symbolic link, Unix uses that pathname to find the file. (In this sense, a symbolic link is similar to a Windows shortcut. Windows Vista, by the way, supports actual Unix-like symbolic links.)

When you use ls -l to display the long listing for a file that is a symbolic link, you will notice two things. First, the file type indicator (the leftmost character of the output) will be the lowercase letter l for "link". Second, the actual symbolic link is shown at the right side of the line. Here is an example from a system in which the file /bin/sh is a symbolic link to the file /bin/bash. You enter the command:

ls -l /bin/sh

The output is:

lrwxrwxrwx  1 root root 4 Sep 11 2008  /bin/sh -> bash

As you can see, this file is only 4 bytes long, just long enough to hold the pathname of the real file (which is 4 characters long). The fact that this is a symbolic link — and not, say, a 4-character ordinary file — is noted in the inode for the file.

If you want to see the long listing for the file itself, you must specify the actual name:

ls -l /bin/bash

In this case, the output is:

-rwxr-xr-x 1 root root 720888 Feb 10 2008  /bin/bash

As you can see, this particular file has 720,888 bytes. As you might have guessed from the name, the file holds the program for the Bash shell.

To distinguish between the two types of links, a regular link is sometimes called a HARD LINK, while a symbolic link is sometimes called a SOFT LINK. When we use the word "link" by itself, we mean a hard link.

As we discussed earlier, to display the number of hard links to a file you use the ls -l command. There is, however, no way to display how many soft links (symbolic links) there are to a file. This is because the filesystem itself doesn't even know how many such links exist.

Question: What happens if there exists a symbolic link to a file, and you delete the file?

Answer: The symbolic link will not be deleted. In fact, you can still list it with ls. However, if you try to use the link, you will get an error message.
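You can watch this happen with a short experiment. In the following sketch, the names target and shortcut are my own. The shell test -L asks "is this a symbolic link?", while -e asks "does the file it leads to exist?".

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

echo "hello" > target
ln -s target shortcut            # symbolic link holding the pathname "target"

ls -l shortcut                   # first character is l; target shown after ->
via_link=$(cat shortcut)         # reading the link reads the real file

rm target                        # delete the file the link points to

# The link itself is still there (-L), but it leads nowhere (-e fails).
if [ -L shortcut ] && [ ! -e shortcut ]; then
    state="dangling"
else
    state="ok"
fi
echo "shortcut is $state"
cat shortcut 2>/dev/null || echo "cannot read through shortcut"
```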

Using Symbolic Links With Directories

In Chapter 24, we discussed how to use the builtin commands cd to change the working directory and pwd to display the name of the working directory. A question arises: How should cd and pwd behave when a directory name is a symbolic link to another directory? There are two choices. First, the command can consider the symbolic link to be an entity in its own right, a synonym for the actual directory, much like a hard link is for an ordinary file. Alternatively, the link might be nothing more than a stepping stone to the real directory.

With some shells, the cd command has two options to give you control over such situations. The -L (logical) option tells cd to treat symbolic links as if they were real directories on their own. The -P (physical) option tells cd to substitute the real directory for the symbolic one. Here is an example.

To start, within your home directory, create a subdirectory named extra. Next, create a symbolic link to this directory and name it backups. Finally, display a long listing of the two files:

cd
mkdir extra
ln -s extra backups
ls -ld extra backups

Here is some sample output:

lrwxrwxrwx 1 harley staff    5 Sep 8 17:52 backups -> extra
drwxrwxr-x 2 harley staff 4096 Sep 8 17:52 extra

By looking at the first character of each line, we can see that backups is a link and extra is a real directory. Notice that backups is only 5 bytes long, just long enough to contain the name of the target directory. The extra directory, however, is 4,096 bytes long, the block size for this filesystem (see Chapter 24).

Now, consider the command:

cd -L backups

This changes the working directory to backups even though, strictly speaking, backups doesn't really exist. What if, instead, we had used the -P option?

cd -P backups

In this case, the shell would have substituted the actual directory for the symbolic link and our working directory would become extra. When this happens, we say that the shell FOLLOWS the link.

By default, cd assumes -L, so you never really have to specify it. However, if you want the shell to follow a symbolic link, you do have to specify -P.

The pwd (print working directory) command has the same two options that can be applied when you display the name of your working directory. To test this, create the directory and symbolic link above and enter either of the following commands (they are equivalent):

cd backups
cd -L backups

Now enter the command:

pwd -P

The -P option tells pwd to follow the link. The output will look something like this:

/home/harley/extra

Now enter either of the following commands. As with cd, the -L option is the default:

pwd
pwd -L

In this case, pwd does not follow the link, so the output is:

/home/harley/backups
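If you want to run the whole experiment in one go without touching your home directory, here is a sketch that does so in a scratch directory. The variable names are my own, and I am assuming a shell whose cd and pwd accept -L and -P (Bash does). The first line resolves the scratch directory physically, in case the temporary path itself passes through a symbolic link.

```shell
# Resolve the scratch directory physically first, so any symbolic links
# in the temporary path itself do not confuse the comparison.
base=$(cd "$(mktemp -d)" && pwd -P)
cd "$base" || exit 1

mkdir extra
ln -s extra backups

cd -L backups                    # logical: keep the name we typed
logical=$(pwd -L)
physical=$(pwd -P)
echo "after cd -L: pwd -L = $logical"
echo "after cd -L: pwd -P = $physical"

cd "$base" || exit 1
cd -P backups                    # physical: follow the link right away
followed=$(pwd -L)
echo "after cd -P: pwd -L = $followed"
```

After cd -P, even pwd -L reports the extra directory, because the shell followed the link at the moment you changed directories.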

Finding Files Associated With
a Unix Command: whereis

There will be many instances when you want to find a particular file or set of files. At such times, there are three different programs you can use: whereis, locate, and find. In the next several sections, we will cover each program in turn.

The whereis program is used to find the files associated with a specific Unix command: binary (executable) files, source files, and documentation files. Rather than search the entire filesystem, whereis looks only in those directories in which such files are likely to be found: /bin, /sbin, /etc, /usr/share/man, and so on. (See the description of the Filesystem Hierarchy Standard in Chapter 23.)

The syntax for the whereis program is:

whereis [-bms] command...

where command is the name of a command.

Let's say you want to find the files associated with the ls command. Just enter:

whereis ls

Here is some typical output:

ls: /bin/ls /usr/share/man/man1p/ls.1p.gz /usr/share/man/man1/ls.1.gz

In this case, we see that the ls program itself resides in the file with the pathname /bin/ls. This is straightforward. The next two long pathnames show the location of two different man pages in compressed format:

/usr/share/man/man1p/ls.1p.gz
/usr/share/man/man1/ls.1.gz

(The .gz extension indicates that a file has been compressed by the gzip program. Such files are uncompressed automatically before they are displayed.)

The two lines above show that the first man page is in section 1p, and the second is in section 1. You can display either of these pages by using the man command (see Chapter 9). Since section 1 is the default, you don't have to specify it. However, you do have to specify any other section, such as 1p. Thus, the commands used to display these pages are:

man ls
man 1p ls

This example illustrates one of the things I like about whereis: it will sometimes find man pages you had no idea existed. For example, on the system used to generate the above output, it happens that there are two different man pages for the ls program. Indeed, they are different enough that it is worth reading them both. However, if we hadn't checked with whereis, we would not have known about the second page.

If you want to limit the output of whereis, there are several options you can use. To display only the pathname of the executable file, use the -b (binary) option; for files from the online manual, use the -m option; and for source files use the -s option. Here is an example. The following command displays the pathnames for the executable files for ten different programs:

whereis -b chmod cp id ln ls mv rm shred stat touch

Try this command on your system and see what you get.

Finding Files by Searching
a Database: locate

There are two different Unix programs that provide a general "file finding" service: locate and find. I want you to learn how to use them both. The find program is much older and much more difficult to use. However, it is very powerful and is available on every Unix and Linux system. The locate program is newer and easy to use, but less powerful than find. Moreover, although it comes with most Linux and FreeBSD systems, it is not available on all Unix systems (for example Solaris). In this section, we'll cover locate. In the following sections, we'll discuss find.

The job of the locate program is to search a special database containing the pathnames of all publicly accessible files, looking for all the names that contain a specific pattern. The database is maintained automatically and updated regularly. The syntax for locate is:

locate [-bcirS] pattern...

where pattern is the pattern you are looking for in a pathname.

Here is a simple example. You want to find all the files whose pathname contains the characters "test". Since there will probably be many such files, it is a good idea to pipe the output of the command to less (Chapter 12) to display the output one screenful at a time:

locate test | less

Aside from ordinary characters, you can use a regular expression if you include the -r option. Within a regular expression, you can use ^ and $ to anchor the beginning and end of the pathname respectively. (For help with regular expressions, see Chapter 20.)

Here is an interesting example. You want to search your system for photos. One way is to look for files with an extension of either .jpg or .png, and save the pathnames in a file. You can then browse the file at your leisure. The commands to use are:

locate -r '\.jpg$' > photos
locate -r '\.png$' >> photos

The first command redirects its output to the file photos. The second command appends its output to the same file. (For a discussion of redirecting output, see Chapter 15.)
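One detail is worth checking when you write such patterns. Within a regular expression, an unescaped . (dot) matches any single character, so to match a literal period you write \. instead. Since the locate database differs from one system to another, the following sketch uses grep on four made-up pathnames as a stand-in. I am assuming that grep's basic regular expressions treat the dot the same way locate -r does.

```shell
# Four made-up pathnames; only the first is a real .jpg file.
paths='/home/harley/cat.jpg
/home/harley/notes-jpg
/home/harley/cat.jpg.bak
/home/harley/dog.png'

# Unescaped dot: matches any character in front of "jpg".
loose=$(printf '%s\n' "$paths" | grep -c '.jpg$')

# Escaped dot: matches only a literal period.
strict=$(printf '%s\n' "$paths" | grep -c '\.jpg$')

echo "loose pattern matched $loose, strict pattern matched $strict"
```

The loose pattern also accepts notes-jpg, which is not what you want when you are looking for an extension.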

Because locate often gives you more than you want, it is common to process the output in some way. One of the most powerful combinations is to pipe the output of locate to grep. For example, let's say you are using a new system and want to find the Unix dictionary file (see Chapter 20). Most likely, this file will contain the letters "dict" as well as the letters "words". To find all such files, use locate to find all the files that contain "dict". Then use grep to search the output of locate for lines that contain "words". The command is:

locate dict | grep words

Try it on your system and see what you get.

To modify the operation of locate, there are several options you can use. First, the -c (count) option displays the total number of files that are matched, instead of displaying the actual filenames. For example, if you want to find out how many JPG files are on your system, use the command:

locate -cr '\.jpg$'

The next option is -i (ignore case). This tells locate to treat upper- and lowercase letters as being the same. For example, if you want to search for all the files whose pathnames contain either "x11" or "X11", you can use either of the following commands:

locate x11 X11
locate -i x11

If you want, you can combine -i with -r to use case-insensitive regular expressions. For example, here is how to search for files whose pathnames start with "/usr" and end with "x11" or "X11":

locate -ir '^/usr.*x11$'
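A word of caution about the * character. In a regular expression, * does not work like a filename wildcard: it means "zero or more of the preceding character". To allow any characters between two fixed parts of a pathname, you need .* (dot, star). Here is a sketch that shows the difference, using grep -i on made-up pathnames as a stand-in for locate -ir.

```shell
paths='/usr/lib/X11
/usr/share/x11
/usx11
/opt/X11'

# "r*" means zero or more r characters, so this matches only /usx11.
star_only=$(printf '%s\n' "$paths" | grep -i '^/usr*x11$')

# ".*" means any run of characters, which is what we want here.
dot_star=$(printf '%s\n' "$paths" | grep -ic '^/usr.*x11$')

echo "with r*: $star_only"
echo "with .*: $dot_star matches"
```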

You will often find that it is convenient to match only the last part of the pathname, what we have called the filename or basename (see Chapter 24). To do so, use the -b option. For example, to find all the files whose basenames contain the letters "temp", use:

locate -b temp

To find all the files whose name consists only of the letters "temp", use:

locate -br '^temp$'

Finally, to display information about the locate database on your system, use the -S (statistics) option:

locate -S

As you can see, locate is easy to use. Just tell it what you want, and locate will find it. However, there is one drawback I want to make sure you understand.

I mentioned earlier that locate uses a special database that contains the pathnames of all the publicly available files. On a well-run system, this database is updated automatically at regular intervals. However, when you or anyone else creates a new file, it will not appear in the database until the next update. To get around this limitation, you can use find (discussed in the next section), because find actually searches the directory tree.

Finding Files by Searching
a Directory Tree: find

So far, we have discussed two different tools that find files: whereis and locate. Both of these programs are fast and easy to use and, most of the time, they should be your first choice when you are searching for a file.

However, there are limitations. The whereis program searches only for files associated with a particular program (executable files, source files, documentation files). The locate program doesn't actually perform a search. It simply looks for pattern matches in a database that contains the pathnames of all the publicly accessible files on the system. When you want to do a full search on demand, you need to use find.

The find program is the oldest and most complex of the three programs. Indeed, find is one of the most complex Unix tools you will ever use. However, it has three important advantages over the other programs. First, it is very powerful: find can search for any file, anywhere, according to a large variety of criteria. Second, once find has completed a search, it can process the results in several different ways. Finally, unlike locate, find is available on all Unix and Linux systems, so you can use it on any system you encounter.

The full syntax for find is very complicated. In fact, I won't even show it to you. Instead, we'll start with an overview and move ahead one step at a time.

The general idea is that find searches one or more directory trees for files that meet certain criteria, according to tests that you specify. Once the search is complete, find performs an action on the files it has found. The action can be as simple as displaying the names of the files. The action can also be more complex: find can delete the files, display information about them, or pass the files to another command for further processing.

To run find, you specify three things (in this order): directory paths, tests, and actions. The general syntax is:

find path... test... action...

Once you enter the command, find follows a 3-step process:

1. Path: The first thing find does is look at each path, examining the entire directory tree it represents, including all subdirectories.

2. Test: For each file find encounters, it applies the tests you specified. The goal is to create a list of all the files that meet your criteria.

3. Action: Once the search is complete, find carries out the actions you specified on each file in the list.

Consider the following simple example:

find /home/harley -name important -print

Without going into the details just yet, let's break this command into parts:

Path: /home/harley
Test: -name important
Action: -print

Within this command, we give find the following instructions:

1. Path: Starting from /home/harley, search all files and subdirectories.

2. Test: For each file, apply the test -name important. (The meaning of this test is to look for files named important.)

3. Action: For each file that passed the test, perform the action -print (that is, display the pathname).

So what does the above command do? It displays the pathnames of all the files named important within the /home/harley directory tree.

Take a moment to reflect on how we analyzed the above command. You will find that any find command — no matter how complex — can be analyzed in the same way, by breaking it into three parts: paths, tests, actions. Conversely, when you need to construct a complicated find command, you can build it up by thinking about these three parts, one after another.
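To try the example without depending on what is in your own home directory, you can build a tiny directory tree and search it. In this sketch, the scratch directory stands in for /home/harley, and the files inside it are my own inventions.

```shell
home=$(mktemp -d)                # stands in for /home/harley
mkdir -p "$home/work/old" "$home/bin"
touch "$home/important" "$home/work/important" "$home/work/old/notes"

# Path: $home   Test: -name important   Action: -print
find "$home" -name important -print

found=$(find "$home" -name important -print | wc -l | tr -d ' ')
echo "found $found files named important"
```

Notice that find reports the file in the subdirectory work as well as the one at the top of the tree.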

The find Command: Paths

As we have discussed, the general format of the find program is:

find path... test... action...

As you can see, the beginning of every find command consists of one or more paths. These paths show find where to search. Specifying paths is straightforward, as you will see from the following examples. (When you read them, realize that these examples do not specify any tests or actions. We'll talk about these topics in a moment.) Most of the time, you will use only a single path so, to start, here is a simple example:

find backups

In this example, we tell find to start with the directory named backups and search through all its descendants, both files and subdirectories. As you can see, we have used a relative pathname. You can also use absolute pathnames, a . (dot) for the working directory, or a ~ (tilde) for a home directory. Here are a few more partial commands:

find /usr/bin
find /
find .
find ~
find ~weedly

The first example tells find to start searching from the directory /usr/bin. The second example searches from the root directory. (Effectively, this tells find to search the entire filesystem.) The next example searches from the working directory. The fourth example searches from the home directory. The final example searches from the home directory for userid weedly. (For a complete discussion of how to specify pathnames, see Chapter 24.)

If you want, you can specify more than one path for find to search, for example:

find /bin /sbin /usr/bin ~harley/bin

In this example, find will search four separate directory trees. The search results will be processed together as one long list.
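Here is a small sketch of a search with more than one path. The directories bin and sbin inside the scratch directory are stand-ins of my own. Since no tests or actions are given, find simply displays every pathname it encounters, including the starting directories themselves.

```shell
top=$(mktemp -d)
mkdir "$top/bin" "$top/sbin"
touch "$top/bin/ls" "$top/sbin/fsck"

# Two starting paths; the results come back as one list.
find "$top/bin" "$top/sbin"
count=$(find "$top/bin" "$top/sbin" | wc -l | tr -d ' ')
echo "$count pathnames in all"
```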

The find Command: Tests

We use the find program to search one or more directory trees, look for files that meet specified criteria, and then perform certain actions on those files. To define the criteria, we specify one or more TESTS. The general format of the command is:

find path... test... action...

So far, learning about find has been fairly easy. This is where it gets complicated. In the previous section, we discussed how to specify paths. In this section, we will discuss how to use the various tests to specify which files you want to process. There are many different tests, ranging from simple to arcane. As a reference, I have summarized the most important tests in Figure 25-5.

Figure 25-5
The find program: Tests

The find program searches directory trees looking for files that meet specific criteria, according to various tests. See text for details.

Filenames
-name pattern         filename matches pattern
-iname pattern        filename matches pattern (case insensitive)

File Characteristics
-type [df]            type of file: d = directory, f = ordinary file
-perm mode            file permissions are set to mode
-user userid          owner is userid
-group groupid        group is groupid
-size [-+]n[cbkMG]    size is n [chars (bytes), blocks, kilobytes, megabytes, gigabytes]
-empty                empty file (size = 0)

Access Times, Modification Times
-amin [-+]n           accessed n minutes ago
-anewer file          accessed more recently than file
-atime [-+]n          accessed n days ago
-cmin [-+]n           status changed n minutes ago
-cnewer file          status changed more recently than file
-ctime [-+]n          status changed n days ago
-mmin [-+]n           modified n minutes ago
-mtime [-+]n          modified n days ago
-newer file           modified more recently than file

The find program is a very old tool, and the basic tests are the same from one system to another. However, the newer versions of find support some tests that are not available on all systems. Figure 25-5 summarizes the tests that you can use with the version of find that is part of the GNU utilities, the program that comes with Linux (see Chapter 2). If you use a different type of Unix, all the basic tests will work, but some of the more esoteric ones may not be supported. To see a complete list of the available tests for your version of find, check the man page on your system (man find).

As you can see from Figure 25-5, find has a lot of different tests. Eventually, you can learn them all as the need arises. For now, my goal is to make sure you understand the most important tests. By far, the two most important tests are -type and -name, so we'll start with those.

The -type test controls which types of files find should look at. The syntax is -type followed by a one-letter designation. Most commonly, you will use either f for ordinary files or d for directories. If necessary, you can also use b (block devices), c (character devices), p (named pipe), or l (symbolic link). Here are some examples. (Note: In these examples, I have used the action -print, which simply displays the results of the search. We'll discuss other, more complicated actions later in the chapter.)


find /etc -type d -print
find /etc -type f -print
find /etc -print

All three commands perform a search starting from the /etc directory. The first command searches only for directories; the second searches only for ordinary files; the third searches for any type of file.
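If you would like to check how -type d and -type f divide up a directory tree, here is a sketch that counts the results on a small tree of my own making (the scratch directory plays the role of /etc). Remember that the starting directory itself counts as a directory.

```shell
top=$(mktemp -d)
mkdir "$top/conf.d"
touch "$top/hosts" "$top/conf.d/a.conf"

dirs=$(find "$top" -type d -print | wc -l | tr -d ' ')
files=$(find "$top" -type f -print | wc -l | tr -d ' ')
all=$(find "$top" -print | wc -l | tr -d ' ')
echo "directories=$dirs ordinary files=$files everything=$all"
```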

The -name test tells find to look for files whose names match a specified pattern. If you want, you can use the standard wildcards *, ? and [ ] (see Chapter 24). If you do, however, you must quote them, so they are passed to find and not interpreted by the shell. Here are some examples. All three commands start searching from the working directory (.) and search only for ordinary files (-type f):

find . -type f -name important -print
find . -type f -name '*.c' -print
find . -type f -name 'data[123]' -print

The first command searches for files named important. The second command searches for filenames with the extension .c, that is, C source files. The third command searches only for files named data1, data2 or data3.

— hint —

The most common mistake I see people make with find is to forget to quote wildcards when using -name. If you do not quote wildcards, the shell will interpret them itself, resulting in an error. Consider the following two commands:

find . -type f -name '*.c' -print
find . -type f -name *.c -print

The first command will work fine. The second command, however, may not work because the shell will change the expression *.c to an actual set of filenames, causing a syntax error. You will see an error message like the following:

find: paths must precede expression
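Here is a sketch you can use to see the difference for yourself. The filenames are my own. The unquoted version fails only when the working directory contains more than one matching file, which is the situation I have set up here. The exact wording of the error message varies from one version of find to another, so the sketch checks the exit status instead.

```shell
demo=$(mktemp -d)
mkdir "$demo/src"
touch "$demo/a.c" "$demo/b.c" "$demo/src/c.c"
cd "$demo" || exit 1

# Quoted: find receives the pattern *.c and matches all three files.
quoted=$(find . -type f -name '*.c' -print | wc -l | tr -d ' ')

# Unquoted: the shell rewrites *.c as "a.c b.c" before find runs,
# so find sees a stray extra argument and gives up.
find . -type f -name *.c -print 2>/dev/null
unquoted_status=$?

echo "quoted found $quoted files; unquoted exit status was $unquoted_status"
```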

Like most of Unix, the -name test is case sensitive, that is, it distinguishes between upper- and lowercase. To ignore differences in case, use -iname instead. Consider, for instance, the following two examples. Both commands start searching from /usr and look only for directories:

find /usr -type d -name bin -print
find /usr -type d -iname bin -print

The first command looks only for directories named bin. The second command uses -iname, which means it will match directories named bin, Bin, BIN, and so on.

Aside from names, you can select files based on a variety of other characteristics. We have already seen the -type test, which selects a particular type of file, usually f for ordinary files or d for directories. You can also use -perm to search for files with a specific mode, and -user or -group for files owned by a specific userid or groupid. Consider these three examples, which start searching from your home directory:

find ~ -type d -perm 700 -print
find ~ -type f -user harley -print
find ~ -type f -group staff -print

The first command searches for directories that have a file mode of 700. (We discussed permissions and modes earlier in the chapter.) The second command searches for ordinary files owned by userid harley. The final command searches for ordinary files whose groupid is staff.
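Here is a sketch that tries -perm and -user on a small tree. The directory names are my own. For -user, I have used the numeric user ID reported by id -u, on the assumption that your find accepts a number in place of a userid (GNU find does).

```shell
top=$(mktemp -d)
mkdir "$top/tree"
chmod 755 "$top/tree"
mkdir "$top/tree/private" "$top/tree/public"
chmod 700 "$top/tree/private"
chmod 755 "$top/tree/public"

# Only directories whose mode is exactly 700 pass the test.
private_dirs=$(find "$top/tree" -type d -perm 700 -print)
echo "$private_dirs"

# -user also accepts a numeric user ID; id -u prints yours.
mine=$(find "$top/tree" -type d -user "$(id -u)" -print | wc -l | tr -d ' ')
echo "$mine directories belong to user ID $(id -u)"
```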

You can also search for files according to their size, by using -size followed by a specific value. The basic format is a number followed by a one-letter abbreviation. The abbreviations are: c for characters (that is, bytes), b for 512-byte blocks, k for kilobytes, M for megabytes, and G for gigabytes. Here are two examples, both of which search for ordinary files starting from your home directory:

find ~ -type f -size 1b -print
find ~ -type f -size 100c -print

The first command searches for files that are exactly 1 block in size. Since this is the minimum size for a file, this command effectively finds all your small files. The second command searches for files containing exactly 100 bytes.

Before we move on, I want to take a moment to explain an important point. When you use a size measured in blocks, kilobytes, megabytes, or gigabytes, find assumes you are talking about disk space. This is why -size 1b finds all your small files. As we discussed in Chapter 24, the minimum allocation of disk space is 1 block.

When you use a size measured in bytes, find assumes you are talking about the actual content of the files, not how much disk space it uses. That is why -size 100c looks for files that contain exactly 100 bytes of data. In fact, a file containing 100 bytes of data will be found by both -size 100c and -size 1b.

Whenever you use -size, you can preface the number with either a - (minus) or + (plus) to mean "less than" or "greater than" respectively. (This is a general rule for all numbers used with tests.) For example, the first command below finds all your personal files with a size of less than 10 kilobytes. The second command finds all your files with a size greater than 1 megabyte:

find ~ -type f -size -10k -print
find ~ -type f -size +1M -print
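To see how an exact size, - and + behave, it helps to use files whose sizes you know. The following sketch creates two such files with dd (the names small and big are my own) and measures in bytes, using the c abbreviation, so that disk blocks do not enter into it.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Make one file of exactly 100 bytes and one of exactly 2000 bytes.
dd if=/dev/zero of=small bs=100 count=1 2>/dev/null
dd if=/dev/zero of=big bs=100 count=20 2>/dev/null

exact=$(find . -type f -size 100c -print)     # exactly 100 bytes
larger=$(find . -type f -size +100c -print)   # more than 100 bytes
smaller=$(find . -type f -size -2000c -print) # fewer than 2000 bytes

echo "exactly 100 bytes: $exact"
echo "more than 100:     $larger"
echo "fewer than 2000:   $smaller"
```

Notice that +100c does not match the 100-byte file, and -2000c does not match the 2000-byte file: "greater than" and "less than" are strict.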

The final group of tests allows you to search for files based on their access or modification times. The tests are summarized in Figure 25-5, so I won't go over each one in detail. Let me just give you a few examples. Suppose you want to find all of your files that have been modified in the last 30 minutes. Use the -mmin test with the value -30, for example:

find ~ -mmin -30 -print

Let's say you want to find files that have not been used for over 180 days. Use -atime with the value +180:

find ~ -atime +180 -print

Finally, to find all your files that have been changed in the last 10 minutes, use:

find ~ -cmin -10 -print
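You don't have to wait 180 days to test a command like this. In the following sketch, I use touch -t to give a file (named stale, my own choice) a modification time in the year 2000, and then search by time. The -mmin test is not available in every version of find, so I am assuming a version that supports it, such as GNU find.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

touch fresh                      # modified just now
touch -t 200001010000 stale      # pretend it was last modified on 1 Jan 2000

old=$(find . -type f -mtime +180 -print)   # modified more than 180 days ago
recent=$(find . -type f -mtime -1 -print)  # modified less than 1 day ago

echo "older than 180 days: $old"
echo "less than a day old: $recent"

# The same idea measured in minutes (assumes your find has -mmin).
find . -type f -mmin -30 -print
```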

The find Command:
Negating a Test With the ! Operator

When necessary, you can negate a test by preceding it with the ! (exclamation mark) OPERATOR. To do so, just type ! before the test. When you use !, you must follow two rules. First, you must put a space on each side of the ! mark, so it can be parsed properly. Second, you must quote the !, so it is passed to find and not interpreted by the shell. (This is a good habit anytime you want to pass metacharacters to a program.)

As an example, consider the following command that searches from your home directory and displays the names of all ordinary files that have the extension .jpg:

find ~ -type f -name '*.jpg' -print

Suppose, instead, we want to display the names of those files that do not have the extension .jpg. All we have to do is use the ! operator to reverse the test:

find ~ -type f \! -name '*.jpg' -print

You will notice that I used a backslash to quote the ! operator. If you prefer you can use single quotes instead. (For more information about quoting, see Chapter 13.)

find ~ -type f '!' -name '*.jpg' -print

When necessary, you can negate more than one test. Just make sure that each test has its own ! operator. For example, suppose you want to see if you have any files that are neither ordinary files nor directories. That is, you want to see if you have any symbolic links, named pipes, special files, and so on. The command to use is:

find ~ \! -type f \! -type d -print
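
Here is a small sketch you can run to watch the ! operator at work. It assumes a scratch directory containing two ordinary files and one symbolic link; the names photo.jpg, notes.txt and shortcut are invented for the example.

```shell
# Scratch directory (hypothetical) with two ordinary files and a symbolic link.
demo=$(mktemp -d)
touch "$demo/photo.jpg" "$demo/notes.txt"
ln -s notes.txt "$demo/shortcut"

# Ordinary files whose names do NOT end in .jpg
not_jpg=$(find "$demo" -type f \! -name '*.jpg')

# Anything that is neither an ordinary file nor a directory
others=$(find "$demo" \! -type f \! -type d)

echo "not jpg: $not_jpg"
echo "others:  $others"

rm -r "$demo"
```

The first search should report only notes.txt. The second should report only the symbolic link, because find examines the link itself rather than the file it points to.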

The find Command:
Dealing With File Permission Error Messages

The find program is a great tool for searching through large directory trees. In particular, by starting at / (the root directory), you can search the entire filesystem. However, when you search outside your own home directory area, you will find that some directories and files are off-limits, because your userid does not have permission to access them. Each time this happens, find will display an error message.

For example, the following command searches the entire filesystem looking for directories named bin:

find / -type d -name bin -print

When you run this command, you will probably see many error messages similar to the following:

find: /etc/cron.d: Permission denied

In most cases, there is no reason to see these messages, because they don't really help you. In fact, all they do is clutter your output. So how can you get rid of them?

Since error messages are written to standard error (see Chapter 15), you can get rid of the messages by redirecting standard error to /dev/null, the bit bucket. With the Bourne Shell family (Bash, Korn shell), this is easy:

find / -type d -name bin -print 2> /dev/null

With the C-Shell family (C-Shell, Tcsh), it is a bit more complicated, but it can be done:

(find / -type d -name bin -print > /dev/tty) >& /dev/null

The details are explained in Chapter 15.

The find Command: Actions

As we have discussed, we use the find program to search one or more directory trees, look for files that meet specified criteria, and then perform certain actions on those files. The general format of the command is:

find path... test... action...

So far, we have talked about how to specify paths and tests. To conclude our discussion, we will now talk about actions. An ACTION tells find what to do with the files it finds. For reference, I have summarized the most important actions in Figure 25-6. (For a complete list, see the find man page on your system.)

Figure 25-6
The find program: Actions

The find program searches directory trees, looking for files that meet criteria, according to tests that you specify. For each file that is found, find performs the actions you specify. See text for details.

-print                  write pathname to standard output
-fprint file            same as -print; write output to file
-ls                     display long directory listing
-fls file               same as -ls; write output to file
-delete                 delete file
-exec command {} \;     execute command, {} indicates matched filenames
-ok command {} \;       same as -exec, but confirm before running command

As you can see, actions start with a - (dash) character, just like tests. The most commonly used action is -print, which tells find to display the pathnames of all the files it has selected. More precisely, -print tells find to write the list of pathnames to standard output. (Why the name -print? As we discussed in Chapter 7, for historical reasons, it is a Unix convention to use the word "print" to mean "display".)

Here is a simple, straightforward example that starts from the working directory and searches for files named important:

find . -name important -print

With most versions of find, if you do not specify an action, -print is assumed by default. Thus, the following two commands are equivalent:

find . -name important -print
find . -name important

Similarly, with the GNU version of find, if you do not specify a path, the working directory is assumed by default. Thus, if you are a Linux user, the following three commands are equivalent:

find . -name important -print
find . -name important
find -name important

Moving on, let me show you a more useful example. You want to search the entire filesystem for music files in MP3 format. To do so, you use find to start from the root directory and search for all the ordinary files that have an extension of .mp3. Since you will be searching throughout the filesystem, you know that find will generate spurious file permission error messages (see the previous section). For this reason, you redirect standard error to the bit bucket. The full command is:

find / -type f -name '*.mp3' -print 2> /dev/null

If the list of MP3 files is long, most of it will scroll off your screen. If so, you have two choices. First, you can pipe the output to less to display one screenful at a time:

find / -type f -name '*.mp3' -print 2> /dev/null | less

Alternatively, you can save the output in a file, so you can peruse it later at your leisure. To do so, use the -fprint action instead of -print. The syntax is simple: just type -fprint followed by the name of the file. For example, the following command finds the names of all the MP3 files on the system and stores them in a file called musiclist:

find / -type f -name '*.mp3' -fprint musiclist 2> /dev/null
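
Not every version of find supports -fprint. If yours does not, you can get the same result by using -print and redirecting standard output, as in the following sketch. It assumes a scratch directory with a made-up music subdirectory instead of the whole filesystem, and it sorts the list before saving it, which is optional.

```shell
# Scratch directory (hypothetical) standing in for the filesystem.
demo=$(mktemp -d)
mkdir "$demo/music"
touch "$demo/music/one.mp3" "$demo/music/two.mp3" "$demo/music/cover.jpg"

# Standard output goes through sort into the file; standard error goes to the bit bucket.
find "$demo/music" -type f -name '*.mp3' -print 2> /dev/null | sort > "$demo/musiclist"

count=$(wc -l < "$demo/musiclist")
echo "MP3 files listed: $count"

rm -r "$demo"
```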

The -print action displays pathnames. If you want more information about each file, you can use the -ls action instead. This action displays information similar to the ls command with the option -dils. The formatting is done internally. (find doesn't really run the ls program.) What you will see is:

• Inode number
• Size in blocks
• File permissions
• Number of hard links
• Owner
• Group
• Size in bytes
• Time of last modification
• Pathname

As an example, the following command searches from your home directory for all files and directories that have been modified in the last 10 minutes. It then uses -ls to display information about these files:

find ~ -mmin -10 -ls

The -fls action is similar to -ls except that, like -fprint, it writes its output to a file. For example, the following command finds all your files that have been modified in the last 10 minutes and writes the information about them to a file named recent:

find ~ -mmin -10 -fls recent

The next action -delete can be very useful. However, it can easily turn around and bite you, so be careful how you use it. The -delete action removes the link to every file that has been found in the search. If that link is the only one, the file is deleted (see the discussion on links earlier in the chapter). In other words, using -delete is similar to using the rm command.

Here is an example. Starting the search from your home directory, you want to remove all the files with the extension .backup:

find ~ -name '*.backup' -delete

Here is a more complex (and more useful) example. The following command removes all the files with the extension .backup that have not been accessed for more than 180 days:

find ~ -name '*.backup' -atime +180 -delete
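
Because -delete does not ask for confirmation, a cautious habit is to run the same search with -print first, check the list, and only then repeat the command with -delete. The sketch below shows the idea. It assumes a scratch directory with invented filenames and a version of find that supports -delete.

```shell
# Scratch directory (hypothetical) with two backup files and one file to keep.
demo=$(mktemp -d)
touch "$demo/a.backup" "$demo/b.backup" "$demo/keep.txt"

# Step 1: preview exactly what the search selects.
preview=$(find "$demo" -name '*.backup' -print | sort)
echo "$preview"

# Step 2: same paths and tests, with -delete in place of -print.
find "$demo" -name '*.backup' -delete

remaining=$(find "$demo" -type f)
echo "remaining: $remaining"

rm -r "$demo"
```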

Using the -ls and -delete actions is similar to sending the output of a search to the ls and rm programs respectively. However, find offers much more generality: you can send the search output to any program you want by using the -exec action. The syntax is:

-exec command {} \;

where command is any command you want, including options and parameters.

Here is how it works. You type -exec followed by a command. You can specify any command you want, as well as options and parameters, just as if you were typing it on the command line. Within the command, you use the characters {} to represent a pathname found by find. To indicate the end of the command, you must end it with a ; (semicolon). As you can see from the syntax above, the semicolon must be quoted. This ensures that it is passed to find and not interpreted by the shell. (Quoting is explained in Chapter 13.)

The -exec action is, more than any other feature, what makes find so powerful, so it is crucial that you learn how to use it well. To show you how it works, let's start with a trivial example. The command below starts searching from your home directory, looking for all your directories. The results of the search are sent to the echo program, one at a time, to be displayed.

find ~ -type d -exec echo {} \;

With some shells, you will have a problem if you don't quote the brace brackets:

find ~ -type d -exec echo '{}' \;

As a quick aside, let me remind you that you can quote the semicolon with single quotes instead of a backslash if you want:

find ~ -type d -exec echo {} ';'
find ~ -type d -exec echo '{}' ';'

An -exec action is carried out once for each item that is found during the search. In this case, if there are, say, 26 directories, find will execute the echo command 26 times. Each time echo is executed, the {} characters will be replaced by the pathname of a different directory.

The reason I call the previous command a trivial example is that you don't really need to send pathnames to echo to display them. You can use -print instead:

find ~ -type d -print

However, our -exec action can be much more powerful. Let's say that, for security reasons, you decide no one but you should be allowed to access your directories. To implement this policy, you need to use chmod to set the file permissions for all your directories to 700. (See the discussion of file permissions earlier in the chapter.) You could type a chmod for each directory, which would take a long time. Instead, use find to search for your directories and let -exec do the work for you:

find ~ -type d -exec chmod 700 {} \;

One more example. You are using a version of find that does not support the -ls or -delete actions. How can you replace them? Just use -exec to run ls or rm, for example:

find ~ -name '*.backup' -exec ls -dils {} \;
find ~ -name '*.backup' -exec rm {} \;

For more control, there is a variation of -exec that lets you decide which command should be executed. Use -ok instead of -exec, and you will be asked to confirm each command before it is executed:

find ~ -type d -ok chmod 700 {} \;
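
If you want to try the chmod example without changing your real directories, the following sketch runs it on a scratch directory tree. The directory names projects and notes are made up. The sketch checks the result by reading the first ten characters of a long listing.

```shell
# Scratch directory tree (hypothetical) with permissive starting permissions.
demo=$(mktemp -d)
mkdir -p "$demo/projects/notes"
chmod 755 "$demo/projects" "$demo/projects/notes"

# Run chmod once for every directory that find selects.
find "$demo" -type d -exec chmod 700 {} \;

# File type and permissions are the first ten characters of ls -ld.
perms=$(ls -ld "$demo/projects/notes" | cut -c1-10)
echo "$perms"

rm -r "$demo"
```

After the command runs, the listing should show drwx------, which means only the owner can read, write or search the directory.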

Processing Files That
Have Been Found: xargs

When you use find to search for files that meet specific criteria, there are two ways to process what you find. First, you can use the -exec action as described in the previous section. This allows you to use any program you want to process each file. However, you must understand that -exec generates a separate command for each file. For example, if your search finds 57 files, -exec will generate 57 separate commands.

For simple commands that operate on a small number of files, this may be okay. However, when a search yields a large number of files, there is a better alternative. Instead of using -exec, you can pipe the output of find to a special program designed to work efficiently in such situations. This program, named xargs ("X-args"), runs any command you specify using arguments that are passed to it via standard input. The syntax is:

xargs [-prt] [-istring] [command [argument...]]

where command is the command you want to run; string is a placeholder; and argument is an initial argument for command, to which xargs adds the arguments it reads from standard input.

Let's start with a simple example. You want to create a list of all your ordinary files showing how much disk space is used by each file. You tell find to start from your home directory and search for ordinary files. The output is piped to xargs which runs the command ls -s (Chapter 24) to display the size of each file. The whole thing looks like this:

find ~ -type f | xargs ls -s

Here is a more complex example that should prove valuable in your everyday life. You are a beautiful, intelligent woman with excellent taste, and you want to phone me to tell me how much you enjoy this book. You remember that, over a year ago, a mutual friend sent you an email message containing my name and number, and that you saved it to a file. You haven't looked at the file since, and you can't even remember its name or what directory it was in. How can you find the phone number?

The solution is to use find to compile a list of all your ordinary files that were last modified more than 365 days ago. Pipe the output to xargs and use grep (Chapter 19) to search all the files for lines containing the character string "Harley Hahn". Here is the command to use:

find ~ -type f -mtime +365 | xargs grep "Harley Hahn"

In this case, the phone number is found in a file named important in your home directory. The output is:

/home/linda/important: Harley Hahn (202) 456-1111
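
One caution: by default, xargs splits its input at spaces and newlines, so a pathname that contains a space will be broken into pieces. Many versions of find and xargs (including the GNU and BSD versions) offer -print0 and -0 to get around this by separating pathnames with a null character. Here is a sketch that assumes both options are available. It uses a scratch directory, one filename containing a space, and grep -l to show only the name of the matching file.

```shell
# Scratch directory (hypothetical); one filename deliberately contains a space.
demo=$(mktemp -d)
printf 'Harley Hahn (202) 456-1111\n' > "$demo/important"
printf 'nothing useful\n' > "$demo/other notes"

# Backdate both files so they look more than a year old.
touch -t 200001010000 "$demo/important" "$demo/other notes"

# -print0 and -0 pass each pathname intact, spaces and all.
match=$(find "$demo" -type f -mtime +365 -print0 | xargs -0 grep -l "Harley Hahn")
echo "$match"

rm -r "$demo"
```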

When you need to process output from find, the echo command can be surprisingly useful. As you remember from Chapter 12, echo evaluates its arguments and writes them to standard output. If you give it a list of pathnames, echo outputs one long line containing all the names. This line can then be piped to another program for further processing.

As an example, let's say you want to count how many ordinary files and directories you have. You use two separate commands, one to count the files, the other to count the directories:

find ~ -type f | xargs echo | wc -w
find ~ -type d | xargs echo | wc -w

Both commands use find to search from your home directory. The first command uses -type f to search only for ordinary files; the second command uses -type d to search only for directories. Both commands pipe the output of find to xargs, which feeds it to echo. The output of the echo command is then piped to wc -w (Chapter 18) to count the number of words, that is, the number of pathnames.
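
You can check the counting technique on a small directory tree where you already know the answer. The sketch below assumes a scratch directory with invented names. It also counts the files a second way, with wc -l, which counts the lines of find output directly. Both methods assume that none of the pathnames contain spaces or newlines.

```shell
# Scratch directory tree (hypothetical): 3 directories (including the top one), 3 files.
demo=$(mktemp -d)
mkdir "$demo/docs" "$demo/pics"
touch "$demo/docs/a" "$demo/docs/b" "$demo/pics/c"

files=$(find "$demo" -type f | xargs echo | wc -w)
dirs=$(find "$demo" -type d | xargs echo | wc -w)

# Alternative: count one line of find output per pathname.
files_by_line=$(find "$demo" -type f | wc -l)

echo "files: $files  directories: $dirs  files (by line): $files_by_line"

rm -r "$demo"
```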

From time to time, you will want to be able to use the arguments sent to xargs more than once in the same command. To do so, you use the -i (insert) option. This allows you to use {} as a placeholder that will be replaced with the arguments before the command is run. Let me start with a simple example to show you how it works.

Consider the following two commands, both of which search for ordinary files starting from your working directory:

find . -type f | xargs echo
find . -type f | xargs -i echo {} {}

In the first command, echo writes a single copy of its arguments to standard output. In the second command, echo writes two copies of its arguments to standard output. Here is a more useful example that moves all the files in your working directory to another directory, renaming the files as they move:

find . -type f | xargs -i mv {} ~/backups/{}.old

This command uses find to search your working directory for ordinary files. The list of files is piped to xargs, which runs an mv command (explained earlier in the chapter). The mv command moves each file to a subdirectory named backups, which lies within your home directory. As part of the move, the extension .old is appended to each filename.

Let's say your working directory contains three ordinary files: a, b and c. After the above command is run, the working directory will be empty, and the backups directory will contain a.old, b.old and c.old.

If you want to use -i but, for some reason, you don't want to use {}, you can specify your own placeholder. Just type it directly after the -i, for example:

find . -type f | xargs -iXX mv XX ~/backups/XX.old
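
A note on portability: the GNU documentation describes -i as deprecated in favor of -I (uppercase), which is the form specified by POSIX. With -I, you must always name the placeholder, and it is written as a separate argument. The following sketch repeats the backup example using -I. It assumes a scratch directory with made-up subdirectories work and backups in place of your working directory and ~/backups.

```shell
# Scratch directories (hypothetical) standing in for the working directory and ~/backups.
demo=$(mktemp -d)
mkdir "$demo/work" "$demo/backups"
touch "$demo/work/a" "$demo/work/b" "$demo/work/c"

# Run the pipeline from inside work; the subshell keeps the cd temporary.
(cd "$demo/work" && find . -type f | xargs -I {} mv {} "$demo/backups/{}.old")

moved=$(ls "$demo/backups" | xargs echo)
left=$(ls "$demo/work" | wc -l)
echo "moved: $moved"
echo "left behind: $left"

rm -r "$demo"
```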

When you write such commands, it is possible to create unexpected problems, because you can't see what is happening. If you anticipate a problem, use the -p (prompt) option. This tells xargs to show you each command as it is generated, and to ask your permission before running it. If you type a reply that begins with "y" or "Y", xargs runs the command. If you type any other answer — such as pressing the <Return> key — xargs skips the command.

Here is an example you can try for yourself. Use touch (explained earlier in the chapter) to create several new files in your working directory:

touch a b c d e

Now enter the following command to add the extension .junk to the end of each of these filenames. Use -p so xargs will ask your approval for each file:

find . -name '[abcde]' | xargs -i -p mv {} {}.junk

If you want to see which commands are generated but you don't need to be asked for approval, use -t. This causes xargs to display each command as it is run. You can think of -t as meaning "tell me what you are doing":

find . -name '[abcde]' | xargs -i -t mv {} {}.junk

Important: Be sure not to group -i with any other options. For example, the following command will not work properly, because xargs will think that the p following -i is a placeholder, not an option:

find . -name '[abcde]' | xargs -ip mv {} {}.junk

The last option I want you to know about is -r. By default, xargs always runs the specified command at least once. The -r option tells xargs not to run the command if there are no input arguments. For example, the following command searches your working directory and displays a long listing of any files that are empty:

find . -empty | xargs ls -l

Suppose, however, there are no empty files. The ls -l command will run with no arguments. This will produce a long listing of the entire directory, which is not what you want. The solution is to use the -r option:

find . -empty | xargs -r ls -l

Now xargs will only run the ls command if there are arguments.
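
Here is a sketch that shows both cases. It assumes a scratch directory and a version of xargs that supports -r (the GNU version does). The first search finds no empty files, so ls never runs. The second search finds one.

```shell
# Scratch directory (hypothetical) that starts with no empty files.
demo=$(mktemp -d)
printf 'data\n' > "$demo/full"

# No empty files yet: with -r, xargs never starts ls, so nothing is listed.
none=$(find "$demo" -type f -empty | xargs -r ls -l)

: > "$demo/blank"    # create one empty file

# Now there is an argument, so ls runs as usual.
some=$(find "$demo" -type f -empty | xargs -r ls -l)

echo "first search:  [$none]"
echo "second search: [$some]"

rm -r "$demo"
```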

As you can see, connecting find and xargs is an easy way to build powerful tools. However, I don't want to give you the impression that xargs is only used with find. In fact, xargs can be used with any program that can supply it with character strings to use as arguments. Here are some examples.

You are writing a shell script, and you want to use whoami (Chapter 8) to display the current userid, and date (also Chapter 8) to display the time and date. To make it look nice, you want all the output to be on a single line:

(whoami; date) | xargs

You have a file, filenames, that contains the names of a number of files. You want to use cat (Chapter 16) to combine all the data in these files and save the output to a file named master:

xargs cat < filenames > master
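
To try this yourself, you can build a small list of files first. The sketch below assumes a scratch directory and two invented files, part1 and part2, whose full pathnames are written one per line to the file filenames.

```shell
# Scratch directory (hypothetical) with two small files and a list of their names.
demo=$(mktemp -d)
printf 'one\n' > "$demo/part1"
printf 'two\n' > "$demo/part2"
printf '%s\n' "$demo/part1" "$demo/part2" > "$demo/filenames"

# xargs reads the names from standard input and passes them to cat.
xargs cat < "$demo/filenames" > "$demo/master"

combined=$(cat "$demo/master")
echo "$combined"

rm -r "$demo"
```

The file master should contain the data from part1 followed by the data from part2.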

Finally, you want to move all the C source files in your current directory to a subdirectory named archive. First, create the subdirectory if it does not already exist:

mkdir archive

Now use ls and xargs to list and move all the files that have an extension of .c. I have included the -t option, so xargs will show you each command as it is executed:

ls *.c | xargs -t -i mv {} archive



Exercises

Review Question #1:

Which command do you use to create a brand new, empty file? Why do you rarely need to use this command?

Give three common situations in which a file would be created for you automatically.

Review Question #2:

Examine each of the following character strings and decide whether or not it would make a good filename. If not, explain why.

data-backup-02
data_backup_02
DataBackup02
DATABACKUP02
Data Backup 02
Data;Backup,02
databackup20
data/backup/20

Review Question #3:

What are file permissions?

What are the two main uses for file permissions?

Which program is used to set or change file permissions?

What are the three types of file permissions?

For each type, explain what it means when applied (a) to an ordinary file (b) to a directory.

Review Question #4:

What is a link?

What is a symbolic link?

What are hard links and soft links?

How do you create a link?

How do you create a symbolic link?

Review Question #5:

Which three programs are used to find a file or a set of files?

When would you use each one?

Applying Your Knowledge #1:

Within your home directory, create a directory named temp and change to it. Within this directory, create two subdirectories named days and months. Within each subdirectory, create two files named file1 and file2.

Hint: Use a subshell (see Chapter 15) to change the working directory and create the files.

Once all the files are created, use the tree program (Chapter 24) to display a diagram of the directory tree showing both directories and files. If your system doesn't have tree, use ls -R instead.

Applying Your Knowledge #2:

Continuing from the previous exercise:

Within the days directory, rename file1 to monday and rename file2 to tuesday. Then copy monday to friday.

Within the months directory, rename file1 to december and file2 to july.

Move back to temp and use tree to display another diagram of the directory tree. If your system doesn't have tree, use ls -R instead.

Create a link from december to april. Create a symbolic link from december to may. Display a long listing of the months directory. What do you notice?

To clean up, use a single command to delete the temp directory, all its subdirectories, and all the files in those directories. Use one final command to confirm that temp no longer exists.

Applying Your Knowledge #3:

Use a text editor to create a file named green. Within the file, enter the line:

I am a smart person.

Save your work and quit the editor.

Create a link to the file green and call it blue. Display a long listing of green and blue and make a note of their modification times.

Wait 3 minutes, then use the text editor to edit the file green. Change the one line of text to:

I am a very smart person.

Save your work and quit the editor.

Display a long listing of green and blue. Why do they both have the same (updated) modification time even though you only edited one file? What would have happened if you had changed blue instead of green?

Applying Your Knowledge #4:

You are setting up a Web site in which all the HTML files are in subdirectories of a directory named httpdocs in your home directory. Use a pipeline to find all the files with the extension .html and change their permissions to the following:

Owner: read + write
Group: nothing
Other: read

For Further Thought #1:

Most GUI-based file managers maintain a "trash" folder to store deleted files, so they can be recovered if necessary. With such systems, a file is not gone for good until it is deleted from the "trash". Why do the Unix text-based tools not offer such a service?

For Further Thought #2:

Imagine going back in a time machine to 1976, when I first started to use Unix. You find me and ask for a tour of the Unix system I am using.

To your surprise, you see files, directories, subdirectories, a working directory, a home directory, special files, links, inodes, permissions, and so on. You also see all the standard commands: ls, mkdir, pwd, cd, chmod, cp, rm, mv, ln and find. Indeed, you notice that virtually every idea and tool you learned about in this chapter was developed more than thirty years ago.

What is it about the basic design of the Unix filesystem that has enabled it to survive for so long and still be so useful?

Is this unusual in the world of computing?

For Further Thought #3:

The find program is a powerful tool, but very complicated. Imagine a GUI-based version of the program that enables you to choose options from a large drop-down list, enter file patterns into a form, select various tests from a menu, and so on.

Would such a program be easier to use than the standard text-based version?

Could a GUI version of find replace the text-based version?
