Chapter 15

Standard I/O, Redirection and Pipes

From the beginning, the Unix command line has always had a certain something that makes it different from other operating systems. That "something" is what we might call the Unix toolbox: the large variety of programs that are a part of every Unix and Linux system, and the simple, elegant ways in which you can use them.

In this chapter, I will explain the philosophy behind the Unix toolbox. I will then show you how to combine basic building blocks into powerful tools of your own. In Chapter 16, we will survey the most important of these programs, to introduce you to the resources available for your day-to-day work. By the time you finish these two chapters, you will be on your way to developing the most interesting and enjoyable computer skills you will ever use.

The Unix Philosophy

In the 1960s, the Bell Labs researchers who would become the first Unix developers were working on an operating system called Multics (see Chapter 1). One of the problems with Multics was that it was too unwieldy. The Multics design team had tried to make their system do too many things in order to please too many people. When Unix was designed — at first, in 1969, by only two people — the developers felt strongly that it was important to avoid the complexity of Multics and other such operating systems.

Thus, they developed a Spartan attitude in which economy of expression was paramount. Each program, they reasoned, should be a single tool with, perhaps, a few basic options. A program should do only one thing, but should do it well. If you needed to perform a complex task, you should — whenever possible — do so by combining existing tools, not by writing new programs.

For example, virtually all Unix programs generate some type of output. When a program displays a large amount of output, the data may come so fast that most of it will scroll off the screen before you can read it. One solution is to require that every program be able to display output one screenful at a time when necessary. This is just the type of solution that the original Unix developers wanted to avoid. Why should all programs need to incorporate the same functionality? Couldn't there be a simpler way to ensure that output is presented to users in a way that is easy for them to read?

For that matter, why should every program you run have to know where its output was going? Sometimes you want output to be displayed on the screen; other times you want to save it in a file. There may even be times when you want to send output to another program for more processing.

For these reasons, the Unix designers built a single tool whose job was to display data, one screenful at a time. This tool was called more, because after displaying a screenful of data, the program displayed the prompt --More-- to let the user know there was more data.

The tool was simple to use. A user would read one screenful of data and then press <Space> to display the next screen. When the user was finished, he would type q to quit.

Once more was available, programmers could stop worrying about how the output of their programs would be displayed. If you were a programmer, you knew that whenever a user running your program found himself with a lot of output, he would simply send it to more. (You'll learn how to do this later in the chapter.) If you were a user, you knew that, no matter how many programs you might ever use, you only needed to learn how to use one output display tool.

This approach has three important advantages, even today. First, when you design a Unix tool, you can keep it simple. For example, you do not have to endow every new program with the ability to display data one screenful at a time: there is already a tool to do that. Similarly, there are also tools to sort output, remove certain columns, delete lines that do not contain specific patterns, and on and on (see Chapter 16). Since these tools are available to all Unix users, you don't have to include such functionality in your own programs.

This leads us to the second advantage. Because each tool need only do one thing, you can, as a programmer, focus your effort. When you are designing, say, a program to search for specific patterns in a data file, you can make it the best possible pattern searching program; when you are designing a sorting program, you can make it the best possible sorting program; and so on.

The third advantage is ease of use. As a user, once you learn the commands to control the standard screen display tool, you know how to control the output for any program.

Thus, in two sentences, I can summarize the Unix philosophy as follows:

• Each program or command should be a tool that does only one thing and does it well.

• When you need a new tool, it is better to combine existing tools than to write new ones.

We sometimes describe this philosophy as:

• "Small is beautiful" or "Less is more".

The New Unix Philosophy

Since Unix is well into its fourth decade, it makes sense to ask if the Unix philosophy has, in the long run, proven to be successful. The answer is, yes and no.

To a large extent, the Unix philosophy is still intact. As you will see in Chapter 16, there are a great many single-purpose tools, which are easy to combine as the need arises.

Moreover, because the original Unix developers designed the system so well, programs that are over 30 years old can, today, work seamlessly with programs that are brand new. Compare this to the world of Windows or the Macintosh.

However, the original philosophy has proved inadequate in three important ways. First, too many people could not resist creating alternative versions of the basic tools. This means that you must sometimes learn how to use more than one tool to do the same job.

For example, over the years, there have been three screen display programs in common use: more, pg and less. These days, most people use less, which is the most powerful (and most complex) of the three programs. However, more is simpler to use, and you will find it on all systems, so you really should know how to use it. My guess is that, one day, you will log in to a system that uses more to display output and, if you only know less, you will be confused. On the other hand, just understanding more is not good enough because, on many systems, less is the default (and less is a better program). The bottom line: you need to learn how to use at least two different screen display programs.

The second way in which the Unix philosophy was inadequate had to do with the growing needs of users. The idea that small is beautiful has a lot of appeal, but as users grew more sophisticated and their needs grew more demanding, it became clear that simple tools were often not enough.

For instance, the original Unix text editor was ed. (The name, which stands for "editor", is pronounced as two separate letters, "ee-dee"). ed was designed to be used with terminals that printed output on paper. The ed program had relatively few commands; it was simple to use and could be learned quickly. If you had used Unix in the early days, you would have found ed to be an unadorned, unpretentious tool: it did one thing (text editing) and it did it well(*).

* Footnote

The ed editor is still available on all Unix and Linux systems. Give it a try when you get a moment. Start by reading the man page (man ed).

As editors go, ed was, at best, mildly sophisticated. Within a few years, however, terminals were improved and the needs of Unix users became more demanding. To respond to those needs, programmers developed new editors. In fact, over the years, literally tens of different text editors were developed.

For mainstream users, ed was replaced by a program called ex. (The name, which stands for "extended editor", is pronounced as two separate letters, "ee-ex".) Then, ex itself was extended to create vi ("visual editor", pronounced "vee-eye"). As an alternative to the ed/ex/vi family, an entirely different editing system called Emacs was developed.

Today, vi and Emacs are the most popular Unix text editors, but no one would ever accuse them of being simple and unadorned. Indeed, vi (Chapter 22) and Emacs are extremely complex.

The third way in which the original Unix philosophy has proven inadequate has to do with a basic limitation of the CLI (command line interface). As you know, the CLI is text-based. This means it cannot handle graphics and pictures, or files that do not contain plain text, such as spreadsheets or word processor documents.

Most command-line programs read and write text, which is why such programs are able to work together: they all use the same type of data. However, this means that when you want to use other types of data — non-textual data — you must use other types of programs. This is why, as we discussed in Chapters 5 and 6, you must learn how to use both the CLI and GUI environments.

For these reasons, you must approach the learning of Unix carefully. In 1979, when Unix was only a decade old, the original design was still intact, and you could learn most everything about all the common commands. Today, there is so much more Unix to learn, you can't possibly know it all, or even most of it. This means you must be selective about which programs and tools you want to learn. Moreover, as you teach yourself how to use a tool, you must be selective about which options and commands you want to master.

As you read the next two chapters, there is something important I want you to remember. By all means, you should work in front of your computer as you read, and enter new commands as you encounter them. This is how you learn to use Unix. However, I want you to do more than just memorize details. As you read and as you experiment, I want you to develop a sense of perspective. Every now and then, take a moment to pull back and ask yourself, "Where does the tool I am learning right now fit into the big picture?"

My goal for you is that, in time, you will come to appreciate what we might call the new Unix philosophy:

• "Small is beautiful, except when it isn't."

— hint —

Whenever you learn how to use a new program, do not feel as if you must memorize every detail. Rather, just answer three questions:

  1. What can this program do for me?
  2. What are the basic details? That is, what works for most people most of the time?
  3. Where can I look for more help when I need it?

Standard Input, Standard Output
and Standard Error

If there is one single idea that is central to using Unix effectively, it is the concept of standard input and output. Understand this one idea, and you are one giant step closer to becoming a Unix master.

The basic idea is simple: Every text-based program should be able to accept input from any source and write output to any target.

For instance, say you have a program that sorts lines of text. You should have your choice of typing the text at the keyboard, reading it from an existing file, or even using the output of another program. Similarly, the sort program should be able to display its output on the screen, write it to a file, or send it to another program for more processing.

Using such a system has two wonderful advantages. First, as a user, you have enormous flexibility. When you run a program, you can define the input and output (I/O) as you see fit, which means you only have to learn one program for each task. For example, the same program that sorts a small amount of data and displays it on the screen can also sort a large amount of data and save it to a file.

The second advantage to doing I/O in this way is that it makes creating new tools a lot easier. When you write a program, you can depend on Unix to handle the input and output for you, which means you don't need to worry about all the variations. Instead, you can concentrate on the design and programming of your tool.

The crucial idea here is that the source of input and the target of output are not specified by the programmer. Rather, he or she writes the program to read and write in a general way. Later, at the time the program runs, the shell connects the program to whatever input and output the user wants to use(*).

* Footnote

Historically, the idea of using abstract I/O devices was developed to allow programmers to write programs that were independent of specific hardware. Can you see how, philosophically, this idea is related to the layers of abstraction we discussed in Chapter 5, and to the terminal description databases (Termcap and Terminfo) we discussed in Chapter 7?

To implement this idea, the developers of Unix designed a general way to read data called STANDARD INPUT and two general ways to write data called STANDARD OUTPUT and STANDARD ERROR. The reason there are two different output targets is that standard output is used for regular output, and standard error is used for error messages. Collectively, we refer to these facilities as STANDARD I/O (pronounced "standard eye-oh").

In practice, we often speak of these three terms informally as if they were actual objects. Thus, we might say, "To save the output of a program, write standard output to a file." What we really mean is, "To save the output of a program, tell the shell to set the output target to be a file."

Understanding the concepts of standard input, standard output, and standard error is crucial to using Unix well. Moreover, these same concepts are used to control I/O in other programming languages, such as C and C++.

— hint —

You will often see standard input, standard output, and standard error abbreviated as STDIN, STDOUT and STDERR. When we use these abbreviations in conversation, we pronounce them as "standard in", "standard out", and "standard error".

For example, if you were creating some documentation, you might write, "The sort program reads from stdin, and writes to stdout and stderr." If you were reading this sentence to an audience, you would pronounce the abbreviations as follows: "The sort program reads from standard in, and writes to standard out and standard error."

Redirecting Standard Output

When you log in, the shell automatically sets standard input to the keyboard, and standard output and standard error to the screen. This means that, by default, most programs will read from the keyboard and write to the screen.

However — and here's where the power of Unix comes in — every time you enter a command, you can tell the shell to reset standard input, standard output or standard error for the duration of the command.

In effect, you can tell the shell: "I want to run the sort command and save the output to a file called names. So for this command only, I want you to write standard output to that file. After the command is over, I want you to reset standard output back to my screen."

Here is how it works: If you want the output of a command to go to the screen, you don't have to do anything. This is automatic.

If you want the output of a command to go to a file, type > (greater-than) followed by the name of the file, at the end of the command. For example:

sort > names

This command will write its output to a file called names. The use of a > character is apt, because it looks like an arrow showing the path of the output.

When you write output to a file in this way, the file may or may not already exist. If the file does not exist, the shell will create it for you automatically. In our example, the shell will create a file called names.

If the file already exists, its contents will be replaced, so you must be careful. For instance, if the file names already exists, the original contents will be lost permanently.

In some cases, this is fine, because you do want to replace the contents of the file, perhaps with newer information. In other cases, you may not want to lose what is in the file. Rather, you want to add new data to what is already there. To do so, use >>, two greater-than characters in a row. This tells the shell to append any new data to the end of an existing file. Thus, consider the command:

sort >> names

If the file names does not exist, the shell will create it. If it does exist, the new data will be appended to the end of the file. Nothing will be lost.

When we send standard output to a file, we say that we REDIRECT it. Thus, in the previous two examples, we redirect standard output to a file called names.

Now you can see why there are two types of output: standard output and standard error. If you redirect the standard output to a file, you won't miss the error messages, as they will still be displayed on your monitor.

When you redirect output, it is up to you to be careful, so you do not lose valuable data. There are two ways to do so. First, every time you redirect output to a file, think carefully: Do you want to replace the current contents of the file? If so, use >. Or would you rather append new data to the end of the file? If that is the case, use >>.

Second, as a safeguard, you can tell the shell to never replace the contents of an existing file. You do this by setting the noclobber option (Bash, Korn shell) or the noclobber shell variable (C-Shell, Tcsh). We'll discuss this in the next section.

Preventing Files From Being Replaced
or Created by Redirection

In the previous section, I explained that when you redirect standard output to a file, any data that already exists in the file will be lost. I also explained that when you use >> to append output to a file, the file will be created if it does not already exist.

There may be times when you do not want the shell to make such assumptions on your behalf. For example, say you have a file called names that contains 5,000 lines of data. You want to append the output of a sort command to the end of this file. In other words, you want to enter the command:

sort >> names

However, you make a mistake and accidentally enter:

sort > names

What happens? All of your original data is wiped out. Moreover, it is wiped out quickly. Even if you notice the error the moment you press <Return>, and even if you instantly press ^C to abort the program (by sending the intr signal; see Chapter 7), it is too late. The data in the file is gone forever.

Here is why. As soon as you press <Return>, the shell gets everything ready for the sort program by deleting the contents of your target file. Since the shell is a lot faster than you, by the time you abort the program the target file is already empty.
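
If you would like to see this for yourself, experiment on a throwaway copy of a file, never on real data. Here is a minimal demonstration (the file name scratch is used only for illustration; the cat command, which we will meet in Chapter 16, displays the contents of a file):

cp /etc/passwd scratch   # make a disposable copy of a real file
sort > scratch           # oops -- press ^C as fast as you can
cat scratch              # too late: the file is already empty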

To prevent such catastrophes, you can tell the shell not to replace an existing file when you use > to redirect output. In addition, with the C-Shell family, you can also tell the shell not to create a new file when you use >> to append data. This ensures that no files are replaced or created by accident.

To have the shell take such precautions on your behalf, you use what we might call the noclobber facility. With the Bourne shell family (Bash, Korn shell), you set the noclobber shell option:

set -o noclobber

To unset this option, use:

set +o noclobber

With the C-Shell family (C-Shell, Tcsh), you set the noclobber shell variable:

set noclobber

To unset this variable, use:

unset noclobber

(See Chapter 12 for a discussion of options and variables; see Appendix G for a summary.)

Once noclobber is set, you have built-in protection. For example, let's say you already have a file called names and you enter:

sort > names

You will see an error message telling you that the file names already exists. Here is such a message from Bash:

bash: names: cannot overwrite existing file

Here is the equivalent message from the Tcsh:

names: File exists.

In both cases, the shell has refused to carry out the command, and your file is safe.

What if you really want to replace the file? In such cases, it is possible to override noclobber temporarily. With a Bourne shell, you use >| instead of >:

sort >| names

With a C-Shell, you use >! instead of >:

sort >! names

Using >| or >! instead of > tells the shell to redirect standard output even if the file exists.

As we discussed earlier, you can append data to a file by redirecting standard output with >> instead of >. In both cases, if the output file does not exist, the shell will create it. However, if you are appending data, it would seem likely that you expect the file to already exist. Thus, if you use >> and the file does not exist, you are probably making a mistake. Can noclobber help you here?

Not with a Bourne shell. If you append data with Bash or the Korn shell and the noclobber option is set, it's business as usual. The C-Shell and Tcsh know better. They will tell you that the file does not exist, and refuse to carry out the command.

For example, say you are a C-Shell or Tcsh user; the noclobber shell variable is set; and you have a file named addresses, to which you want to append data. You enter the command:

sort >> address

You will see an error message:

address: No such file or directory

At which point you will probably say, "Oh, I should have typed addresses, not address. Thank you, Mr. C-Shell."

Of course, there may be occasions when you are appending data to a file, and you really do want to override noclobber. For example, you are a C-Shell user and, for safety, you have set noclobber. You want to sort a file named input and append the data to a file named output.

If output doesn't exist, you want to create it. The important thing is, if output does exist, you don't want to lose what is already in it, which is why you are appending (>>), not replacing (>). If noclobber wasn't set, you would use:

sort >> output

Since noclobber is set, you must override it. To do so, just use >>! instead of >>:

sort >>! output

This will override the automatic check for this one command only.

Redirecting Standard Input

By default, standard input is set to the keyboard. This means that, when you run a program that needs to read data, the program expects you to enter the data by typing it, one line at a time. When you are finished entering data, you press ^D (<Ctrl-D>) to send the eof signal (see Chapter 7). Pressing ^D indicates that there is no more data.

Here is an example you can try for yourself. Enter:

sort

The sort program is now waiting for you to enter data from standard input (the keyboard). Type as many lines as you want. For example, you might enter:

Harley
Casey
Weedly
Linda
Melissa

After you have pressed <Return> on the last line, press ^D to send the eof signal. The sort program will now sort all the data alphabetically and write it to standard output. By default this is the screen, so you will see:

Casey
Harley
Linda
Melissa
Weedly

There will be many times, however, when you want to redirect standard input to read data from a file, rather than from the keyboard. Simply type < (less-than), followed by the name of the file, at the end of the command.

For example, to sort the data contained in a file called names, use the command:

sort < names

As you can see, the < character is a good choice as it looks like an arrow showing the path of the input.

Here is an example you can try for yourself. As I mentioned in Chapter 11, the basic information about each userid is contained in the file /etc/passwd. You can display a sorted version of this file by entering the command:

sort < /etc/passwd

As you might imagine, it is possible to redirect both standard input and standard output at the same time, and this is done frequently. Consider the following example:

sort < rawdata > report

This command reads data from a file named rawdata, sorts it, and writes the output to a file called report.

File Descriptors; Redirecting Standard Error
With the Bourne Shell Family

Although the following discussion is oriented towards the Bourne shell family, we will be talking about important ideas regarding Unix I/O. For that reason, I'd like you to read this entire section, regardless of which shell you happen to be using right now.

As I explained earlier, the shell provides two different output targets: standard output and standard error. Standard output is used for regular output; standard error is used for error messages. By default, both types of output are displayed on the screen. However, you can separate the two output streams should the need arise.

If you choose to separate the output streams, you have a lot of flexibility. For example, you can redirect standard output to a file, where it will be saved. At the same time, you can leave standard error alone, so you won't miss any error messages (which will be displayed on the screen). Alternatively, you can redirect standard output to one file and standard error to another file. Or you can redirect both types of output to the same file.

Alternatively, you can send standard output or standard error (or both) to another program for further processing. I'll show you how to do that later in the chapter when we discuss pipelines.

The syntax for redirecting standard error is different for the two shell families. We'll talk about the Bourne shell family first, and then move on to the C-Shell family. To prepare you, however, I need to take a moment to explain one aspect of how Unix handles I/O.

Within a Unix process, every input source and every output target is identified by a unique number called a FILE DESCRIPTOR. For example, a process might read data from file #8 and write data to file #6. When you write programs, you use file descriptors to control the I/O, one for each file you want to use.

Within the Bourne shell family, the official syntax for redirecting input or output is to use the number of a file descriptor followed by < (less-than) or > (greater-than). For example, let's say a program named calculate is designed to write output to a file with file descriptor 8. You could run the program and redirect its output to a file named results by using the command:

calculate 8> results

By default, Unix provides every process with three pre-defined file descriptors, and most of the time that is all you will need. The default file descriptors are 0 for standard input, 1 for standard output, and 2 for standard error.

Thus, within the Bourne shell family, the syntax for redirecting standard input is to use 0< followed by the name of the input file. For example:

command 0< inputfile

where command is the name of a command, and inputfile is the name of a file.

The syntax for redirecting standard output and standard error is similar. For standard output:

command 1> outputfile

For standard error:

command 2> errorfile

where command is the name of a command, and outputfile and errorfile are the names of files.

As a convenience, if you leave out the 0 when you redirect input, the shell assumes you are referring to standard input. Thus, the following two commands are equivalent:

sort 0< rawdata
sort < rawdata

Similarly, if you leave out the 1 when you redirect output, the shell assumes you are referring to standard output. Thus, the following two commands are also equivalent:

sort 1> results
sort > results

Of course, you can use more than one redirection in the same command. In the following examples, the sort command reads its input from a file named rawdata, writes its output to a file named results, and writes any error messages to a file named errors:

sort 0< rawdata 1> results 2> errors
sort < rawdata > results 2> errors

Notice that you can leave out the file descriptor only for standard input and standard output. With standard error, you must include the 2. This is shown in the following simple example, in which standard error is redirected to a file named errors:

sort 2> errors

When you redirect standard error, it doesn't affect standard input or standard output. In this case, standard input still comes from the keyboard, and standard output still goes to the monitor.

As with all redirection, when you write standard error to a file that already exists, the new data will replace the existing contents of the file. In our last example, the contents of the file errors would be lost.

If you want to append new output to the end of a file, just use 2>> instead of 2>. For example:

sort 2>> errors

Redirecting standard error with the C-Shell family is a bit more complicated. Before we get to it, I need to take a moment to discuss an important facility called subshells. Even if you don't use the C-Shell or Tcsh, I want you to read the next section, as subshells are important for everyone.

Subshells

To understand the concept of a subshell, you need to know a bit about Unix processes. In Chapter 26, we will discuss the topic in great detail. For now, here is a quick summary.

A PROCESS is a program that is loaded into memory and ready to run, along with the program's data and the information needed to keep track of that program. When a process needs to start another process, it creates a duplicate process. The original is called the PARENT; the duplicate is called the CHILD.

The child starts running and the parent waits for the child to die (that is, to finish). Once the child dies, the parent then wakes up, regains control and starts running again, at which time the child vanishes.

To relate this to your minute-to-minute work, think about what happens when you enter a command. The shell parses the command and figures out whether it is an internal command (one that is built into the shell) or an external command (a separate program). When you enter a builtin command, the shell interprets it directly within its own process. There is no need to create a new process.

When you enter an external command, the shell finds the appropriate program and runs it as a new process. When the program terminates, the shell regains control and waits for you to enter another command. In this case, the shell is the parent, and the program it runs on your behalf is the child.
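
If you are curious whether a particular command is internal or external, the shell can tell you. For example, Bash has a builtin named type (the Tcsh has a similar command named which):

type cd
type ls

Typical output would be "cd is a shell builtin" and "ls is /bin/ls", although the exact wording, and the path, will vary from one system to another.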

Consider what happens when you start a brand new shell for yourself: for instance, when you are using Bash and you enter the bash command (or when you are using the C-Shell and you enter the csh command, and so on).

The original shell (the parent) starts a new shell (the child). Whenever a shell starts another shell, we call the second shell a SUBSHELL. Thus, we can say that, whenever you start a new shell (by entering bash or ksh or csh or tcsh), you cause a subshell to be created. Whatever commands you now enter will be interpreted by the subshell. To end the subshell, you press ^D to send the eof signal (see Chapter 7). At this point, the parent shell regains control. Now, whatever commands you enter are interpreted by the original shell.
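
Here is a quick way to watch this happen, assuming you use Bash. (Within Bash, the variable $0 contains the name of the program that is running your current shell.)

echo $0     # displays the name of the original shell
bash        # start a subshell
echo $0     # displays: bash
^D          # the subshell dies; the original shell regains control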

When a subshell is created, it inherits the environment of the parent (see Chapter 12). However, any changes the subshell makes to the environment are not passed back to the parent. Thus, if a subshell modifies or creates environment variables, the changes do not affect the original shell.

This means that, within a subshell, you can do whatever you want without affecting the parent shell. This capability is so handy that Unix gives you two ways to use subshells.

First, as I mentioned above, you can enter a command to start a brand new shell explicitly. For example, if you are using Bash, you would enter bash. You can now do whatever you want without affecting the original shell. For instance, if you were to change an environment variable or a shell option, the change would disappear as soon as you entered ^D; that is, the moment that the new shell dies and the original shell regains control.

There will be times when you want to run a small group of commands, or even a single command, in a subshell without having to deal with a whole new shell. Unix has a special facility for such cases: just enclose the commands in parentheses. That tells the shell to run the commands in a subshell.

For example, to run the date command in a subshell, you would use:

(date)

Of course, there is no reason to run date in a subshell. Here, however, is a more realistic example using directories.

In Chapter 24, we will discuss directories, which are used to contain files. You can create as many directories as you want and, as you work, you can move from one directory to another. At any time, the directory in which you are currently working is called your working directory.

Let's say you have two directories named documents and spreadsheets, and you are currently working in the documents directory. You want to change to the spreadsheets directory and run a program named calculate. Before you can run the program, you need to set the environment variable DATA to the name of a file that contains certain raw data. In this case, the file is named statistics. Once the program has run, you need to restore DATA to its previous value, and change back to the documents directory. (In other words, you need to reset the environment to its previous state.)

One way to do this is to start a new shell, then change your working directory, change the value of DATA, and run the calculate program. Once this is all done, you can exit the shell by pressing ^D. When the new shell ends and the old shell regains control, your working directory and the variable DATA will be in their original state.

Here is what it looks like, assuming you use Bash for your shell. (The cd command, which we will meet in Chapter 24, changes your working directory. Don't worry about the syntax for now.)

bash
cd ../spreadsheets
export DATA=statistics
calculate
^D

Here is an easier way, using parentheses:

(cd ../spreadsheets; export DATA=statistics; calculate)

When you use a subshell in this way, you don't have to worry about starting or stopping a new shell. It is done for you automatically. Moreover, within the subshell, you can do anything you want to the environment without having permanent effects. For example, you can change your working directory, create or modify environment variables, create or modify shell variables, change shell options, and so on.
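
For example, here is a small demonstration you can try, assuming Bash. (The variable name x is arbitrary.)

x=hello
(x=goodbye; echo $x)
echo $x

The first echo runs inside the subshell and displays goodbye. The second echo runs in the parent shell and displays hello: the change made inside the subshell has vanished.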

You will sometimes see the commands within the parentheses called a GROUPING, especially when you are reading documentation for the C-Shell family. In the spreadsheets example above, for instance, we used a grouping of three commands. The most common reason to use a grouping and a subshell is to prevent the cd (change directory) command from affecting the current shell. The general format is:

(cd directory; command)
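
For instance, to list the files in some other directory without disturbing your own working directory, you might use a command like the following (the directory name /tmp is only an example):

(cd /tmp; ls -l)

Once the subshell ends, your working directory is exactly as it was.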

Redirecting Standard Error
With the C-Shell Family

Within the Bourne shell family, redirecting standard error is straightforward. You use 2> followed by the name of a file. With the C-Shell family (C-Shell, Tcsh), redirecting standard error is not as simple, because of an interesting limitation, which I'll get to in a moment.

With the C-Shell family, the basic syntax for redirecting standard error is:

command >& outputfile

where command is the name of a command, and outputfile is the name of a file.

For example, if you are using the C-Shell or Tcsh, the following command redirects standard error to a file named output:

sort >& output

If you want to append the output to the end of an existing file, use >>& instead of >&. In the following example, the output is appended to a file named output:

sort >>& output

If you have set the noclobber shell variable (explained earlier in the chapter) and you want to override it temporarily, use >&! instead of >&. For example:

sort >&! output

In this example, the contents of the file will be replaced, even if noclobber is set.

So what is the limitation I mentioned? When you use >& or >&!, the shell redirects both standard output and standard error. In fact, within the C-Shell family, there is no simple way to redirect standard error all by itself. Thus, in the last example, both the standard output and standard error are redirected to a file named output.

It happens that there is a way to redirect standard error separately from standard output. However, in order to do it, you need to know how to use subshells (explained in the previous section). The syntax is:

(command > outputfile) >& errorfile

where command is the name of a command, and outputfile and errorfile are the names of files.

For example, say you want to use sort with standard output redirected to a file named output, and standard error redirected to a file named errors. You would use:

(sort > output) >& errors

In this case, sort runs in a subshell and, within that subshell, standard output is redirected. Outside the subshell, what is left of the output — standard error — is redirected to a different file. The net effect is to redirect each type of output to its own file.

Of course, if you want, you can append the output by using >> and >>&. For example, to append standard output to a file named output, and append standard error to a file named errors, use a command like the following:

(sort >> output) >>& errors

Combining Standard Output
and Standard Error

All shells allow you to redirect standard output and standard error. But what if you want to redirect both standard output and standard error to the same place?

With the C-Shell family, this is easy, because when you use >& (replace) or >>& (append), the shell automatically combines both output streams. For example, in the following C-Shell commands, both standard output and standard error are redirected to a file named output:

sort >& output
sort >>& output

With the Bourne shell family, the scenario is more complicated. We'll talk about the details, and then I'll show you a shortcut that you can use with Bash.

The basic idea is to redirect one type of output to a file, and then redirect the other type of output to the same place. The syntax to do so is:

command x> outputfile y>&x

where command is the name of a command, x and y are file descriptors, and outputfile is the name of a file.

For example, in the following sort command, standard output (file descriptor 1) is redirected to a file named output. Then standard error (file descriptor 2) is redirected to the same place as file descriptor 1. The overall effect is to send both regular output and error messages to the same file:

sort 1> output 2>&1

Since file descriptor 1 is the default for redirected output, you can leave out the first instance of the number 1:

sort > output 2>&1

Before we move on, I'd like to talk about an interesting mistake that is easy to make. What happens if you reverse the order of the redirections?

sort 2>&1 > output

Although this looks almost the same as the example above, it won't work. Here is why:

The instruction 2>&1 tells the shell to send the output of file descriptor 2 (standard error) to the same place as the output of file descriptor 1 (standard output). However, in this case, the instruction is given to the shell before standard output is redirected. Thus, when the shell processes 2>&1, standard output is still being sent to the monitor (by default). This means that standard error ends up being redirected to the monitor, which is where it was going in the first place.

The net result is that standard error goes to the monitor, while standard output goes to a file. (Take a moment to think about this, until it makes sense.)
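
If you want to see the difference for yourself, try a command that produces both types of output. In the following sketch, assuming a file named xyzzy does not exist, ls writes a listing to standard output and an error message to standard error:

ls . xyzzy > output 2>&1    # both streams end up in the file
ls . xyzzy 2>&1 > output    # the error message appears on the screen

With the first command, the file output captures everything. With the second command, only the listing ends up in the file, and the error message is displayed on your monitor.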

To continue, what if you want to redirect both standard output and standard error, but you want to append the output to a file? Just use >> instead of >:

sort >> output 2>&1

In this case, using >> causes both standard output and standard error to be appended to the file named output.

You might ask, is it possible to combine both types of output by starting with standard error? That is, can you redirect standard error to a file and then send standard output to the same place? The answer is yes:

sort 2> output 1>&2
sort 2>> output 1>&2

The commands are different from the earlier examples, but they have the same effect.

As you can see, the Bourne shell family makes combining two output streams complicated. Can it not be made simpler? Why not just send both standard output and standard error to the same file directly? For example:

sort > output 2> output

Although this looks as if it might work, it won't, because if you redirect to the same file twice in one command, one of the redirections will obliterate the other.

And now the shortcut. You can use the above technique with all members of the Bourne shell family, in particular, with Bash and the Korn shell. With Bash, however, you can also use either &> or >& (choose the one you like best) to redirect both standard output and standard error at the same time:

sort &> output
sort >& output

This allows you to avoid having to remember the more complicated pattern. However, if you want to redirect both standard output and standard error and append the output, you will need to use the pattern we discussed above:

sort >> output 2>&1

By now, if you are normal, you are probably getting a bit confused. Don't worry. Everything we have been discussing in the last few sections is summarized in Figures 15-1 and 15-2 (later in the chapter). My experience is that, with a bit of practice, you'll find the rules for redirection easy to remember.

Throwing Away Output

Why would you want to throw away output?

Occasionally, you will run a program because it performs a specific action, but you don't really care about the output. Other times, you might want to see the regular output, but you don't care about error messages. In the first case, you would throw away standard output; in the second case, you would throw away standard error.

To do so, all you have to do is redirect the output and send it to a special file named /dev/null. (The name is pronounced "slash-dev-slash-null", although you will sometimes hear "dev-null".) The name /dev/null will make sense after you read about the Unix file system in Chapter 23. The important thing about /dev/null is that anything you send to it disappears forever(*). When Unix people gather, you will sometimes hear /dev/null referred to, whimsically, as the BIT BUCKET.

* Footnote

Said a widower during a lull,
"My late wife was exceedingly dull.
If I'd killed her, they'd trail me
And catch me and jail me,
So I sent her to /dev/null."

For example, let's say you have a program named update that reads and modifies a large number of data files. As it does its work, update displays statistics about what is happening. If you don't want to see the statistics, just redirect standard output to /dev/null:

update > /dev/null

Similarly, if you want to see the regular output, but not any error messages, you can redirect standard error. With the Bourne shell family (Bash, Korn shell), you would use:

update 2> /dev/null

With the C-Shell family (C-Shell, Tcsh) you would use:

update >& /dev/null

As I explained earlier, the above C-Shell command redirects both standard output and standard error, effectively throwing away all the output. You can do the same with the Bourne shell family as follows:

update > /dev/null 2>&1

So what do you do if you are using a C-Shell and you want to throw away the standard error, but not the standard output? You can use a technique we discussed earlier when we talked about how to redirect standard error and standard output to different files. In that case, we ran the command in a subshell as follows:

(update > output) >& errors

Doing so allowed us to separate the two output streams. Using the same construction, we can throw away standard error by redirecting it to /dev/null. At the same time, we can preserve the standard output by redirecting it to /dev/tty:

(update > /dev/tty) >& /dev/null

The special file /dev/tty represents the terminal. We'll discuss the details in Chapter 23. For now, all you need to know is that, when you send output to /dev/tty, it goes to the monitor. In this way, we can make the C-Shell and Tcsh send standard output to the monitor while throwing away standard error(*).

* Footnote

If you are thinking, "We shouldn't have to go to such trouble to do something so simple," you are right. This is certainly a failing of the C-Shell family. Still, it's cool that we can do it.

Redirection:
Summaries and Experimenting

Redirecting standard input, standard output, and standard error is straightforward. The variations, however, can be confusing. Still, my goal is that you should become familiar with all the variations — for both shell families — which will take a bit of practice. To make it easier, I can help you in two ways.

First, for reference, Figures 15-1 and 15-2 contain summaries of all the redirection metacharacters. Figure 15-1 is for the Bourne shell family; Figure 15-2 is for the C-Shell family. Within these summaries you will see all the features we have covered so far. You will also see a reference to piping. This refers to using the output of one program as the input to another program, which we discuss in the next section.

The second bit of help I have for you is in the form of an example you can use to experiment. In order to experiment with standard output and standard error, you will need a simple command that generates both regular output as well as an error message. The best such command I have found is a variation of ls.

The ls (list) command displays information about files; we will meet it formally in Chapter 24. With the -l (long) option, ls displays a long listing, with one line of detailed information for each file.

The idea is to use ls -l to display information about two files, a and b. File a will exist, but file b will not. Thus, we will see two types of output: standard output will display information about file a; standard error will display an error message saying that file b does not exist. You can then use this sample command to practice redirecting standard output and standard error.

Before we can start, we must create file a. To do that, we use the touch command. We'll talk about touch in Chapter 25. For now, all you need to know is that if you use touch with a file that does not exist, it will create an empty file with that name. Thus, if a file named a does not exist, you can create one by using:

touch a

We can now use ls to display information about both a (which exists) and b (which doesn't exist):

ls -l a b

Here is some typical output:

b: No such file or directory
-rw------- 1 harley staff 0 Jun 17 13:42 a

The first line is standard error. It consists of an error message telling us that file b does not exist. The second line is standard output. It contains the information about file a. (Notice that the file name is at the end of the line.) Don't worry about the details. We'll talk about them in Chapter 24.

We are now ready to use our sample command to experiment. Take a look at Figures 15-1 and 15-2, and choose something to practice. As an example, let's redirect standard output to a file named output:

ls -l a b > output

When you run this command, you will not see standard output, as it has been sent to the file output. However, you will see standard error:

b: No such file or directory

To check the contents of output, use the cat command. (We'll talk about cat in Chapter 16.)

cat output

In this case, cat will display the contents of output, the standard output from the previous command:

-rw------- 1 harley staff 0 Jun 17 13:42 a

Here is one more example. You are using Bash and you want to practice redirecting standard output and standard error to two different files:

ls -l a b > output 2> errors

Since all the output was redirected, you won't see anything on your screen. To check standard output, use:

cat output

To check standard error, use:

cat errors

As you are experimenting, you can delete a file by using the rm (remove) command. For example, to delete the files output and errors, use:

rm output errors

When you are finished experimenting, you can delete the file a by using:

rm a

Now that you have a good sample command (ls -l a b) and you know how to display the contents of a short file (cat filename), it's time to practice.

My suggestion is to create at least one example for each type of output redirection in Figures 15-1 and 15-2(*). Although it will take a while to work through the list, once you finish you will know more about redirection than 99 44/100 percent of the Unix users in the world.

* Footnote

Yes, I want you to practice with at least one shell from each of the two shell families. If you are not sure which shells to choose, use Bash and the Tcsh.

If you normally use Bash, try the examples, then enter the tcsh command to start a Tcsh shell, then try the examples again. If you normally use the Tcsh, use that shell first, and then enter the bash command to start a Bash shell.

Regardless of which shell you happen to use right now, you never know what the future will bring. I want you to understand the basic shell concepts — environment variables, shell variables, options, and redirection — for any shell you may be called upon to use.

— hint —

To experiment with redirection, we used a variation of the ls command:

ls -l a b > output
ls -l a b > output 2> errors

To make your experiments easier, you can create an alias with a simple name for this command (see Chapter 13). With a Bourne shell (Bash, Korn shell), you might use:

alias x='ls -l a b'

With a C-Shell (C-Shell, Tcsh):

alias x 'ls -l a b'

Once you have such an alias, your test commands become a lot simpler:

x > output
x > output 2> errors
x >& output

This is a technique worth remembering.

Figure 15-1: Bourne shell family: Redirection of standard I/O

Most command-line programs use standard I/O for input and output. Input comes from standard input (stdin); regular output goes to standard output (stdout); error messages go to standard error (stderr).

With the Bourne shell family, you control standard I/O by using file descriptors (stdin=0, stdout=1, stderr=2) with various metacharacters. In cases where there is no ambiguity, you can leave out the file descriptor. To prevent the accidental overwriting of an existing file, set the noclobber shell option. If noclobber is set, you can force overwriting by using >|. See text for details.

Metacharacters   Action
<                Redirect stdin (same as 0<)
>                Redirect stdout (same as 1>)
>|               Redirect stdout; force overwrite
>>               Append stdout (same as 1>>)
2>               Redirect stderr
2>>              Append stderr
2>&1             Redirect stderr to stdout
&> or >&         Redirect stdout+stderr (Bash only)
|                Pipe stdout to another command
2>&1 |           Pipe stdout+stderr to another command

Figure 15-2: C-Shell family: Redirection of standard I/O

With the C-Shell family, you control standard I/O by using various metacharacters. To prevent the accidental overwriting of an existing file or the creation of a new file, set the noclobber shell variable. If noclobber is set, you can force overwriting or file creation by using a ! character. Note that, unlike the Bourne shell family (Figure 15-1), there is no simple way to redirect stderr without also redirecting stdout. See text for details.

Metacharacters   Action
<                Redirect stdin
>                Redirect stdout
>!               Redirect stdout; force overwrite
>&               Redirect stdout+stderr
>&!              Redirect stdout+stderr; force overwrite
>>               Append stdout
>>!              Append stdout; force file creation
>>&              Append stdout+stderr
>>&!             Append stdout+stderr; force file creation
|                Pipe stdout to another command
|&               Pipe stdout+stderr to another command

Pipelines

Earlier in the chapter, when we discussed the Unix philosophy, I explained that one goal of the early Unix developers was to build small tools, each of which would do one thing well. Their intention was that, when a user was faced with a problem that could not be solved by one tool, he or she would be able to put together a set of tools to do the job.

For example, let's say you work for the government and you have three large files that contain information about all the smart people in the country. Within each file, there is one line of information per person, including that person's name. Your problem is to find out how many such people are named Harley.

If you were to give this problem to an experienced Unix person, he would know exactly what to do. First, he would use the cat (catenate) command to combine the files. Then he would use the grep command to extract all the lines that contain the word Harley. Finally, he would use the wc (word count) command with the -l (line count) option, to count the number of lines.

Let's take a look at how we might put together such a solution based on what we have discussed so far. We will use redirection to store the intermediate results in temporary files, which we delete when the work is done. Skipping lightly over the details of how these commands work (we will discuss them later in the book), here are the commands to do the job. To help you understand what is happening, I have added a few comments:

cat file1 file2 file3 > tempfile1    # combine files
grep Harley < tempfile1 > tempfile2  # extract lines
wc -l < tempfile2                    # count lines
rm tempfile1 tempfile2               # delete temp files

Take a look at this carefully. Before we move on, make sure that you understand how, by redirecting standard output and standard input, we are able to pass data from one program to another by saving it in temporary files.

The sequence of commands we used above will work just fine. However, it has one drawback: the glue that holds everything together — redirection using temporary files — makes the solution difficult to understand. Moreover, the more complex the solution gets, the easier it is to make a mistake.

In order to make such solutions simpler, the shell allows you to create a sequence of commands such that the standard output from one program is sent automatically to the standard input of the next program. When you do so, the connection between two programs is called a PIPE, and the sequence itself is called a PIPELINE.

To create a pipeline, you type the commands you want to use separated by the | (vertical bar) character (the pipe symbol). As an example, the previous set of four commands can be replaced by a single pipeline:

cat file1 file2 file3 | grep Harley | wc -l

To understand a pipeline, you read the command line from left to right. Each time you see a pipe symbol, you imagine the standard output of one program becoming the standard input of the next program.

The reason pipelines are so simple is that the shell takes care of all the details, so you don't have to use temporary files. In our example, the shell automatically connects the standard output of cat to the standard input of grep, and the standard output of grep to the standard input of wc.

With the Bourne shell family, you can combine standard output and standard error and send them both to another program. The syntax is:

command1 2>&1 | command2

where command1 and command2 are commands.

In the following example, both standard output and standard error of the ls command are sent to the sort command:

ls -l file1 file2 2>&1 | sort

With the C-Shell family, the syntax is:

command1 |& command2

For example:

ls -l file1 file2 |& sort

When we talk about pipelines, we often use the word PIPE as a verb, to refer to the sending of data from one program to another. For instance, in the first example, we piped the output of cat to grep, and we piped the output of grep to wc. In the second example, we piped standard output and standard error of ls to sort.

When you think about an example such as the ones above, it is easy to picture a pipeline: data goes in one end and comes out the other end. However, a better metaphor is to think of an assembly line. The raw data goes in at one end. It is then processed by one program after another until it emerges, in finished form, at the other end.

When you create a pipeline, you must use programs that are written to read text from standard input and write text to standard output. We call such programs "filters", and there are many of them. We will talk about the most important filters in Chapters 16-19. If you are a programmer, you can create your own tools by writing filters of your own.
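
To give you an idea of how simple a filter can be, here is a minimal sketch: a tiny shell script (the name toupper is just for illustration) that copies standard input to standard output, using the tr program to convert lowercase letters to uppercase:

#!/bin/sh
# toupper: a trivial filter -- copy standard input to
# standard output, changing lowercase letters to uppercase
tr 'a-z' 'A-Z'

Once you make such a script executable, it can take its place in a pipeline like any other filter, for example: cat names | toupper | sort.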

In practice, you will find that most of your pipelines use only two or three commands in a row. By far, the most common use for a pipeline is to pipe the output of a command to less (see Chapter 21), in order to display that output one screenful at a time. For example, to display a calendar for 2008, you can use:

cal 2008 | less

(The cal program is explained in Chapter 8.)

One of the basic skills in mastering the art of Unix is learning when and how to solve a problem by combining programs into a pipeline. When you create a pipeline, you can use as many filters as you need, and you will sometimes see pipelines consisting of five or six or more programs put together in an ingenious manner; there is an example below. Indeed, when it comes to constructing pipelines, you are limited only by your intelligence and your knowledge of filters(*).

* Footnote

This should give no cause for concern. After you read Chapters 16-19, you will understand how to use the most important filters. Moreover, as one of my readers, you are obviously of above average intelligence.
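To give you a taste of what is possible, here is a longer pipeline. (This is only a sketch: the file names are invented, and the programs it uses are filters of the kind we will cover in Chapters 16-19.) It displays the five lines that occur most often within three files:

cat data1 data2 data3 | sort | uniq -c | sort -rn | head -5

Reading from left to right: cat combines the files; sort brings identical lines together; uniq -c replaces each run of identical lines with a single line preceded by a count; sort -rn orders the lines by that count, highest first; and head -5 keeps only the first five lines.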

— hint —

When you type a command that uses a pipe or that redirects standard I/O, it is not necessary to put spaces around the <, > or | characters. However, it is a good idea to use such spaces. For example, instead of:

ls -l a b >output 2>errors
cat f1 f2 f3|grep Harley|wc -l

it is better to use:

ls -l a b > output 2> errors
cat f1 f2 f3 | grep Harley | wc -l

Using spaces in this way minimizes the chances of a typing error and makes your commands easier to understand. This is especially important when you are writing shell scripts.

Jump to top of page

Splitting a Pipeline: tee

There may be times when you want the output of a program to go to two places at once. For example, you may want to send output both to a file and to another program at the same time. To show you what I mean, consider the following example:

cat names1 names2 names3 | grep Harley

The purpose of this pipeline is to display all the lines in the files names1, names2 and names3 that contain the word "Harley". (The details: cat combines the three files; grep extracts all the lines that contain the characters "Harley". These two commands are discussed in Chapters 16 and 19 respectively.)

Let's say you want to save a copy of the combined files. In other words, you want to send the output of cat to a file and you want to send it to grep at the same time.

To do so, you use the tee command. The purpose of tee is to read data from standard input and send a copy of it both to standard output and to a file. The syntax is:

tee [-a] file...

where file is the name of a file to which you want to send a copy of the data.

Normally, you would use tee with a single file name, for example:

cat names1 names2 names3 | tee masterlist | grep Harley

In this example, the output of cat is saved in a file called masterlist. At the same time, the output is also piped to grep.

When you use tee, you can save more than one copy of the output by specifying more than one file name. For example, in the following pipeline, tee copies the output of cat to two files, d1 and d2:

cat names1 names2 names3 | tee d1 d2 | grep Harley

If the file you name in a tee command does not exist, tee will create it for you. However, you must be careful, because if the file already exists, tee will overwrite it and the original contents will be lost.

If you want tee to append data to the end of a file instead of replacing the file, use the -a (append) option. For example:

cat names1 names2 names3 | tee -a backup | grep Harley

This command saves the output of cat to a file named backup. If backup already exists, nothing will be lost because the output will be appended to the end of the file.

The tee command is especially handy at the end of a pipeline when you want to look at the output of a command and save it to a file at the same time. For example, let's say you want to use the who command (Chapter 8) to display information about the userids that are currently logged in to your system. However, you not only want to display the information, you also want to save it to a file named status. One way to do the job is by using two separate commands:

who
who > status

However, by using tee, you can do it all at once:

who | tee status

Pay particular attention to this pattern, because I want you to remember it:

command | tee file

Notice that you don't have to use another program after tee. This is because tee sends its output to standard output, which, by default, is the screen.

In our example, tee reads the output of who from standard input and writes it both to the file status and to the screen. If you find that the output is too long, you can pipe it to less to display it one screenful at a time:

who | tee status | less
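
tee is also handy when a long pipeline misbehaves and you want to inspect the data as it passes from one program to the next. Here is a sketch (the file names snapshot1 and snapshot2 are invented for the example):

cat names1 names2 names3 | tee snapshot1 | grep Harley | tee snapshot2 | wc -l

Once the pipeline finishes, snapshot1 contains the combined files and snapshot2 contains the lines in which "Harley" was found, so you can examine the data at each stage at your leisure.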

What's in a Name?

tee


In the world of plumbing, a "tee" connector joins two pipes in a straight line, while providing for an additional outlet that diverts water at a right angle. For example, you can use a tee to allow water to flow from left to right, as well as downwards. The actual connector looks like an uppercase "T".

When you use the Unix tee command, you can imagine data flowing from left to right as it moves from one program to another. At the same time, a copy of the data is sent down the stem of the "tee" into a file.

Jump to top of page

The Importance of Pipelines

On October 11, 1964, Doug McIlroy, a Bell Labs researcher, wrote a 10-page internal memo in which he offered a number of suggestions and ideas. The last page of the memo contained a summary of his thoughts. It begins:

"To put my strongest concerns into a nutshell:

"We should have some ways of connecting programs like [a] garden hose — screw in another segment when it becomes necessary to massage data in another way..."

In retrospect, we can see that McIlroy was saying that it should be easy to put together programs to solve whatever problem might be at hand. As important as the idea was, it did not bear fruit until well over half a decade later.

By the early 1970s, the original Unix project was well underway at Bell Labs (see Chapter 2). At the time, McIlroy was a manager in the research department in which Unix was born. He was making important contributions to a variety of research areas, including some aspects of Unix. For example, it was McIlroy who demanded that Unix manual pages be short and accurate.

McIlroy had been promoting his ideas regarding the flow of input and output for some time. It wasn't until 1972, however, that Ken Thompson (see Chapter 2) finally added pipelines to Unix. In order to add the pipe facility, Thompson was forced to modify most of the existing programs to change the source of input from files to standard input.

Once this was done and a suitable notation was devised, pipelines became an integral part of Unix, and users became more creative than anyone had expected. According to McIlroy, the morning after the changes were made, "...we had this orgy of one liners. Everybody had a one liner. Look at this, look at that..."

In fact, the implementation of pipelines was the catalyst that gave rise to the Unix philosophy. As McIlroy remembers, "...Everybody started putting forth the Unix philosophy. Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams, because that is a universal interface..."

Today, well over thirty years later, the Unix pipe facility is basically the same as it was in 1972: a remarkable achievement. Indeed, it is pipelines and standard I/O that, in large part, make the Unix command line interface so powerful. For this reason, I encourage you to take the time to learn how to use pipelines well and to practice integrating them into your day-to-day work whenever you get the chance.

To help you start your journey on the Unix version of the yellow-brick road, I have devoted Chapters 16-19 to filters, the raw materials out of which you can fashion ingenious solutions to practical problems.

Before we move on to talk about filters, however, there is one last topic I want to cover: conditional execution.

Jump to top of page

Conditional Execution

There will be times when you will want to execute a command only if a previous command has finished successfully. To do so, use the syntax:

command1 && command2

At other times, you will want to execute a command only if a previous command has not finished successfully. The syntax in this case is:

command1 || command2

This idea — executing a command only if a previous command has succeeded or failed — is called CONDITIONAL EXECUTION.
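How does the shell know whether a command has "finished successfully"? Every Unix program returns a number, called an exit status, when it ends. By convention, an exit status of 0 means success, and anything else means failure; the && and || operators simply test this value. If you are curious, you can display the exit status of the most recent command yourself. With the Bourne shell family, it is kept in the special parameter $? (with the C-Shell family, the equivalent variable is status):

date                # run a command that should succeed
echo $?             # display its exit status: 0 means success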

Conditional execution is mostly used within shell scripts. However, from time to time, it can come in handy when you are entering commands. Here are some examples.

Let's say you have a file named people that contains information about various people. You want to sort the contents of people and save the output to a file named contacts. However, you only want to do so if people contains the name "Harley" somewhere in the file.

To start, how can we see if a file contains the name "Harley"? We use the grep command (see Chapter 19) to display all the lines in the file that contain "Harley". The command is:

grep Harley people

If grep is successful, it will display the lines that contain "Harley" on standard output. If grep fails, it will remain silent. In our case, if grep is successful, we then want to run the command:

sort people > contacts

If grep is unsuccessful, we don't want to do anything.

Here is a command line that uses conditional execution to do the job:

grep Harley people && sort people > contacts

Although this command line works, it leaves us with a tiny problem. If grep finds any lines in the file that meet our criteria, it will display them on the screen. Most of the time this would make sense, but in this case, we don't really want to see any output. All we want to do is run grep and test whether or not it was successful.

The solution is to throw away the output of grep by redirecting it to /dev/null:

grep Harley people > /dev/null && sort people > contacts

Occasionally, you will want to execute a command only if a previous command fails. For example, suppose you want to run a program named update that works on its own for several minutes doing something or other. If update finishes successfully, all is well. If not, you would like to know about it. The following command displays a warning message, but only if update fails:

update || echo "The update program failed."
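
You will sometimes see both operators combined on a single command line as a rough and ready if-then-else. For example (a sketch, using the same hypothetical update program):

update && echo "The update program succeeded." || echo "The update program failed."

Be careful with this pattern: the || operator tests the command immediately before it, so if the first echo were itself to fail, the "failed" message would appear even though update had succeeded. For anything important, it is safer to use a real if statement in a shell script.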

— hint —

If you ever need to abort a pipeline that is running, just press ^C to send the intr signal (see Chapter 7).

This is a good way to regain control when one of the programs in the pipeline has stopped because it is waiting for input.
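
For example, suppose you type the following pipeline but forget to give grep the name of a file to read:

grep Harley | wc -l

With no file name, grep waits silently for data from standard input (your keyboard), and the pipeline seems to hang. Pressing ^C aborts the pipeline and returns you to the shell prompt.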

Jump to top of page



Exercises

Review Question #1:

Summarize the Unix philosophy.

Review Question #2:

In Chapter 10, I gave you three questions to ask yourself each time you learn the syntax for a new program:

• What does the command do?
• How do I use the options?
• How do I use the arguments?

Similarly, what are the three questions you should ask (and answer) whenever you start to use a new program?

Review Question #3:

Collectively, the term "standard I/O" refers to standard input, standard output, and standard error. Define these three terms. What are their abbreviations?

What does it mean to redirect standard I/O? Show how to redirect all three types of standard I/O.

Review Question #4:

What is a pipeline?

What metacharacter do you use to separate the components of a pipeline?

What program would you use at the end of a pipeline to display output one screenful at a time?

Review Question #5:

What program do you use to save a copy of data as it passes through a pipeline?

Applying Your Knowledge #1:

Show how to redirect the standard output of the date command to a file named currentdate.

Applying Your Knowledge #2:

The following pipeline counts the number of userids that are currently logged into the system. (The wc -w command counts words; see Chapter 18.)

users | wc -w

Without changing the output of the pipeline, modify the command to save a copy of the output of users to a file named userlist.

Applying Your Knowledge #3:

The password file (/etc/passwd) contains one line for each userid registered with the system. Create a single pipeline to sort the lines of the password file, save them to a file called userids, and then display the number of userids on the system.

Applying Your Knowledge #4:

In the following pipeline, the find command (explained in Chapter 25) searches all the directories under /etc looking for files owned by userid root. The names of all such files are then written to standard output, one per line. The output of find is piped to wc -l to count the lines:

find /etc -type f -user root -print | wc -l

As find does its work, it will generate various error messages you don't want to see. Your goal is to rewrite the pipeline to throw away the error messages without affecting the rest of the output. Show how to do this for the Bourne Shell family.

For extra credit, see if you can devise a way to do it for the C-Shell family. (Hint: Use a subshell within a subshell.)

For Further Thought #1:

An important part of the Unix philosophy is that, when you need a new tool, it is better to combine existing tools than to write new ones. What happens when you try to apply this guideline to GUI-based tools?

Is that good or bad?

For Further Thought #2:

With the Bourne shell family, it is simple to redirect standard output and standard error separately. This makes it easy to save or discard error messages selectively. With the C-Shell family, separating the two types of output is much more complex. How important is this?

The C-Shell was designed by Bill Joy, a brilliant programmer in his day. Why do you think he created such a complicated system?

For Further Thought #3:

As a general rule, the world of computers changes quickly. Why do you think so many of the basic Unix design principles work so well even though they were created over 30 years ago?

• In the 1970s, the Unix user community was small, allowing developers to experiment and to make changes quickly. Unlike today's operating system programmers, the original developers did not have to concern themselves with a large installed base of unsophisticated users or with powerful corporate interests.

• The synergistic ideas of flexibility, cooperation, and sharing were built into the system from the very beginning. As a result, anyone with a good idea would have a good chance of having his idea incorporated into the operating system.

Jump to top of page