Chapter 21...

Displaying Files

With all the time we spend using computers, it is important to remind ourselves that the main product of our effort is almost always some type of output: text, numbers, graphics, sound, photos, video, or some other data. When you use the Unix command-line programs we discuss in this book, the output is usually text, either displayed on your monitor as it is generated, or saved in a file.

For this reason, Unix has always had a variety of programs that allow you to display textual data, either from the output of a program or from a file. In this chapter, we will discuss the programs you use to display the contents of files. We'll start with text files and then move on to binary files.

Throughout the discussion, my goals for you are twofold. First, whenever you need to display data from a file, I want you to be able to analyze the situation and choose the best program to do the job. Second, regardless of which program you decide to use, I want you to be familiar enough with it to handle most everyday tasks competently.

To start our discussion, I'll take you on a survey of the Unix programs used to display files. I'll introduce you to each program, explain what it does, and explain when to use it. You and I will then discuss each program in turn, at which time I will fill in the details. By far, the most important such program is less. (I'll explain the name later.) For this reason, we will spend the most time on this very useful and practical program.

As we discuss the various programs, we're going to detour a bit to cover two interesting topics. First, I'm going to describe the two different ways in which text-based programs can handle your input, "cooked mode" and "raw mode". Second, I'm going to introduce you to the binary, octal and hexadecimal number systems, concepts that you must understand when you display binary files.

Throughout this chapter, we will discuss how to display "files" even though, strictly speaking, I have not yet explained what a file actually is. In Chapter 23, we will discuss the Unix file system in detail. At that time, I will give you an exact, technical definition of a Unix file. For now, we'll just use the intuitive idea that a file is something with a name that contains information. For example, you might display a file named essay that contains the text of an essay you have written.

 

One more idea before we start: When people talk about "displaying" a file, they mean displaying the contents of the file. For example, if I write "The following command displays the file essay," it means "The following command displays the contents of the file essay." This is a subtle but important point, so make sure you understand it.


Survey of Programs Used to Display Files

Unix has a variety of programs you can use to display files. In this section, we'll survey the programs, so you will have an overall view of what's available. Later in the chapter, we'll talk about each program in detail.

To start, there are programs whose only purpose is to display textual data one screenful at a time. Such a program is called a PAGER. The name comes from the fact that, in the early days of Unix, users had terminals that printed output on paper. Thus, to look at a file, you would print it on paper, one page at a time. Nowadays, of course, to look at a file, you display it on your monitor one screenful at a time. Still, the programs that do the job are called "pagers".

In general, there are two ways to use a pager. First, as we discussed in Chapter 15, you can use a pager at the end of a pipeline to display output from another program. We have seen many such examples in previous chapters, for example:

cat newnames oldnames | grep Harley | sort | less
colrm 14 30 < students | less

In the first pipeline, we combine the contents of two files, grep the data for all the lines that contain the string "Harley", and send the results to less to be displayed. In the second example, we read data from a file, remove columns 14 through 30 from each line of data and, again, send the results to less to be displayed.

The other way to use a pager is to have it display the contents of a file, one screenful at a time. For example, the following command uses less to examine the contents of the Unix password file (described in Chapter 11):

less /etc/passwd

You can look at any text file in this manner, simply by typing less followed by the name of the file. (We'll discuss options, syntax and other details later in the chapter.)

Although less is the principal Unix pager, there are two other such programs you may hear about, more and pg. You will remember from our discussion in Chapter 2 that, in the 1980s, there were two main branches of Unix: System V developed at AT&T, and BSD developed at U.C. Berkeley. The pg program was the default System V pager; more was the default BSD pager. Today, both of these programs are obsolete, having been replaced by less.

On rare occasions, you may have to use more. For this reason, we will talk about it a bit, so if you ever encounter it, you'll know what to do. The pg program, for the most part, is gone and buried, and there is no need for us to discuss it. I only mention it here for historical reasons: if you see the name, you'll at least know what it is.

Aside from using a pager, you can also display a file by using the cat program. As we discussed in Chapter 16, the principal use of cat is to combine the contents of multiple files. However, cat can also be used to display a file quickly, for example:

cat /etc/passwd

Since cat displays the entire file at once (not one screenful at a time), you would use it only when a file is short enough to fit on your screen without scrolling. Most of the time, it makes more sense to use less.

In most cases, you use less or cat when you want to look at an entire file. If you want to display only part of a file, there are three other programs you can use: head, to display the beginning of a file; tail, to display the end of a file; and grep, to display all the lines that contain (or don't contain) a particular pattern.

In Chapter 16, we discussed how to use head and tail as filters within a pipeline. In this chapter, I'll show you how to use them with files. In Chapter 19, we talked about grep in detail, and in Chapter 20, I showed you a lot of examples. For this reason, we won't need to discuss grep in this chapter. (However, I do want to mention it.)
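As a quick preview of the three partial-display programs, here is a minimal sketch. The file sample.txt and its contents are made up for illustration, and the -n option (the number of lines to show) is used here without further explanation.

```shell
# Build a small practice file in a temporary directory (hypothetical data).
dir=$(mktemp -d)
file="$dir/sample.txt"
printf '%s\n' alpha beta gamma delta epsilon zeta > "$file"

# Display the first two lines of the file.
head -n 2 "$file"

# Display the last two lines of the file.
tail -n 2 "$file"

# Display only the lines that contain the pattern "eta".
grep eta "$file"
```

With this sample file, head shows alpha and beta, tail shows epsilon and zeta, and grep shows beta and zeta.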

The next group of programs you can use to display files are the text editors. A text editor allows you to look at any part of a file, search for patterns, move back and forth within the file, and so on. It also allows you to edit (make changes to) the file. Thus, you use a text editor to display a file when you want to make changes at the same time, or when you want to use special editor commands to move around the file. Otherwise you would use a pager.

In Chapter 14, I mentioned several text editors that are widely available on Unix and Linux systems: kedit, gedit, Pico, Nano, vi and Emacs. Any of these editors will allow you to display and change files. However, vi and Emacs are, by far, the most powerful tools (and the hardest to learn). The only editor we will talk about in detail in this book is vi, which we will discuss in Chapter 22.

From time to time, you may want to use a text editor to examine a file that is so important you don't want to make any changes accidentally. To do so, you can run the editor in what is called "read-only" mode, which means you can look at the file, but you cannot make any changes.

To start vi in read-only mode, you use the -R option. For example, any user can look at the Unix password file (Chapter 11), but you are not allowed to modify it unless you are superuser. Thus, to use vi to look at the password file without being able to edit it, you would use:

vi -R /etc/passwd

As a convenience, you can use view as a synonym for vi -R:

view /etc/passwd

Even if you are logged in as superuser, you will often choose to use vi -R or view to look at a very important system file. This ensures that you don't change it accidentally. (We'll discuss this more in Chapter 22.)

The programs we have discussed so far all work with text files, that is, files that contain lines of characters. However, there are many different types of non-text files, called binary files, and from time to time, you may need to look inside such files. The final two programs I want to mention — hexdump and od — are used to display files that contain binary data.

For example, say you are writing a program that sends binary output to a file. Each time you run the program, you need to look inside the file to check on the output. That is where hexdump or od come in handy. We'll talk about the details later in the chapter. As a quick example, either of the following Linux commands lets you look inside the file containing the grep program. (Don't worry about the options for now. We'll discuss them later.)

hexdump -C /bin/grep | less
od -Ax -tx1z /bin/grep | less
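If you would like to try this on a file whose contents you already know, the following sketch may help. It assumes a throwaway file named sample.bin (a made-up name) holding five known bytes, and it uses only the od options for hexadecimal offsets and one-byte hexadecimal values.

```shell
# Create a tiny binary file: the characters A, B, C, a newline, and a zero byte.
dir=$(mktemp -d)
printf 'ABC\n\000' > "$dir/sample.bin"

# Show each byte as a two-digit hexadecimal number (-t x1),
# with the offsets in the left column also in hexadecimal (-A x).
od -A x -t x1 "$dir/sample.bin"
```

The byte values you should see are 41, 42, 43, 0a and 00: the codes for A, B, C and newline, followed by the zero byte.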

For reference, Figure 21-1 contains a summary of the programs we have discussed in our survey. As you look at the summary, please take a moment to appreciate how many Unix tools there are to display files, each with its own characteristics and uses.

Figure 21-1: Programs to display files

Unix and Linux have a large variety of tools you can use to display all or part of a file. This summary shows the most important such tools, along with the chapters in which they are discussed. At the very least, you should be competent with less, cat, head, tail, and grep to display text files. You should also know how to use vi, as it is the principal Unix text editor. If you are a programmer, you should be familiar with either hexdump or od, so you can display binary files.

Program       Purpose                                                          Chapter
less          Pager: display one screenful at a time                           21
more          Pager (obsolete, used with BSD)                                  21
pg            Pager (obsolete, used with System V)
cat           Display entire file, no paging                                   16
head          Display first part of file                                       16, 21
tail          Display last part of file                                        16, 21
grep          Display lines containing/not containing specific pattern         19, 20
vi            Text editor: display and edit file                               22
view, vi -R   Read-only text editor: display but don't allow changes to file   22
hexdump       Display binary (non-text) files                                  21
od            Display binary (non-text) files                                  21


Introduction to less: Starting, Stopping, Help

The less program is a pager. That is, it displays data, one screenful at a time. When you start less, there are many options to choose from and, once it is running, there are many commands you can use. However, you will rarely need so much complexity. In this chapter, we will concentrate on the basic options and features you are likely to use on a day-to-day basis. For a description of the more esoteric options and commands we won't be covering, see the manual page and the Info page:

man less
info less

(The online manual and the Info system are discussed in Chapter 9.)

The basic syntax to use less is as follows:

less [-cCEFmMsX] [+command] [-xtab] [file...]

where command is a command for less to execute automatically; tab is the tab spacing you want to use; and file is the name of a file.

Most of the time, you will not need any options. All you will do is specify one or more files to display, for example:

less information
less names addresses

You can use less to display any text file you have permission to read, including a system file or a file belonging to another userid. (We discuss file permissions in Chapter 25.) As an example, the following command displays a well-known system file, the Termcap file we discussed in Chapter 7:

less /etc/termcap

The Termcap file contains technical descriptions of all the different types of terminals. Although Termcap has been mostly replaced(*) by a newer system called Terminfo (see Chapter 7), this file is an excellent example to use when practicing with less, so if you want to follow along as you read this chapter, you can enter the above command whenever you want.

* Footnote

Although the Terminfo system is preferred (see Chapter 7), Termcap is still used by some programs, including less itself.

Before displaying anything, less will clear the screen. (You can suppress this by using the -X option.) When less starts, it displays the first screenful of data, whatever fits on your monitor or within your window. At the bottom left-hand corner of the screen, you will see a prompt. The initial prompt shows you the name of the file being displayed. Depending on how your system is configured, you may also see other information. For example:

/etc/termcap lines 1-33/18956 0%

In this case, we are looking at lines 1 through 33 of the file /etc/termcap. The top line on the screen is the first line of the file (0%). Subsequent prompts will update the line numbers and percentage.

On some systems, the default is for less to display a much simpler prompt without the line numbers and percentage. If this is the case on your system, the first prompt will show only the file name, for example:

/etc/termcap

Subsequent prompts will be even simpler; all you will see is a colon:

:

On such systems, you can display extra information in the prompt by using the -M option (discussed later in the chapter).

— hint —

For ambitious fanatics with a lot of extra time, less offers more flexibility for customizing the prompt than any other pager in the history of the world. (See the man page for details.)

Once you see the prompt, you can enter a command. In a moment, we'll talk about the various commands, of which there are many. For now, I'll just mention the most common command, which is simply to press the <Space> bar. This tells less to display the next screenful of data. Thus, you can read an entire file, one screenful at a time, from beginning to end, simply by pressing <Space>.

When you reach the end of the file, less changes the prompt to:

(END)

If you want to quit, you can press q at any time. You don't have to wait until the end of the file. Thus, to look at a file, all you need to do is start less, press <Space> until you see as much as you want, and then press q to quit.

As a quick exercise, try this. Enter one of the following commands to display the Termcap file:

less /etc/termcap
less -m /etc/termcap

You will see the first screenful of data. Press <Space> a few times, moving down through the file, one screenful at a time. When you get tired of looking at an endless list of cryptic, obsolete terminal descriptions, press q to quit.

— hint —

When you use less to display a file, there are many commands you can use while you are looking at a file. The most important command is h (help). At any time, you can press h to display a summary of all the commands.

The best way to learn about less is to press h, see what is available and experiment.


The Story of less and more

As I explained earlier in the chapter, the original Unix pagers were more (used with BSD) and pg (used with System V). You will sometimes hear that the name less was chosen as a wry joke. Since less is much more powerful than more, the joke is that "less is more". It's plausible, but not true. Here is the real story.

The original Unix pager, more, was a simple program used to display data one screenful at a time. The name more came from the fact that, after each screenful, the program would display a prompt with the word "More":

--More--

The more program was useful, but it had serious limitations. The most important limitation was that more could only display data from start to finish: it could not back up.

In 1983, a programmer named Mark Nudelman was working at a company called Integrated Office Systems. The company produced Unix software that could create very large log files containing transaction information and error messages. Some of the files were so large that the current version of the vi text editor was not able to read them. Thus, Nudelman and the other programmers were forced to use more to examine the files when they wanted to look for errors.

However, there was a problem. Whenever a programmer found an error message in a log file, there was no way to back up to see what caused the problem, that is, the transactions immediately preceding the error. The programmers often complained about this problem. As Nudelman explained to me:

"A group of engineers were standing around a terminal in the lab using more to look at a log file. We found a line indicating an error and, as usual, we had to determine the line number of the error. Then we had to quit more, restart it, and move forward to a point several lines before the error to see what led up to the error. Someone complained about this cumbersome process. Someone else said 'We need a backwards more.' A third person said 'Yeah, we need LESS!' which got a chuckle from everyone."

Thinking about the problem, it occurred to Nudelman that it wouldn't be too hard to create a pager that could back up. In late 1983, he wrote such a program, which he indeed called less. At first, less was used only within the company. However, after enhancing the program, Nudelman felt comfortable making it available to the outside world, which he did in May 1985.

Nudelman released less as open source software, which enabled many other people to help him improve the program. Over the years, less became more and more powerful and so popular with Unix users that it eventually reached the point where it replaced both more and pg (the other popular Unix pager). Today, less is the most widely used Unix pager in the world and is distributed as part of the GNU utilities (see Chapter 2).

Interesting note: Most programs are released with version numbers such as 1.0, 1.01, 1.2, 2.0 and so on. Nudelman used a simpler system. From the very beginning, he gave each new version of less its own number: 1, 2, 3, 4 and so on. Thus, as I write this, the less program I am using is version 394.


Using less

As you read a file with less, there are a great many commands you can use. For reference, the most important commands are summarized in Figure 21-2. For a more comprehensive summary, you can press h (help) from within less, or you can enter the following command from the shell prompt:

less --help

Figure 21-2: less: Summary of the Most Useful Commands

Basic commands
h             display help information
<Space>       go forward one screenful
q             quit the program

Advanced commands
g             go to first line
G             go to last line
=             display current line number and file name
<Return>      go forward one line
n<Return>     go forward n lines
b             go backward one screenful
y             go backward one line
ny            go backward n lines
d             go forward (down) a half screenful
u             go backward (up) a half screenful
<Down>        go forward one line
<Up>          go backward one line
<PageUp>      go backward (up) one screenful
<PageDown>    go forward (down) one screenful
ng            go to line n
np            go to line n% through the file
/pattern      search forward for the specified pattern
?pattern      search backward for the specified pattern
n             repeat search: same direction
N             repeat search: opposite direction
!command      execute the specified shell command
v             start vi editor using current file
-option       change specified option
_option       display current value of option
 

When you display the comprehensive summary, you will see there are many more commands than you will ever need. For example, there are five different ways to move forward (that is, down) by one line, five different ways to move backward (up) by one line, five different ways to quit the program, and so on. Don't be intimidated. You only need to know the commands in Figure 21-2.

The best strategy is to start with the three commands I mentioned earlier: <Space> to page through the file, h for help, and q to quit. Once you feel comfortable with these three, teach yourself the rest of the commands in Figure 21-2, one at a time, until you have memorized them all. Just work your way down the list from top to bottom. (I chose the order carefully and, yes, you do need to memorize them all.)

If you need a file on which to practice, use the Termcap file I mentioned earlier in the chapter. The following command will get you started:

less -m /etc/termcap

(It helps to use the -m option when you are practicing, so the prompt will show your position in the file.)


Using less to Search Within a File

Most of the commands in Figure 21-2 are straightforward. However, I do want to say a few words about the search commands. When you want to search for a pattern, you use either / (search forward) or ? (search backward), followed by a pattern. The pattern can be a simple character string or a regular expression (described in Chapter 20). After typing / or ?, followed by the pattern, you need to press <Return> to let less know you are finished.

Here are some examples. To search forward in the file for the next occurrence of "buffer", use:

/buffer

To search backward for the same pattern, use:

?buffer

Searches are case sensitive, so you will get different results if you search for "Buffer":

/Buffer

If you want to use case insensitive searching, you can start less with the -I option (described later in the chapter), for example:

less -Im /etc/termcap

When you start less in this way, searching for "buffer" would produce the same result as searching for "Buffer" or "BUFFER".

If you want to turn the -I option off and on while you are reading a file, you can use the -I command from within less. To display the current state of this option, use the _I command. (Changing and displaying options from within less is described later in the chapter.)

If you want to perform more complex searches, you can use regular expressions. For example, let's say you want to search for any string that contains "buf", followed by zero or more lowercase letters. You can use:

/buf[[:lower:]]*
?buf[[:lower:]]*

(For a detailed explanation of regular expressions, including many examples, see Chapter 20.)
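One way to check a regular expression before you use it inside less is to try it with grep from the shell. The following sketch assumes a grep that supports the -o option (as the GNU and BSD versions do), which prints only the text that matches. The sample text is invented for the illustration.

```shell
# Sample text (made up for this illustration).
text='the buffer holds bufsize bytes; BUF is not matched; buf alone is'

# -o prints each match on its own line instead of the whole line.
# The pattern is "buf" followed by zero or more lowercase letters.
printf '%s\n' "$text" | grep -o 'buf[[:lower:]]*'
```

Here the matches are buffer, bufsize and buf. The uppercase BUF is skipped because the search is case sensitive.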

Once you have entered a search command, you can repeat it by using the n (next) command. This performs the exact same search in the same direction. To repeat the same search in the opposite direction, use N.

Whenever you search for a pattern, less will highlight that pattern wherever it appears in the file. Thus, once you search for something, it is easy to see all such occurrences as you page through the file. The highlighting will persist until you enter another search.

— hint —

Once you have learned how to use the vi editor, the less commands will make more sense since many of the commands are taken directly from vi. This is because virtually all experienced Unix people are familiar with vi, so using the same commands with less makes a lot of sense.

— hint —

If you have too much spare time on your hands, you can use the lesskey command to change the keys used by less. For details, see the lesskey man page.


Raw and Cooked Mode

Before we continue our discussion, I want to take a moment to talk about a few important I/O (input/output) concepts that will help you better understand how less and similar programs work. Let's start with a definition.

A DEVICE DRIVER or, more simply, a DRIVER, is a program that provides an interface between the operating system and a particular type of device, usually some type of hardware. When you use the Unix text-based CLI (command line interface), the driver that controls your terminal is called a TERMINAL DRIVER.

Unlike some other drivers, terminal drivers must provide for an interactive user interface, which requires special preprocessing and postprocessing of the data. To meet this need, terminal drivers use what is called a LINE DISCIPLINE.

Unix has two main line disciplines, CANONICAL MODE and RAW MODE. The details are horribly technical, but the basic idea is that, in canonical mode, the characters you type are accumulated in a buffer (storage area), and nothing is sent to the program until you press the <Return> key. In raw mode (also known as NONCANONICAL MODE), each character is passed directly to the program as soon as you press a key. When you read Unix documentation, you will often see canonical mode referred to as COOKED MODE. (This is, of course, an amusing metaphor, "cooked" being the opposite of "raw".)

When a programmer creates a program, he can use whichever line discipline he wants. Raw mode gives the programmer full control over the user's working environment. The less program, for example, works in raw mode, which allows it to take over the command line and the screen completely, displaying lines and processing characters according to its needs.

This is why whenever you press a key, less is able to respond instantly; it does not need you to press <Return>. Thus, you simply press the <Space> bar and less displays more data; you press b and less moves backwards in the file; you press q and the program quits. You will find many other programs that work in raw mode, such as the text editors vi and Emacs.

In canonical (cooked) mode, a program is sent whole lines, not individual characters. This releases the programmer from having to process each character as it is generated. It also allows you to make changes in the line before the line is processed. For example, you can use <Backspace> or <Delete> to make corrections before you press <Return>. When you use the shell, for example, you are working in canonical mode: nothing is sent until you press <Return>.

Virtually all interactive text-based programs use either canonical mode or raw mode. However, there is a third line discipline you may hear about, even though it is not used much anymore.

CBREAK MODE is a variation of raw mode. Most input is sent directly to the program, just like raw mode. However, a few very important keys are handled directly by the terminal driver. These keys (which we discussed in Chapter 7) are the ones that send the five special signals: intr (^C), quit (^\), susp (^Z), stop (^S), and start (^Q). Cbreak mode, then, is mostly raw with a bit of cooking. In the olden days, it was sometimes referred to, whimsically, as "rare mode".
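To make the difference between the modes concrete, here is a minimal sketch of a shell function that reads a single keypress without waiting for <Return>. The name read_key is hypothetical. The sketch assumes a stty that accepts the -icanon, -echo, min and time settings, and it turns off only canonical mode, leaving the signal keys to the terminal driver, so it is closer to cbreak mode than to fully raw mode. It is an illustration only, not how less itself is written.

```shell
# read_key: read one character from standard input and print it.
# (Hypothetical helper, for illustration only.)
read_key() {
    if [ -t 0 ]; then
        # Input is a terminal: save the settings, then leave canonical mode
        # so one character is delivered as soon as a key is pressed.
        saved=$(stty -g)
        stty -icanon -echo min 1 time 0
        key=$(dd bs=1 count=1 2>/dev/null)
        # Put the terminal back into its original (cooked) state.
        stty "$saved"
    else
        # Input is a pipe or a file: there is no terminal mode to change.
        key=$(dd bs=1 count=1 2>/dev/null)
    fi
    printf '%s' "$key"
}

# Example use at a terminal:
#   printf 'Press a key: '; k=$(read_key); printf '\nYou pressed: %s\n' "$k"
```

In canonical mode, the same read would not finish until you pressed <Return>, because the terminal driver would hold the characters in its buffer until the line was complete.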


Options to Use With less

When you start less, there are a large number of options you can use, most of which can be safely ignored(*). For practical purposes, you can consider less to have the following syntax:

less [-cCEFmMsX] [+command] [-xtab] [file...]

where command is a command for less to execute automatically, tab is the tab spacing you want to use, and file is the name of a file.

* Footnote

In fact, less is one of those odd commands, like ls (see Chapter 24), that has more options than there are letters in the alphabet. It's hard to explain why, but I suspect it has something to do with an overactive thyroid.

The three most useful options are -s, -c and -m. The -s (squeeze) option replaces multiple blank lines with a single blank line. This is useful for condensing output in which multiple blank lines are not meaningful. Of course, no changes are made to the original file.

The -c (clear) option tells less to display each new screenful of data from the top down. Without -c, new lines scroll up from the bottom line of the screen. Some people find that long files are easier to read with -c. The -C (uppercase "C") option is similar to -c except that the entire screen is cleared before new data is written. You will have to try both options for yourself and see what you prefer.

The name -m refers to more, the original BSD pager I mentioned earlier. The more prompt displays a percentage, showing how far down the user is in the file. When less was developed, it was given a very simple prompt, a colon. However, the -m option was included for people who were used to more and wanted the more verbose prompt.

The -m option makes the prompt look like the more prompt by showing the percentage of the file that has been displayed. For example, let's say you display the Termcap file (see the discussion earlier in the chapter) using -m:

less -m /etc/termcap

After moving down a certain distance you see the prompt:

40%

This indicates you are now 40 percent of the way through the file. (By the way, you can jump directly to this location by using the command 40p. See Figure 21-2.)

The -M (uppercase "M") option makes the prompt show even more information: you will see the name of the file and the line number, as well as the percentage that has been displayed. For example, if you use:

less -M /etc/termcap

A typical prompt would look like this:

/etc/termcap lines 7532-7572/18956 40%

The line numbers refer to the range of lines being displayed, in this case, lines 7,532 to 7,572 (out of 18,956).

One of my favorite options is -E (end). This tells less to quit automatically when the end of the file has been displayed. When you use -E, you don't have to press q to quit the program. This is convenient when you know you only want to read through a file once without looking backward.

The -F (finish automatically) option tells less to quit automatically if the entire file can be displayed at once. Again, this keeps you from having to press q to quit the program. In my experience, -F works best with very short files, while -E works best with long files. To see how this works, let's create a very short file named friends. To start, enter the command:

cat > friends

Now type the names of five or six friends, one per line. When you are finished, press ^D to send the eof signal to end the program. (We discuss ^D in Chapter 7.) Now, compare the following two commands. Notice that with the second command, you don't have to press q to quit the program.

less friends
less -F friends
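The decision that -F makes can be approximated from the shell. The sketch below is only a rough stand-in: the function name fits_on_screen is hypothetical, it counts lines without allowing for long lines that wrap, and it assumes a screen of 24 lines when the LINES variable is not set. It also creates its own copy of a friends file in a temporary directory, with made-up names.

```shell
# fits_on_screen FILE [SCREEN_LINES]
# Succeed if FILE has fewer lines than the screen.
# (Hypothetical helper; a rough approximation of what -F checks.)
fits_on_screen() {
    file_lines=$(( $(wc -l < "$1") ))
    screen_lines=${2:-${LINES:-24}}
    [ "$file_lines" -lt "$screen_lines" ]
}

# A short practice file with five made-up names.
dir=$(mktemp -d)
printf '%s\n' Alice Bob Carol Dave Eve > "$dir/friends"

if fits_on_screen "$dir/friends"; then
    echo "short file: less -F would display it and quit"
else
    echo "long file: less -F would wait for commands as usual"
fi
```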

The + (plus sign) option allows you to specify where less will start to display data. Whatever appears after the + will be executed as an initial command. For example, to display the Termcap file with the initial position at the end of the file, use:

less +G /etc/termcap

To display the same file, starting with a search for the word "buffer", use:

less +/buffer /etc/termcap

To start at a particular line, use +g (go to) preceded by the line number. For example, to start at line 37, use:

less +37g /etc/termcap

As a convenience, less allows you to leave out the g. Thus, the following two commands both start at line 37:

less +37g /etc/termcap
less +37 /etc/termcap

The -I (ignore case) option tells less to ignore differences in upper- and lowercase when you search for patterns. By default, less is case sensitive. For example, "the" is not the same as "The". However, when you use -I, you get the same results searching for "the", "The" or "THE".

The -N (number) option is useful when you want to see line numbers in the output. When you use this option, less numbers each line, much like the nl command (Chapter 18). For instance, the following two examples generate similar output:

less -N file
nl file | less

In both cases, of course, the actual file is not changed.

There are two important differences between using nl and less -N. First, less numbers lines in only one way: 1, 2, 3 and so on. The nl command has a variety of options that allow you a great deal of flexibility in how the line numbers should be generated. You can choose the starting number, the increment, and so on (see Chapter 18). Second, less numbers all lines, even blank ones. By default, nl does not number blank lines unless you use the -b a option.
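The second difference is easy to see with a few lines of sample input. In this sketch, the three-line input (with a blank line in the middle) is made up for the illustration.

```shell
# By default, nl numbers only the non-blank lines,
# so the blank middle line gets no number.
printf 'first\n\nthird\n' | nl

# With -b a, nl numbers every line, blank or not, as less -N does.
printf 'first\n\nthird\n' | nl -b a
```

The first command numbers two lines (1 and 2); the second numbers all three (1, 2 and 3).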

Finally, the -x option followed by a number tells less to set the tabs at the specified regular interval. This controls the spacing for data that contains tab characters. For example, to display a program named foo.c with the tabs set to every 4 spaces, use:

less -x4 foo.c

As with most Unix programs, the default tab setting is every 8 spaces (see Chapter 18).

If you want to change an option on the fly while you are viewing a file, use the - (hyphen) command followed by the new option. This acts like an on/off toggle switch. For example, to turn on the -M option (to display a verbose prompt) while you are looking at a file, type:

-M

To turn off the option, just enter the command again.

To display the current value of an option, use an _ (underscore) followed by the option. For example, to check how the prompt is set, use:

_M

Here is another example. You have started less without the -I option and, as you are looking at the file, you decide you want to do a case insensitive search. All you need to do is type:

-I

After entering your search command, you type -I again to turn off the option. If you do this a few times, it's easy to lose track. So, at any time, you can check the status of the option, by typing:

_I

This is a handy pattern to remember.

— hint —

When you are new to less and you want to learn how to use the various options, you can use the - (change option) and _ (display option) commands to experiment while you are displaying a file.

This is especially useful if you want to learn how to use the -P option (which we did not discuss) to change the prompt. You can make a change to the prompt, and see the result immediately.

When to Use less and When to Use cat

As we discussed earlier in the chapter, you can use both less and cat to display files. With less, the file is displayed one screenful at a time; with cat, the entire file is displayed all at once. If you expect a file to be longer than the size of your screen, it is best to use less. If you use cat, most of the file will scroll off your screen faster than you can read it. However, what if the file is small?

If you use cat to display a small file — one that is short enough to fit on your screen — the data is displayed quickly, and that is that. Using less is inconvenient for two reasons. First, less will clear the screen, erasing any previous output. Second, you will have to press q to quit the program, which is irritating when all you want to do is display a few lines quickly.

You can, of course, use less with the -F (finish automatically) option, which causes the program to quit automatically if the entire file can be displayed at once. For example, let's say that data is a very small file. You can display it quickly by using the command:

less -F data

In fact, you can even specify that less should use the -F option by default. (You do this by setting the LESS environment variable, explained later in the chapter.) Once you set this variable, you won't have to type -F, and the following two commands are more or less equivalent (assuming data is a very small file):

less data
cat data

However, if you watch experienced Unix people, you will see that they always use cat to display short files; they never use less. Why is this?

There are four reasons. First, as I mentioned, less clears the screen, which erases the previous output. This can be inconvenient. Second, it is faster to type "cat" than "less". Third, the name cat is a lot cuter than the name less. Finally, using cat in this way is how Unix people distinguish themselves from the crowd.

These might seem like insignificant reasons, but Unix people like their work to be smooth, fast and fun. So if you want to look like a real Unix person and not a clueless goober, use cat when the file is very small, and less otherwise.
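
If you want the shell to make this choice for you, the rule can be sketched as a small function. This is only an illustration, not something from a standard system: the name show is hypothetical, and the function assumes that tput lines reports the height of your screen (falling back to 24 lines if it cannot).

```shell
# show: use cat for a file that fits on the screen, less otherwise.
# (The name "show" is hypothetical; it is not a standard command.)
show() {
    # Height of the terminal; fall back to 24 if it cannot be determined.
    rows=$(tput lines 2>/dev/null || echo 24)
    [ "$rows" -gt 0 ] 2>/dev/null || rows=24

    # $(( )) strips any padding that wc puts around the count.
    lines=$(($(wc -l < "$1")))

    if [ "$lines" -lt "$rows" ]; then
        cat "$1"
    else
        less "$1"
    fi
}
```

You would put such a function in your environment file and then type show followed by a file name. Of course, a real Unix person would still just type cat.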

Using Environment Variables to Customize Your Pager

As we discussed in Chapter 15, the Unix philosophy says that each tool should do only one thing and do it well. Thus, the Unix pagers (less, more, pg) were all designed to provide only one service: to display data one screenful at a time. If another program requires this functionality, the program does not have to provide the service itself. Instead, it uses a pager.

The most common example occurs when you use the man program (Chapter 9) to access the online Unix manual. The man program does not actually display the text of the page. Rather, it calls upon a pager to show you the page, one screenful at a time.

The question arises, which pager will man and other programs use? You might think that, because less is the most popular pager, any program that needs such a tool will automatically use less. This is often the case, but not always. On some systems, for example, the man program, by default, will use more to display man pages. This can be irritating because, as we discussed earlier, more is not nearly as powerful as less.

However, you can specify your default pager. All you have to do is set an environment variable named PAGER to the name of the pager you want to use. For example, the following commands set less as your default pager. The first command is for the Bourne Shell family (Bash, Korn Shell). The second command is for the C-Shell family (C-Shell, Tcsh).

export PAGER=less
setenv PAGER less

To make the change permanent, all you need to do is put the appropriate command in your login file. (Environment variables are discussed in Chapter 12; the login file is discussed in Chapter 14.)

Once you set the PAGER environment variable in this way, all programs that require an external pager will use less. Even if less already seems to be the preferred pager for your system, it is a good idea to set PAGER in your login file. This will override any other defaults explicitly, ensuring that no matter how your system happens to be set up, you will be in control.

Aside from PAGER, there is another environment variable you can use for further customization. You can set the variable LESS to the options you want to use every time the program starts. For example, let's say you always want to use less with the options -CFMs (discussed earlier in the chapter). The following commands set the variable LESS appropriately. (The first command is for the Bourne Shell family. The second command is for the C-Shell family.)

export LESS='-CFMs'
setenv LESS '-CFMs'

Again, this is a command to put in your login file. Once you do, less will always start with these particular options. This will be the case whether you run less yourself or whether another program, such as man, runs it on your behalf.

If you ever find yourself using the more program (say, on a system that does not have less), you can specify automatic options in the same way by setting the MORE environment variable. For example, the following commands specify that more should always start with the -cs options. (The first command is for the Bourne Shell family; the second is for the C-Shell family.)

export MORE='-cs'
setenv MORE '-cs'

Once again, all you need to do is put the appropriate command in your login file to make your preferences permanent.
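
Putting the pieces together, here is a sketch of what the relevant lines in a Bourne Shell family login file might look like. The test for whether less is installed is my own addition for illustration; it is not required, and you can simply use the export commands shown above.

```shell
# Sketch of lines for a Bourne Shell family login file (Bash, Korn Shell).

# Prefer less as the default pager; fall back to more if less is not installed.
if command -v less >/dev/null 2>&1; then
    PAGER=less
else
    PAGER=more
fi
export PAGER

# Default options, read by each pager when it starts.
export LESS='-CFMs'
export MORE='-cs'
```

Setting both LESS and MORE does no harm: each pager looks only at its own variable.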

— hint —

The less program actually looks at 30 different environment variables, which allows for an enormous amount of flexibility.

The most important variable is LESS, the one variable we have discussed. If you are curious as to what the other variables do, take a look at the less man page.

Displaying Multiple Files With less

Anything you can do with less using a single file, you can also do with multiple files. In particular, you can move back and forth from one file to another, and you can search for patterns in more than one file at the same time. For reference, Figure 21-3 contains a summary of the relevant commands.

Figure 21-3
less: Commands to Use With Multiple Files

:n           change to next file in list
:p           change to previous file in list
:x           change to first file in list
:e           insert a new file into the list
:d           delete current file from the list
:f           display name of current file (same as =)
=            display name of current file
/*pattern    search forward for specified pattern
?*pattern    search backward for specified pattern

To work with multiple files, all you have to do is specify more than one file name on the command line. For example, the following command tells less that you want to work with three different files:

less data example memo

At any time, you see only one file, which we call the CURRENT FILE. However, less maintains a list of all your files and, whenever you want, you can move from one file to another. You can also add files to the list or delete files from the list as the need arises.

When less starts, the current file is the first one in the list. In the above example, the current file would be data. To move forward within the list, you use the :n (next) command. For example, if you are reading data and you type :n, you will change to example, which will become the new current file. If you type :n again, the current file will change to memo.

Similarly, you can move backwards within the list by using the :p (previous) command, and you can jump to the beginning of the list by using the :x command. For example, if you are reading memo and you type :p, the current file will change to example. If you type :x instead, the current file will change to data.

To display the name of the current file, type :f. This is a synonym for the = command (see Figure 21-2). (At this time, you may want to take a moment and practice these three commands before you move on.)

One of the most powerful features of less is that it allows you to search for a pattern in more than one file. Here is how it works.

As we discussed earlier in the chapter, you use the / command to search forwards within a file and ? to search backwards. After using either of these commands, you can type n to search again in the same direction or N to search again in the opposite direction.

For example, let's say you enter the command:

/buffer

This performs a forward search within the current file for the string "buffer". Once less has found it, you can jump forward to the next occurrence of "buffer" by typing n. Or, you can jump backward to the previous occurrence by typing N.

When you are working with more than one file, you have an option: instead of using / or ?, you can use /* or ?*. When you search in this way, less treats the entire list as if it were one large file. For example, let's say you start less with the command above:

less data example memo

The current file is data, and the list of files is:

data example memo

You type the command :n, which moves you within the list to the second file, example. You then type 50p (50 percent), which moves you to the middle of example. You now enter the following command to search forward for the string "buffer".

/*buffer

This command starts from the current position in example and searches forward for "buffer". Once the search is complete, you can press n to repeat the search moving forward. If you press n repeatedly, less would normally stop at the end of the file. However, since you used /* instead of /, less will move to the next file in the list automatically (in this case, memo) and continue the search.

Similarly, if you press N repeatedly to search backwards, when less gets to the beginning of the current file (example), it will move to the end of the previous file automatically (data) and continue the search.

The same idea applies when you use ?* instead of ? to perform a backwards search. The * tells less to ignore file boundaries when you use n or N commands.

In addition to :n, :p, :x, /* and ?*, less has two more commands to help you work with multiple files. These commands allow you to insert and delete files from the list.

To insert a file, you type :e (examine) followed by one or more file names. The new files will be inserted into the list directly after the current file. The first such file will then become the new current file. For example, let's say the list of files is:

data example memo

The current file happens to be example. You enter the following command to insert three files into the list:

:e a1 a2 a3

The list becomes:

data example a1 a2 a3 memo

The current file is now a1.

To delete the current file from the list, you use the :d (delete) command. (Of course, less does not delete the actual file.) For example, if you are working with the list above, and you type :d, the current file (a1) is deleted from the list:

data example a2 a3 memo

The previous file (example) becomes the new current file.

At first, these commands can be a bit confusing, especially because there is no way to display the actual list so you can see what's what. When you work with multiple files, you'll need to keep the sequence of files in your head. Still, when you want to display more than one file or search through more than one file, you will find these commands to be surprisingly practical, so they are worth learning.

Displaying a File Using more

As we discussed earlier in the chapter, the early pagers more and pg have been replaced by the more powerful program less. Although you will probably never see pg, you will run into more from time to time. For example, you may have to use a system that does not have less, and you will have to use more. Or you may use a system in which the default pager is more, and you may find yourself using it accidentally.(*) In such cases, it behooves you to know a bit about the program so, in this section, I'll go over the basics for you.

* Footnote

This is the case on Solaris systems. When you use the man command to display a man page, the default pager is more. If you use such a system regularly, you can make less your default pager by setting the PAGER environment variable. See the discussion earlier in the chapter.

The syntax for more is:

more [-cs] [file...]

where file is the name of a file.

The more program displays data one screenful at a time. After each screen is written, you will see a prompt at the bottom left corner of the screen. The prompt looks like this:

--More--(40%)

(Hence the name more.)

At the end of the prompt is a number in parentheses. This shows you how much of the data has been displayed. In our example, the prompt shows that you are 40 percent of the way through the file.

The simplest way to use more is to specify a single file name. For example:

more filename

If the data fits on a single screen, it will be displayed all at once and more will quit automatically. Otherwise, the data will be displayed, one screenful at a time, with the prompt at the bottom.

Once you see the prompt, you can enter a command. The most common command is simply to press the <Space> bar, which displays the next screenful of data. You can press <Space> repeatedly to page through the entire file. After displaying the last screenful of data, more will quit automatically.

The most common use for more is to display the output of a pipeline, for example:

cat newnames oldnames | grep Harley | sort | more
ls -l | more

When you use more in a pipeline, the prompt will not show the percentage:

--More--

This is because more displays the data as it arrives, so it has no idea how much there will be.

When more pauses, there are a variety of commands you can use. Like less, more works in raw mode (explained earlier in the chapter), so when you type single-character commands, you do not have to press <Return>. As you might expect, the most important command is h (help), which displays a comprehensive command summary.

For the most part, you can think of more as a less powerful version of less. For reference, the most important more commands are summarized in Figure 21-4. For a comprehensive summary, see the more man page (man more).

Figure 21-4
more: Useful Commands

Basic commands
h            display help information
<Space>      go forward one screenful
q            quit the program

Advanced commands
=            display current line number
<Return>     go forward one line
d            go forward (down) a half screenful
f            go forward one screenful
b            go backward one screenful
/pattern     search forward for the specified pattern
/            repeat last search
!command     execute the specified shell command
v            start vi editor using current file

As I mentioned, you can move forward one screenful by pressing <Space>. Alternatively, you can press d (down) to move forward a half screenful, or <Return> to move forward one line. To move backward one screenful, press b. (Note: The b command only works when you are reading a file. Within a pipeline, you can't go backwards because more does not save the data.)

To search for a pattern, type / followed by the pattern, followed by <Return>. If you want, you can use a regular expression (see Chapter 20). When more finds the pattern, it will display two lines before that location so you can see the line in context. To repeat the previous search, enter / without a pattern, that is, /<Return>.

When you start more, the two most useful options are -s and -c. The -s (squeeze) option replaces multiple blank lines with a single blank line. You can use this option to condense output in which multiple blank lines are not meaningful. Of course, this does not affect the original file.

The -c (clear) option tells more to display each new screenful of data from the top down. Each line is cleared before it is replaced. Without -c, new lines scroll up from the bottom line of the screen. Some people find that long files are easier to read with -c. You will have to try it for yourself.

Displaying the Beginning of a File: head

In Chapter 16, we discussed how to use head as a filter within a pipeline to select lines from the beginning of a stream of data. In this section, I'll show you how to use head on its own, to display the beginning of a file. When you use head in this way, the syntax is:

head [-n lines] [file...]

where lines is the number of lines you want to display, and file is the name of a file. By default, head will display the first 10 lines of a file. This is useful when you want to get a quick look at a file to check its contents. For example, to display the first 10 lines of a file named information, use:

head information

If you want to display a different number of lines, specify that number using the -n option. For example, to display the first 20 lines of the same file, use:

head -n 20 information
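
If you don't have a suitable file handy, here is a sketch you can use to try these two forms of the command. The sample file is a temporary one created just for the experiment, and the variable names are my own.

```shell
# Build a 30-line sample file (the contents are placeholders).
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 30; i++) print "line " i }' > "$sample"

head "$sample"          # first 10 lines (the default)
head -n 20 "$sample"    # first 20 lines

# Count the lines each command produced.
default_count=$(($(head "$sample" | wc -l)))
twenty_count=$(($(head -n 20 "$sample" | wc -l)))
echo "default: $default_count lines; with -n 20: $twenty_count lines"

rm -f "$sample"
```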

— hint —

Originally, head and tail (discussed next) did not require you to use the -n option; you could simply type a hyphen followed by a number. For example, the following commands all display 15 lines of output:

calculate | head -n 15
calculate | head -15

calculate | tail -n 15
calculate | tail -15

Officially, modern versions of head and tail are supposed to require the -n option, which is why I have included it. However, most versions of Unix and Linux will accept both types of syntax and, if you watch experienced Unix people, you will find that they often leave out the -n.

Displaying the End of a File: tail

To display the end of a file, you use the tail command. The syntax is:

tail [-n [+]lines] [file...]

where lines is the number of lines you want to display, and file is the name of a file.

By default, tail will display the last 10 lines of a file. For example, to display the last 10 lines of a file named information, use:

tail information

To display a different number of lines, use the -n option followed by a number. For example, to display the last 20 lines of the file information, use:

tail -n 20 information

Strictly speaking, you must type the -n option. However, as I explained in the previous section, you can usually get away with leaving it out. Instead, you can simply type a - (hyphen) followed by the number of lines you want to display. Thus, the following two lines are equivalent:

tail -n 20 information
tail -20 information

If you put a + (plus sign) character before the number, tail displays from that line number to the end of the file. For example, to display from line 35 to the end of the file, use:

tail -n +35 information

In this case, do not leave out the -n. This ensures that tail does not interpret the number as a file name.
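
To compare counting from the end of the file with counting from the beginning, you can try the following sketch. It uses a temporary 40-line file created only for the experiment; the variable names are my own.

```shell
# Build a 40-line sample file (the contents are placeholders).
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 40; i++) print "line " i }' > "$sample"

tail "$sample"            # last 10 lines: line 31 through line 40
tail -n 3 "$sample"       # last 3 lines: line 38 through line 40
tail -n +35 "$sample"     # from line 35 to the end: 6 lines in all

# Save a few results so they can be checked.
last_three=$(tail -n 3 "$sample")
from_35_count=$(($(tail -n +35 "$sample" | wc -l)))
first_from_35=$(tail -n +35 "$sample" | head -n 1)

rm -f "$sample"
```

Notice that -n 3 counts backward from the end of the file, while -n +35 counts forward from the beginning.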

Watching the End of a Growing File: tail -f

The tail command has a special option that allows you to watch a file grow, line by line. This option comes in handy when you must wait for data to be written to a file. For example, you might want to monitor a program that writes one line at a time to the end of a file. Or, if you are a system administrator, you might want to keep an eye on a log file to which important messages are written from time to time.

To run tail in this way, you use the -f option. The syntax is:

tail -f [-n [+]lines] [file...]

where lines is the number of lines you want to display, and file is the name of a file. (The lines argument is described in the previous section.)

The -f option tells tail not to stop when it reaches the end of the file. Instead, tail waits indefinitely and displays more output as the file grows. (The name -f stands for "follow".)

For example, let's say that over the next few minutes, a particular program will be adding output to the end of a file named results. You want to follow the progress of this program. Enter:

tail -f results

As soon as you enter the command, tail will display the last 10 lines of the file. The program will then wait, monitoring the file for new data. As new lines are added, tail will display them for you automatically.

When you use tail -f, it waits for new input indefinitely; the program will not stop by itself. To stop it, you must press ^C (the intr key; see Chapter 7). This can present a small problem because, until you stop tail, you won't be able to enter more commands. There are two ways to handle the situation.

First, you can run tail -f in the background (see Chapter 26) by using an & (ampersand) character at the end of the command.

tail -f results &

When you run tail in the background, it can run unattended as long as you want without tying up your terminal. Moreover, because tail is running in the same window or virtual console in which you are working, you will instantly see any new output. The disadvantage is that the output of tail will be mixed in with the output of whatever other programs you run, which can be confusing.

Note: When you run a program in the background, you can't stop it by pressing ^C. Instead, you need to use the kill command. (The details are explained in Chapter 26.)
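
Here is a sketch that shows the whole life cycle of a background tail -f: starting it, letting it pick up new data, and stopping it with kill. The file names are temporary ones created for the demonstration. To keep the output from mixing with anything else, I have assumed that the output of tail is redirected to a second file; in normal use, you would let it appear on your screen.

```shell
# Temporary files, created only for this demonstration.
results=$(mktemp)
captured=$(mktemp)
echo "first line" > "$results"

# Start tail -f in the background, saving its output so it can be inspected.
tail -f "$results" > "$captured" &
tail_pid=$!      # $! holds the process ID of the most recent background job

sleep 1
echo "second line" >> "$results"   # simulate a program adding output
sleep 2                            # give tail time to notice the new line

# ^C does not reach a background job, so stop it with kill.
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true

cat "$captured"
```

The sleep commands are there only to give tail time to do its work before it is stopped.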

An alternative way to use tail -f is to run it in its own terminal window or virtual console (see Chapter 6). If you do this, once tail begins you can leave it alone, do your work in a second window or virtual console, and check back with tail whenever you want. Using two windows or consoles in this way not only allows you to run other commands while tail is running, but it also keeps the output of tail separate. The only drawback is that you must remember to keep an eye on the window or console where tail is running.

If you would like to practice using tail -f, here is an experiment to try. To start, open two terminal windows (see Chapter 6). In the first window, use the cat command to create a small file named example:

cat > example

Type 4-5 lines and then press ^D to end the input and stop the command. (Using cat to create a small file is explained in Chapter 16; using ^D, the eof key, is explained in Chapter 7.)

In the second terminal window, enter the following tail command:

tail -f example

The tail program will list the last 10 lines of example and then wait for new input. Now return to the first window and add some more lines to the file. The easiest way to do this is to append the data using >> (see Chapter 16). Enter the command:

cat >> example

Now type as many lines as you want, pressing <Return> at the end of each line. Notice that, each time you type a line in the first window, the line shows up in the second window as output from tail.

When you are finished experimenting, press ^D in the first window to tell cat there is no more input. Then press ^C in the second window to stop tail.

If you look back at the syntax for using tail -f, you will see that you can specify more than one file name. This is because tail can monitor multiple files at the same time and alert you when any one of them receives new data. If you would like to experiment, set up two terminal windows as in our last example. In the first window, use cat to create two small files as described above:

cat > file1
cat > file2

In the second window, run the following command:

tail -f file1 file2

Now return to the first window and use the following commands in turn to add lines to one of the two files:

cat >> file1
cat >> file2

Notice that each time you use cat (in the first window) to add lines to one of the files, tail (in the second window) shows you the name of the file followed by the new lines.

Binary, Octal and Hexadecimal

To conclude this chapter, we are going to talk about two commands, od and hexdump, that are used to display data from binary files. To interpret the output of these commands, you will need to understand the binary, octal and hexadecimal number systems. So, before we move on, let's take a moment to discuss these very important concepts.

Although these three number systems are very important to computer science and computer programming, a detailed discussion is, unfortunately, beyond the scope of this book. In this section, I'll cover the basic ideas to get you started. If you have your heart set on becoming a computer person, my advice is to spend some time on your own, studying these topics in detail.

Most of the time, we use numbers composed of the 10 digits, 0 through 9. As such, our everyday numbers are composed of powers of 10: 1, 10, 100, 1000, and so on. We call such numbers DECIMAL NUMBERS. For example, the decimal number 19,563 is actually:

(1x10,000) + (9x1,000) + (5x100) + (6x10) + (3x1)

Or, using exponents:

(1x10^4) + (9x10^3) + (5x10^2) + (6x10^1) + (3x10^0)

We refer to this system as BASE 10 or the DECIMAL SYSTEM. The name comes from the idea that all numbers are constructed from 10 different digits. In the world of computers, there are three other bases that are actually more important than base 10:

• Base 2 (binary): uses 2 digits, 0-1
• Base 8 (octal): uses 8 digits, 0-7
• Base 16 (hexadecimal): uses 16 digits, 0-9 A-F

The importance of these systems comes from the way computer data is stored: all data is organized into sequences of electrical traces that, conceptually, are considered to be either off or on. This is true for any type of data, whether it resides in computer memory (such as RAM or ROM) or is stored on disks, CDs, DVDs, flash memory, or other devices.

As a shorthand notation, we represent "off" by the number 0, and "on" by the number 1. In this way, any data item, no matter how long or short, can be considered to be a pattern of 0s and 1s. Indeed, in a technical sense, this is how computer scientists think of data: as long streams of 0s and 1s.

Here are a few simple examples. Within the ASCII code, the letter "m" is represented by the pattern:

01101101

The word "mellow" is represented as:

011011010110010101101100011011000110111101110111

(For a discussion of the ASCII code, see Chapters 19 and 20. For a table showing the details of the code, see Appendix D.)
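
If you would like to check these patterns yourself, here is a sketch that builds them from the numeric ASCII codes. The two function names are hypothetical, chosen for this example, and the sketch assumes plain ASCII characters.

```shell
# byte_to_bits: print the 8-bit pattern for a single ASCII character.
byte_to_bits() {
    code=$(printf '%d' "'$1")     # a leading ' asks printf for the character's code
    bits=
    count=0
    while [ "$count" -lt 8 ]; do
        bits=$((code % 2))$bits   # prepend the lowest remaining bit
        code=$((code / 2))
        count=$((count + 1))
    done
    echo "$bits"
}

# word_to_bits: join the patterns for every character in a word.
word_to_bits() {
    word=$1
    result=
    while [ -n "$word" ]; do
        first=${word%"${word#?}"}   # first character of the word
        word=${word#?}              # the rest of the word
        result=$result$(byte_to_bits "$first")
    done
    echo "$result"
}

byte_to_bits m
word_to_bits mellow
```

The first command displays the 8 bits for "m"; the second displays the 48 bits for "mellow", matching the patterns above.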

The ASCII code is used only to represent individual characters. When it comes to working with numerical values, we use a variety of different systems. Without going into the details, here is how the number 3.14159 would be represented using a system called "single precision floating-point":

01000000010010010000111111010000

If these examples seem a bit confusing, don't worry. The details are very complex and not important right now. What is important is that you should understand that, to a computer scientist, all data — no matter what type or how much — is stored as long sequences of 0s and 1s. For this reason, it is important to learn how to work with numbers consisting only of 0s and 1s, which we call BINARY NUMBERS.

In computer science, a single 0 or 1 that is stored as data is called a BIT ("binary digit"); 8 bits in a row is called a BYTE. For instance, the previous example contains a binary number consisting of 32 bits or 4 bytes of data. We refer to this system as BASE 2 or the BINARY SYSTEM. The name reminds us that, in base 2, all numbers are constructed from only two different digits: 0 and 1.

If you are a beginner, the advantages of base 2 will not be obvious, and binary numbers will look meaningless. However, once you get some experience, you will see that binary numbers directly reflect the underlying reality of how data is stored. This is why, in many cases, using binary numbers offers a significant advantage over using base 10 numbers. However, there is a problem: binary numbers are difficult to use because they take up a lot of room, and because they are unwieldy and confusing to the eye.

As a compromise, there are two ways to represent binary numbers in a more compact fashion without losing the direct connection to the underlying data. They are called "base 8" and "base 16". Before I can explain how they work, I want to take a quick diversion to show you how to count in base 2.

In base 10, we start counting from 0 until we run out of digits. We then add 1 to the digit on the left and start again at 0. For example, we start with 0 and count 1, 2, 3, 4, 5, 6, 7, 8, 9, at which point we run out of digits. The next number is 10. We continue 11, 12, 13, 14, 15, 16, 17, 18, 19, and then go to 20. And so on.

Base 2 (in fact, every base) works the same way. The only difference is in how far we can go before we run out of digits. In base 2, we have only 2 digits: 0 and 1. We start counting at 0, and then move to 1, at which point we run out of digits. So the next number is 10. Then 11, 100, 101, 110, 111, 1000 and so on. In other words:

0 (base 10) = 0 (base 2)
1 (base 10) = 1 (base 2)
2 (base 10) = 10 (base 2)
3 (base 10) = 11 (base 2)
4 (base 10) = 100 (base 2)
5 (base 10) = 101 (base 2), and so on.

Take a look at Figure 21-5 where you will see the decimal numbers 0 through 20, along with their base 2 equivalents. (For now, you can ignore the other two columns.)

Figure 21-5: Decimal, Binary, Octal and Hexadecimal Equivalents

In regular life, we use decimal (base 10) numbers. With computers, data is stored in a form that is best reflected using binary (base 2) numbers. Such data can be written in a more compact form by using either octal (base 8) or hexadecimal (base 16) numbers. (See text for details.)

This table shows the equivalent ways of representing the decimal numbers 0 through 20 in binary, octal and hexadecimal. Can you see the patterns? Do they make sense to you?

Decimal      Binary      Octal       Hexadecimal
(Base 10)    (Base 2)    (Base 8)    (Base 16)
0            0           0           0
1            1           1           1
2            10          2           2
3            11          3           3
4            100         4           4
5            101         5           5
6            110         6           6
7            111         7           7
8            1000        10          8
9            1001        11          9
10           1010        12          A
11           1011        13          B
12           1100        14          C
13           1101        15          D
14           1110        16          E
15           1111        17          F
16           10000       20          10
17           10001       21          11
18           10010       22          12
19           10011       23          13
20           10100       24          14

BASE 8 is also called OCTAL. With this base, we have 8 digits, 0 through 7, so we count as follows: 0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, and so on.

BASE 16, also called HEXADECIMAL or HEX, works in a similar fashion using 16 digits. Of course, if we confine ourselves to regular digits, we only have 10 of them: 0 through 9. To count in base 16, we need 6 more digits, so we use the symbols A, B, C, D, E and F. Thus, in base 16, we count as follows: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, 12, and so on.
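
You can let the shell do this counting for you. The following sketch uses the printf command, which can display a number in octal (%o) or hexadecimal (%X) as well as decimal (%d). The function name count_bases is hypothetical, chosen for this example.

```shell
# count_bases: print the numbers from 0 up to a limit in decimal, octal, and hex.
# (The function name is hypothetical; it is not a standard command.)
count_bases() {
    i=0
    while [ "$i" -le "$1" ]; do
        printf '%d %o %X\n' "$i" "$i" "$i"
        i=$((i + 1))
    done
}

count_bases 20
```

Each line of output shows one number three ways. For example, the line for decimal 13 reads "13 15 D", and the last line reads "20 24 14".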

Take another look at Figure 21-5. By now, all four columns should be starting to make sense. Of course, this may all be new to you, and I don't expect you to feel comfortable with three new ways of counting right away. However, the time will come when it will all seem easy. For example, an experienced programmer can look at the binary number "1101" and instantly think: 13 decimal. Or he can look at the base 8 number "20" and instantly think: 16 decimal. Or he can look at the base 10 number "13" and instantly think: D in hexadecimal. One day it will be just as easy for you; it's not really that hard once you practice.(*)

* Footnote

When I was an undergraduate student at the University of Waterloo (Canada), I worked as a systems programmer for the university computing center. This was in the days of IBM mainframe computers when it was especially important for system programmers to be comfortable with hexadecimal arithmetic. Most of us could add in base 16, and a few people could subtract. For more complicated calculations, of course, we used calculators. My supervisor, however, was an amazing fellow. His name was Romney White, and he could actually multiply in base 16. Romney was the only person I ever met who was able to do this.

Today, by the way, Romney works at IBM, where he is an expert in using Linux on mainframes. When you have a moment, look him up on the Web. (Search for: "romney white" + "linux".)

So why is all this so important? The answer lies in Figure 21-6. Suppose you ask the question, how many different 3-bit binary numbers are there? The answer is 8, from 000 to 111 (when you include leading zeros). In Figure 21-6, you can see that each of these 8 values corresponds to a specific octal number. For example, 000 (binary) equals 0 (octal); 101 (binary) equals 5 (octal); and so on. This means that any pattern of 3 bits (binary digits) can be represented by a single octal digit. Conversely, any octal digit corresponds to a specific pattern of 3 bits.

Figure 21-6: Octal and Binary Equivalents

Every combination of three base 2 digits (bits) can be represented by a single octal digit. Similarly, each octal digit corresponds to a specific pattern of 3 bits.

Octal     Binary
(Base 8)  (Base 2)
0         000
1         001
2         010
3         011
4         100
5         101
6         110
7         111
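
The pairs in Figure 21-6 can be generated rather than memorized. Here is a minimal sketch, assuming a POSIX shell; the helper name to_bits is hypothetical and exists only for this example. It builds each bit pattern by dividing by 2 over and over, keeping the remainders.

```shell
# to_bits is a hypothetical helper: it writes VALUE as exactly WIDTH binary
# digits, keeping any leading zeros, by repeatedly dividing by 2.
to_bits() {
    value=$1
    width=$2
    bits=
    while [ "$width" -gt 0 ]; do
        bits=$(( value % 2 ))$bits   # each remainder is the next bit, right to left
        value=$(( value / 2 ))
        width=$(( width - 1 ))
    done
    printf '%s\n' "$bits"
}

# One line per octal digit, followed by its 3-bit pattern.
for digit in 0 1 2 3 4 5 6 7; do
    printf '%s  %s\n' "$digit" "$(to_bits "$digit" 3)"
done
```

Changing the width to 4 and the list of values to 0 through 15 produces the 4-bit patterns in the same way.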

This is an extremely important concept, so let's take a moment to make sure you understand it. As an example, consider the binary representation of "mellow" we looked at earlier:

011011010110010101101100011011000110111101110111

Here we have 48 bits. Let's divide them into groups of three:

011 011 010 110 010 101 101 100
011 011 000 110 111 101 110 111

Using the table in Figure 21-6, we can replace each set of 3 bits with its equivalent octal number. That is, we replace 011 with 3, 010 with 2, and so on:

3 3 2 6 2 5 5 4 3 3 0 6 7 5 6 7

Removing the spaces, we have:

3326255433067567

Notice how much more compact octal is than binary. However, because each octal digit corresponds to exactly 3 bits (binary digits), we have retained all the information exactly. Why does this work out so nicely? It's because 8 is an exact power of 2. In particular, 8 = 2³. Thus, each digit in base 8 corresponds to 3 digits in base 2.
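
The grouping procedure we just walked through is mechanical enough to hand to the shell. The following is a sketch only, assuming a POSIX shell; bin_to_octal is a hypothetical name, and padding short input with leading zeros is my own assumption, not something the procedure above requires.

```shell
# bin_to_octal is a hypothetical helper that follows the steps in the text:
# split a string of bits into groups of three, then replace each group
# with the matching octal digit from Figure 21-6.
bin_to_octal() {
    bits=$1
    # Assumption: if the length is not a multiple of 3, pad on the left with zeros.
    while [ $(( ${#bits} % 3 )) -ne 0 ]; do
        bits=0$bits
    done
    octal=
    while [ -n "$bits" ]; do
        rest=${bits#???}          # everything after the first three bits
        group=${bits%"$rest"}     # the first three bits
        case $group in
            000) digit=0 ;; 001) digit=1 ;; 010) digit=2 ;; 011) digit=3 ;;
            100) digit=4 ;; 101) digit=5 ;; 110) digit=6 ;; 111) digit=7 ;;
            *) echo "not a binary number: $1" >&2; return 1 ;;
        esac
        octal=$octal$digit
        bits=$rest
    done
    printf '%s\n' "$octal"
}

# The 48 bits of "mellow" from the text.
bin_to_octal 011011010110010101101100011011000110111101110111
```

The output should be 3326255433067567, the same 16 octal digits we found by hand.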

At this point, you may be wondering, could we represent long strings of bits even more compactly, if we use a number system based on a higher power of 2? The answer is yes. The value of 2⁴ is 16, and we can do even better than base 8 by using base 16 (hexadecimal). This is because each hexadecimal digit can represent 4 bits. You can see this in Figure 21-7.

Figure 21-7: Hexadecimal and Binary Equivalents

Every combination of four base 2 digits (bits) can be represented by a single hexadecimal digit. Similarly, each hexadecimal digit corresponds to a specific pattern of 4 bits.

Hexadecimal  Binary
(Base 16)    (Base 2)
0            0000
1            0001
2            0010
3            0011
4            0100
5            0101
6            0110
7            0111
8            1000
9            1001
A            1010
B            1011
C            1100
D            1101
E            1110
F            1111

To see how this works, consider once again, the 48-bit binary representation of "mellow":

011011010110010101101100011011000110111101110111

To start, let's divide the bits into groups of four:

0110 1101 0110 0101 0110 1100 0110 1100 0110 1111 0111 0111

Using the table in Figure 21-7, we replace each group of 4 bits by its hexadecimal equivalent:

6 D 6 5 6 C 6 C 6 F 7 7

Removing the spaces, we have:

6D656C6C6F77

Thus, the following three values are all equivalent. The first in binary (base 2), the second in octal (base 8), and the third in hexadecimal (base 16):

011011010110010101101100011011000110111101110111
3326255433067567
6D656C6C6F77
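
One way to convince yourself that the octal and hex forms really are the same value is to let printf convert between them. This sketch relies on a common convention in which a leading 0 marks an octal constant and a leading 0x marks a hex constant; the variable names are mine.

```shell
# A leading 0 marks an octal constant and a leading 0x marks a hex constant,
# so printf can convert the octal form of "mellow" to hex and back again.
octal_form=03326255433067567
hex_form=0x6D656C6C6F77

printf '%X\n' "$octal_form"   # octal in, hexadecimal out
printf '%o\n' "$hex_form"     # hexadecimal in, octal out
```

The first command should print 6D656C6C6F77 and the second 3326255433067567.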

What does this all mean? Because we can represent binary data using either octal characters (3 bits per character) or hexadecimal characters (4 bits per character), we can display the raw contents of any binary file by representing the data as long sequences of either octal or hex numbers. And that, in fact, is what I am going to show you how to do when we discuss the hexdump and od commands. Before we do, however, we need to discuss just one more topic.


Reading and Writing
Binary, Octal and Hexadecimal

What do you think of when you see the number 101? Most people would say "one hundred and one". However, as a computer person, you might wonder: how do I know I am looking at a base 10 number? Perhaps "101" refers to a base 2 (binary) number, in which case its decimal value would be 5:

(1 × 2²) + (0 × 2¹) + (1 × 2⁰) = 4 + 0 + 1 = 5

Or perhaps it's base 8 (octal), giving it a value of 65 decimal:

(1 × 8²) + (0 × 8¹) + (1 × 8⁰) = 64 + 0 + 1 = 65

Or could it be base 16 (hexadecimal), with a value of 257 decimal?

(1 × 16²) + (0 × 16¹) + (1 × 16⁰) = 256 + 0 + 1 = 257
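
The three calculations above all follow one rule, which can be sketched as a small shell function. The name positional_value is hypothetical. As a simplification, the function does not check that each digit is actually valid for the base you give it.

```shell
# positional_value is a hypothetical helper: it works out the decimal value of
# DIGITS written in BASE. Multiplying the running total by the base before
# adding each digit gives the same result as adding up digit x base^position.
positional_value() {
    digits=$1
    base=$2
    total=0
    while [ -n "$digits" ]; do
        rest=${digits#?}           # everything after the first digit
        digit=${digits%"$rest"}    # the first digit
        case $digit in
            [0-9]) n=$digit ;;
            [Aa]) n=10 ;; [Bb]) n=11 ;; [Cc]) n=12 ;;
            [Dd]) n=13 ;; [Ee]) n=14 ;; [Ff]) n=15 ;;
            *) echo "not a digit: $digit" >&2; return 1 ;;
        esac
        total=$(( total * base + n ))
        digits=$rest
    done
    printf '%s\n' "$total"
}

positional_value 101 2     # 101 read as binary
positional_value 101 8     # 101 read as octal
positional_value 101 16    # 101 read as hexadecimal
```

The three commands should print 5, 65 and 257, matching the arithmetic above.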

You can see the problem. Moreover, if you want to speak about the number 101, how would you pronounce it? If you knew it was decimal, you would talk about it in the regular way. But what if it is binary or octal or hex? It doesn't make sense to call it "one hundred and one".

Within mathematics, we use subscripts to indicate bases. For example, 101₁₆ means "101 base 16"; 101₈ means "101 base 8"; 101₂ means "101 base 2". When you don't see a subscript, you know you are looking at a base 10 number.

With computers, we don't have subscripts. Instead, the most common convention is to use a prefix consisting of the digit 0 followed by a letter to indicate the base. The prefix 0x means "base 16" (hex); 0o means "base 8" (octal); 0b means "base 2" (binary). For example, you might see 0x101, 0o101 or 0b101. Sometimes, we indicate octal in a different way, by using a single (otherwise unnecessary) leading 0, such as 0101. You can see these conventions illustrated in Figure 21-8.

Figure 21-8: Conventions for Indicating Hexadecimal, Octal, and Binary Numbers

When we work with non-decimal number systems, we need written and spoken conventions to indicate the base of a number. In mathematics, we use subscripts when writing such numbers. With computers we don't have subscripts, so we usually use a special prefix. When we talk about non-decimal numbers, we pronounce each digit separately.

Meaning       Mathematics  Computers       Pronunciation
101 base 10   101          101             "one hundred and one"
101 base 16   101₁₆        0x101           "hex one-zero-one"
101 base 8    101₈         0101 or 0o101   "octal one-zero-one"
101 base 2    101₂         0b101           "binary one-zero-one"
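
Some of these prefixes are understood by the shell itself. As a hedged illustration: POSIX shell arithmetic treats a leading 0x as hex and a single leading 0 as octal. The 0o and 0b prefixes are not part of that standard, so this sketch does not rely on them.

```shell
# The same three characters, 101, read in three different bases.
echo "decimal 101 = $((101))"
echo "hex 0x101   = $((0x101))"
echo "octal 0101  = $((0101))"
```

You should see 101, 257 and 65. This is also why an unintended leading zero can change the value of a number in a shell script.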

I realize that, at first, all this may be a bit confusing. However, most of the time the type of number being used is clear from context. Indeed, you can often guess a base just by looking at a number. For example, if you see 110101011010, it's a good bet you are looking at a binary number. If you see a number like 45A6FC0, you know you are looking at a hex number, because only hexadecimal uses the digits A-F.

With hex numbers, you will see both upper- and lowercase letters used as digits. For example, the two numbers 0x45A6FC0 and 0x45a6fc0 are the same. Most people, however, prefer to use uppercase digits, as they are easier to read.
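
A quick way to check that case makes no difference is to compare the two spellings in the shell. In this sketch the variable names are mine; %X and %x are the standard printf codes for uppercase and lowercase hex output.

```shell
upper=0x45A6FC0
lower=0x45a6fc0

# Both spellings evaluate to the same number.
if [ "$(( $upper ))" -eq "$(( $lower ))" ]; then
    echo "same value"
fi

# printf lets you choose the case of the output: %X for uppercase, %x for lowercase.
printf '%X\n' "$lower"
printf '%x\n' "$upper"
```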

When we speak about numbers, the rules are simple. When we refer to decimal numbers, we speak about them in the usual way. For example, 101 base 10 is referred to as "one hundred and one"; 3,056 is "three thousand and fifty-six".

With other bases, we simply say the names of the digits. Sometimes we mention the base. For example, if we want to talk about 101 base 16, we say "one-zero-one base 16" or "hex one-zero-one"; 3,056 base 16 is "three-zero-five-six base 16" or "hex three-zero-five-six".

Similarly, 101 base 8 is "one-zero-one base 8" or "octal one-zero-one"; and 101 base 2 is "one-zero-one base 2", "binary one-zero-one", or something similar. You can see these pronunciations illustrated in Figure 21-8.


Why We Use Hexadecimal
Rather Than Octal

In Chapter 7, we talked about the type of technology that was used to create computer memory in the olden days. In particular, I mentioned that, in the 1950s and into the 1960s, memory was made from tiny magnetic cores. For this reason, the word CORE became a synonym for memory.

In those days, debugging a program was difficult, especially when the program aborted unexpectedly. To help with such problems, a programmer could instruct the operating system to print the contents of the memory used by a program at the moment it aborted. The programmer could then study the printed data and try to figure out what happened. As I explained in Chapter 7, such data was called a CORE DUMP, and it took a lot of skill to interpret. Today, the expression "core dump" is still used, but you will sometimes see the term MEMORY DUMP or DUMP used instead.

In the early 1970s when Unix was developed, debugging could be very difficult, and programmers often had to save and examine dumps. Although technology had evolved — magnetic cores had been replaced by semiconductors — memory was still referred to as core, and a copy of the contents of memory was still called a core dump. Thus, when Unix saved the contents of memory to a file for later examination, the file was called a CORE FILE, and the default name for such a file was core.

Thus, from the very beginning, Unix programmers had a need to examine core files. To meet this need, the Unix developers created a program to display the contents of a core file as octal (base 8) digits. This program was named od ("octal dump") and, over the years, it has proven to be an especially useful program. Even today, using od is one of the best ways we have to look at binary data.

As we discussed in the previous section, binary data can be represented as either octal numbers, using 3 bits per digit, or hexadecimal numbers, using 4 bits per digit. Octal is relatively easy to learn because it uses digits that are already familiar to us (0 through 7). Hexadecimal, on the other hand, requires 16 digits, 6 of which (A, B, C, D, E, F) are not part of our everyday culture. As such, hex is a lot more difficult to learn than octal. Nevertheless, hexadecimal is used much more than octal. There are three reasons for this.

First, hex is significantly more compact than octal. (To be precise, hex is 4/3 times more compact than octal.) If you want to display bits, it takes a lot fewer hex characters to do the job than octal characters.
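
To put a number on "more compact", we can count the digits in the three forms of "mellow" from earlier in the chapter. This is just a sketch; the variable names are mine, and ${#name} is the standard shell way to get the length of a string.

```shell
# The three equivalent representations of "mellow".
bits=011011010110010101101100011011000110111101110111
octal=3326255433067567
hex=6D656C6C6F77

# ${#name} gives the number of characters in a variable's value.
echo "${#bits} bits, ${#octal} octal digits, ${#hex} hex digits"
```

The result should be 48 bits, 16 octal digits and 12 hex digits. Since 16 divided by 12 is 4/3, octal needs 4/3 as many digits as hex.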

The second reason hex is more popular has to do with how bits are used. Computer processors organize bits into fundamental units called WORDS, the size of a word depending on the design of the processor. Since the mid-1960s, most processors have used 16-bit or 32-bit words; today, it is common to find processors that use 64-bit words. In the 1950s and 1960s, however, many computers, especially scientific computers, used 24-bit or 36-bit words.

With a 24-bit or 36-bit word, it is possible to use either octal or hex, because 24 and 36 are divisible by both 3 and 4. Since octal was simpler, it was used widely in the 1950s and 1960s.

With 16-bit, 32-bit or 64-bit words, it is difficult to use octal, because 16, 32 and 64 are not divisible by 3. It is, however, possible to use hexadecimal, because 16, 32 and 64 are all divisible by 4. For this reason, since the 1970s, hex has been used more and more, and octal has been used less and less.
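
The divisibility argument is easy to check with shell arithmetic. In this sketch, leftover_bits is a hypothetical helper that reports how many bits are left over when a word is split into digits of a given size; a result of 0 means the word divides evenly.

```shell
# leftover_bits is a hypothetical helper: the remainder when WORD_BITS is
# divided into digits of DIGIT_BITS bits each (3 for octal, 4 for hex).
leftover_bits() {
    echo $(( $1 % $2 ))
}

for word in 16 24 32 36 64; do
    printf '%2d-bit word: %s left over in octal, %s left over in hex\n' \
        "$word" "$(leftover_bits "$word" 3)" "$(leftover_bits "$word" 4)"
done
```

Only the 24-bit and 36-bit words come out even in octal, while every word size in the list comes out even in hex.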

The third reason why hexadecimal is used so widely is that, although it is harder to learn than octal, once you learn it, it is easy to use. For this reason, even the earliest versions of od came with an option to display data in hexadecimal.

As I mentioned earlier, the venerable od program has been around for years, in fact, since the beginning of Unix. However, in 1992, another such program, called hexdump, was written for BSD (Berkeley Unix; see Chapter 2). Today, hexdump is widely available, not only on BSD systems, such as FreeBSD, but as part of many Linux distributions.

Most experienced Unix people tend to pick a favorite, either od or hexdump, and use one or the other. For this reason, I am going to show you how to use both of them, so you can see which one you like best.


Displaying Binary Files: hexdump, od

The original use for both hexdump and od (octal dump) was to look at memory dumps contained in core files. By examining a dump, a programmer could track down bugs that would otherwise be elusive. Today, there are much better debugging tools, and programmers rarely look at core files manually. However, hexdump and od are still useful, as they can display any type of binary data in a readable format. Indeed, these two programs are the primary text-based tools used to look inside binary files.

Since either of these programs will do the job, I'll show you how to use both of them. You can then do some experimenting and see which one you prefer. The biggest difference between the two programs is that hexdump, by default, displays data in hexadecimal, while od, which is older, defaults to octal. Thus, if you use od, you will have to remember the specific options that generate hex output.

Another consideration is that od is available on all Unix systems, while hexdump is not. For example, if you use Solaris, you may not have hexdump. For this reason, if you work with binary files and you prefer hexdump, you should still know a bit about od, in case you have to use it one day.

Before we get into the syntax, let's take a look at some typical output. In Figure 21-9, you see a portion of the binary data in the file that holds the ls program. (The ls program is used to list file names. We will talk about it in Chapter 24.)

Figure 21-9: Sample binary data displayed as hexadecimal and ASCII

You can use the hexdump or od commands to display binary data. Here is a sample of such data displayed in canonical format; that is, with the offset in hexadecimal on the left, the data in hex in the middle, and the same data as ASCII characters on the right. This particular example was taken from the file that contains the GNU/Linux ls program.

 Offset  Hexadecimal                                       ASCII
 000120  00 00 00 00 00 00 00 00  00 00 00 00 06 00 00 00  |................|
 000130  04 00 00 00 2f 6c 69 62  2f 6c 64 2d 6c 69 6e 75  |..../lib/ld-linu|
 000140  78 2e 73 6f 2e 32 00 00  04 00 00 00 10 00 00 00  |x.so.2..........|
 000150  01 00 00 00 47 4e 55 00  00 00 00 00 02 00 00 00  |....GNU.........|
 000160  06 00 00 00 09 00 00 00  61 00 00 00 76 00 00 00  |........a...v...|
 000170  00 00 00 00 4c 00 00 00  4b 00 00 00 3f 00 00 00  |....L...K...?...|
 

When you examine data within a file, there will be times when you need to know the exact location of what you are looking at. When you use less to look at a text file, it's easy to figure out where you are. At any time, you can use the = (equals sign) command to display the current line number. Alternatively, you can use the -M option to show the current line number in the prompt, or you can use the -N option to display a number to the left of every line.

With a binary file, there are no lines, so line numbers are not meaningful. Instead, we mark each location within the file by an OFFSET, a number that tells you how many bytes you are from the beginning of the file. The first byte has offset 0; the second byte has offset 1; and so on.

Take a look at the sample data in Figure 21-9. The offset — which is not part of the data — is in the left-hand column. In our example, all the numbers are in hexadecimal, so the offset of the first byte of data is 0x120 (that is, hex 120) or 288 in decimal. Thus, the first byte of data in our example is the 289th byte in the file. (Remember, offsets start at 0.)

The first row of output contains 16 bytes. Thus, the offsets run from 0x120 to 0x12F. The second row starts at offset 0x130. To the right of each offset, the 16 bytes on that line are displayed in two different formats. In the middle column are hex digits, grouped in bytes. (Remember, one byte = 8 bits = 2 hex digits.) On the right, the same data is displayed as ASCII characters.
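
If you want to check the offset arithmetic, the shell can do it for you. This is a sketch under one assumption: that your shell accepts the 0x prefix in arithmetic, as POSIX shells do. The variable names are mine.

```shell
row_start=0x120      # offset of the first byte in the first row
bytes_per_row=16

echo "first row starts at decimal offset $(( $row_start ))"
printf 'last byte of the row: 0x%X\n' $(( $row_start + $bytes_per_row - 1 ))
printf 'next row starts at:   0x%X\n' $(( $row_start + $bytes_per_row ))
```

You should see 288, 0x12F and 0x130, the same values used in the discussion above.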

Within most binary files, you will notice that some bytes contain actual ASCII characters. It is easy to identify these bytes by looking at the rightmost column. In our example, you can see the strings /lib/ld-linux.so.2 and GNU. By convention, bytes that do not correspond to printable ASCII characters are indicated by a . (period) character. You can see many such bytes in our example.

Most bytes in a binary file are not characters; they are machine instructions, numeric data, and so on. You can tell this by looking in the rightmost column, where you will see mostly . markers, with a sprinkling of random characters. In our example, the first line and last two lines contain all non-ASCII data. A few bytes do contain values that happen to correspond to characters, but this is coincidental and not meaningful.

The way in which data is displayed in Figure 21-9 is called CANONICAL FORMAT. This format, used for binary data that is displayed or printed, consists of 16 bytes per line. To the left of each line is the offset in hexadecimal. In the middle are the actual bytes, also in hexadecimal. On the right are the ASCII equivalents.

Both hexdump and od are able to display binary data in many different formats. In fact, both commands support a large variety of options that give you enormous control over how data is displayed. Most of the time, however, it is best to use the canonical format. For that reason, in our discussion of these commands, I will show you which options to use to produce this type of output. If you need information about the other variations, you can find it on the respective man pages.

We'll start with hexdump because it is simpler to use. To use hexdump to display a binary file in canonical format, the syntax is simple:

hexdump -C [file...]

where file is the name of a file.

The hexdump program has many options that allow you to control output. However, there is an important shortcut: if you use the -C (canonical) option, hexdump will automatically use the appropriate combination of options so as to produce canonical output.

Here is an example. Let's say you want to look inside the binary file that contains the ls program. To start, you use the whereis program to find the pathname — that is, the exact location — of the file. (We'll discuss pathnames and whereis in Chapter 24, so don't worry about the details for now.) The command to use is:

whereis ls

Typical output would be:

ls: /bin/ls /usr/share/man/man1/ls.1.gz

The output shows us the exact locations of the program and its man page. We are only interested in the program, so we use the first pathname:

hexdump -C /bin/ls | less

This command displays the contents of the entire file in canonical format. That's all there is to it.

If you want to limit the amount of data being displayed, there are two more options you can use. The -s (skip over) option allows you to set the initial offset by specifying how many bytes to skip at the beginning of the file. For example, to display data starting from offset 0x120 (hex 120) use:

hexdump -C -s 0x120 /bin/ls | less

To limit the amount of output, you use the -n (number of bytes) option. The following command starts at offset 0x120 and displays 96 bytes of data (that is, 6 lines of output). In this case, the amount of output is so small, we don't need to pipe it to less:

hexdump -C -s 0x120 -n 96 /bin/ls

This, by the way, is the exact command that generated the output for Figure 21-9.
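
The effect of -s and -n is easier to see on a tiny file whose contents you already know. Here is a sketch that creates a 7-byte scratch file and dumps it. The file contents and variable names are my own choices, and because hexdump is not installed on every system, the sketch checks for it first.

```shell
# Create a scratch file with known contents: six letters and a newline (7 bytes).
sample=$(mktemp)
printf 'mellow\n' > "$sample"
sample_size=$(( $(wc -c < "$sample") ))

# Assumption: hexdump may not be installed everywhere, so check before using it.
if command -v hexdump >/dev/null 2>&1; then
    hexdump -C "$sample"               # the whole file
    hexdump -C -s 2 -n 3 "$sample"     # skip 2 bytes, then show only 3 bytes
else
    echo "hexdump is not available on this system" >&2
fi

rm -f "$sample"                        # remove the scratch file
```

In the second dump, the only bytes shown should be the ones for the letters l, l and o, starting at offset 2.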

Incorporating these two options into the syntax, we can define a more comprehensive specification for hexdump:

hexdump -C [-s offset] [-n length] [file...]

where file is the name of a file, offset is the number of bytes to skip over at the beginning of the file, and length is the number of bytes to display. Note: offset can be in decimal, hex or octal, but length must be a decimal number. (No one knows why; make up your own reason.)

— hint —

With FreeBSD, you can use the command hd as an alias for hexdump -C. Thus, on a FreeBSD system, the following two commands are equivalent:

hd /bin/ls
hexdump -C /bin/ls

If you would like to use this handy command on a different system, all you need to do is create an alias of your own, using one of the following commands. The first one is for the Bourne Shell family; the second is for the C-Shell family:

alias hd='hexdump -C'
alias hd 'hexdump -C'

To make the alias permanent, put the appropriate command into your environment file. (Aliases are discussed in Chapter 13; the environment file is discussed in Chapter 14.)

To use od to display a binary file in canonical format, the syntax is:

od -Ax -tx1z [file...]

where file is the name of a file.

The -A (address) option allows you to specify which number system to use for the offset values. For canonical output, you specify x, which displays the offsets in hexadecimal.

The -t (type of format) option controls how the data is to be displayed. For canonical output, you specify x1, which displays the data in hex one byte at a time, and z, which displays ASCII equivalents at the end of each line. For a full list of format codes, see the man page (man od) or the info file (info od).

(Note: This syntax is for the GNU version of od, such as you will find with Linux. If you are using a system that does not have the GNU utilities, the command will be more primitive. In particular, you won't be able to use the z format code. Check your man page for details.)

As an example, the following od command is equivalent to our original hexdump command. It displays the contents of the ls file in canonical format:

od -Ax -tx1z /bin/ls | less

If you want to limit the amount of data being displayed, there are two more options you can use. The -j (jump over) option specifies how many bytes to skip at the beginning of the file. For example, to start displaying data from offset 0x120 (hex 120) use:

od -Ax -tx1z -j 0x120 /bin/ls | less

To limit the amount of output, use the -N (number of bytes) option. The following command starts at offset 0x120 and displays 96 bytes (6 lines of output). In this case, the amount of output is so small, we don't need to pipe it to less:

od -Ax -tx1z -j 0x120 -N 96 /bin/ls

This command generates output similar to what you see in Figure 21-9.
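
To see what -j and -N do, you can try od on a few bytes of known data. This sketch deliberately leaves out the z format code, since it is a GNU extension, and uses only options I expect to find in any POSIX version of od. The variable name skipped is mine.

```shell
# -A x shows offsets in hex; -t x1 shows the data in hex, one byte at a time.
printf 'mellow\n' | od -A x -t x1

# Skip 2 bytes (-j 2), then show 3 bytes (-N 3). With -A n the offsets are
# left out, and tr removes the spacing so only the hex digits remain.
skipped=$(printf 'mellow\n' | od -A n -t x1 -j 2 -N 3 | tr -d ' \n')
echo "$skipped"
```

The last line should print 6c6c6f, the hex codes for the letters l, l and o.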

Incorporating these two options into the syntax, we can define a more comprehensive syntax for od:

od -Ax -tx1z [-j offset] [-N length] [file...]

where file is the name of a file, offset is the number of bytes to skip over at the beginning of the file, and length is the number of bytes to display, in decimal, hex or octal.

— hint —

The syntax for od is complex and awkward. However, you can simplify by creating an alias to specify the options that produce output in canonical format. The following commands will do the job. The first command is for the Bourne Shell family; the second is for the C-Shell family:

alias od='od -Ax -tx1z'
alias od 'od -Ax -tx1z'

Once you have such an alias, whenever you type od, you will automatically get the output you want. To make the alias permanent, put one of these commands into your environment file. (Aliases are discussed in Chapter 13; the environment file is discussed in Chapter 14.)

What's in a Name?

Canonical


Earlier in the chapter, when we discussed how interactive text-based programs handle input, I talked about canonical mode and non-canonical mode. In this section, I mentioned that a certain format for binary output is called canonical output. Computer scientists use "canonical" differently from the regular English meaning, so you should understand the distinction.

In general English, the word "canonical" is related to the idea of a canon, a collection of official rules governing the members of a Christian church. Canonical describes something that follows the rules of the canon. Thus, one might refer to the canonical practices of the Catholic Church.

In mathematics, the same term has a more exact and streamlined meaning. It refers to the simplest, most important way of expressing a mathematical idea. For example, high school students are taught the canonical formula for finding the roots of a quadratic equation.

Computer scientists borrowed the term from mathematics and, in doing so, they relaxed the meaning significantly. In computer science, CANONICAL refers to the most common, conventional way of doing something. For example, in our discussion of the od and hexdump commands, we talked about the canonical format for displaying binary data. There is nothing magical about this format. However, it works well, it has been used for well over four decades, and it is what people expect, so it is canonical.


Why Does So Much Computer
Terminology Come From Mathematics?

As you learn more and more computer science, you will notice that much of the terminology is derived from mathematics. As an example, the word "canonical" — which we used in two different ways in this chapter — comes from a similar mathematical term. You might be wondering why so much computer terminology comes from mathematics. There are several reasons.

Early computer science was developed in the 1950s and 1960s, based on theoretical work done by mathematicians in the 1930s and 1940s. In particular, the mathematical foundations for computer science were provided by the work of Alan Turing (1912-1954), John von Neumann (1903-1957), Alonzo Church (1903-1995) and, to a lesser extent, Kurt Gödel (1906-1978).

During the 1950s and 1960s, almost all computer scientists were mathematicians. Indeed, computer science was considered to be a branch of mathematics(*). It was only natural, then, for pioneers to draw upon terminology from their own fields to describe new ideas.

* Footnote

At the school where I did my undergraduate work — the University of Waterloo, Canada — the Department of Computer Science was (and still is) part of the Faculty of Mathematics. Indeed, my undergraduate degree is actually a Bachelor of Mathematics with a major in Computer Science.

Over the years, as computer science developed, it required a great deal of analysis and formalization. As with other sciences, the necessary techniques and insight were taken from mathematics which, having been studied and formalized for over 2,000 years, was rich in such tools. (This is why the great German mathematician and scientist Carl Friedrich Gauss referred to mathematics as "The Queen of Sciences".) Even today, computer scientists and programmers in need of abstraction and logical reasoning borrow heavily from mathematics. As they do, it is common for them to modify mathematical terms to suit their own needs.

— hint —

Mathematics is to computer science as Greek and Latin are to English.




Exercises

Review Question #1:

Which programs do you use to display:

• A text file one screenful at a time
• An entire text file all at once
• The first part of a text file
• The last part of a text file
• A binary file

Review Question #2:

As you are using less to display a file, which commands do you use to perform the following actions?

• Go forward one screenful
• Go backward one screenful
• Go to the first line
• Go to the last line
• Search forward
• Search backward
• Display help
• Quit the program

Review Question #3:

You can use less to display more than one file, for example:

less file1 file2 file3 file4 file5

As you are reading, which commands do you use to perform the following actions?

• Change to the next file
• Change to the previous file
• Change to the first file
• Delete the current file from the list

Which commands do you use to search forward and search backward within all the files?

Review Question #4:

What command would you use to watch the end of a growing file?

Review Question #5:

When you display a binary file, what is canonical format? How do you display a file in canonical format using hexdump? Using od?

Applying Your Knowledge #1:

Check the value of your PAGER environment variable. If it is not set to less, do so now. Display the man page (Chapter 9) for the less program itself. Perform the following operations:

• Display help information
• Page down to the end of help, one screenful at a time
• Take a moment to read each screen
• Quit help

• Search forward for "help"
• Search again
• Search again
• Search backward

• Go to the end of the man page
• Go backward one screenful
• Go to line 100
• Display the current line number
• Go to the line 20 percent of the way through the man page
• Display the current line number
• Go to the beginning of the man page

• Quit

Applying Your Knowledge #2:

You want to use less to display the contents of the file list, which contains many lines of text. Along with the text, you also want to see line numbers. However, there are no line numbers within the text, and you do not want to change the original file in any way.

How would you do this using nl and less?

How would you do it using only less?

Is there any advantage to using nl?

Applying Your Knowledge #3:

Convert the following binary (base 2) number to octal (base 8), hexadecimal (base 16), and decimal (base 10):

1111101000100101

You must show your work.

Applying Your Knowledge #4:

Use the strings command (Chapter 19) to look for character strings within the binary file /bin/ls.

Then select one string and use hexdump or od to find the exact position of that string in the file.

For Further Thought #1:

You want to use less to search through five different files, looking for a particular sequence of words. The obvious solution is to use less as follows:

less file1 file2 file3 file4 file5

However, this requires you to keep track of and manipulate five different files. Instead you might first combine the files into one large file:

cat file1 file2 file3 file4 file5 | less

Will this make your job easier or harder? Why?

For Further Thought #2:

In the text, we discussed four number systems: decimal (base 10), binary (base 2), octal (base 8) and hexadecimal (base 16). In principle, any positive whole number can be used as a base.

Consider base 12, also called DUODECIMAL. In base 12 we use the digits 0 through 9. For the extra two digits, we use A and B. Thus, in base 12, we count as follows: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, 10, 11, 12, and so on. The number 22 in base 10 equals 1A in base 12.

What important advantages does base 12 have over base 10? Hint: How many factors does 12 have compared to 10? How does this simplify calculations?

Would our culture be better off if we used base 12 instead of base 10?

In spite of its advantages, we don't use base 12 with computers. Most of the time we use base 16, which is much more complicated. Why is this?

By the way, if you have ever read the Lord of the Rings books by J.R.R. Tolkien, you will be interested to know that the Elvish languages use a duodecimal number system.
