Tuesday, September 18, 2001

Comparing Files

Many times you have two versions of a file (e.g., you edited it and saved it under a different name, or you have one version on your system and another version from a different machine) and you need to know whether they are identical and, if not, where they differ. Two commands can help you with that: cmp and diff. Let’s look at them.

The command cmp compares (thus the name) two files byte by byte and tells you whether they are equal or not. Use it like this:

cmp file-1 file-2

If the files are identical, cmp prints nothing at all and you simply get your (shell) prompt back. If the files are different, cmp reports the position of the first byte where they differ, something like this:

file-1 file-2 differ: char 5, line 1
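Besides the message, cmp also sets its exit status: 0 when the files are identical, 1 when they differ, and 2 when something goes wrong (for example, a missing file). That makes it handy in shell scripts. Here is a minimal sketch, assuming a POSIX shell with GNU cmp and mktemp available; the file contents are made up for illustration:

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample files: file-2 differs from file-1 at byte 5.
printf 'hello world\n' > file-1
printf 'hellO world\n' > file-2
cp file-1 file-3

# Identical files: cmp prints nothing and exits with status 0.
cmp file-1 file-3
echo "file-1 vs file-3: exit status $?"

# Different files: cmp reports the first difference and exits with status 1.
cmp file-1 file-2
echo "file-1 vs file-2: exit status $?"
```

On a current GNU system the second cmp prints something like file-1 file-2 differ: byte 5, line 1.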

Used without options, cmp stops at the first difference it finds. It compares bytes rather than characters, so you can also use it on "binary" files (newer versions of GNU cmp actually say "byte" instead of "char" in the message above). You can give cmp an option to list all the differences, although the output is a little harder to read. Here is an example:

cmp -l file-1 file-2

For the test files I was using while writing this, the output is the following:

2 145 105
6 141 101

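This tells me that bytes 2 and 6 are different. The other two columns are the values of those bytes in the first and in the second file, written in octal (145 is the code for e, 105 for E), which is not very readable. You can get nicer output with this: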
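To see where those octal numbers come from, here is a sketch that creates a pair of files with the same two differences (the contents are my own guess for illustration, not the original test files) and decodes two of the values with printf, which understands octal escapes. It assumes GNU cmp and a POSIX shell:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical files that differ only in the case of bytes 2 and 6.
printf 'text a\n' > file-1
printf 'tExt A\n' > file-2

# One line per differing byte: position (decimal), then the byte
# from file-1 and the byte from file-2, both in octal.
cmp -l file-1 file-2

# Decode octal values by hand with printf's \ddd escapes.
printf 'octal 145 is %s, octal 105 is %s\n' "$(printf '\145')" "$(printf '\105')"
```

The cmp line prints 2 145 105 and 6 141 101, the same output shown above.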

cmp -b -l file-1 file-2

The output in my case was the following:

2 145 e    105 E
6 141 a    101 A

That is much better: byte 2 is a lowercase e in the first file and an uppercase E in the second, and byte 6 is a lowercase a in the first file and an uppercase A in the second.
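A sketch that reproduces this friendlier form with a hypothetical pair of files. It assumes GNU cmp, where -b is the short form of --print-bytes and -l the short form of --verbose, so the two can also be bundled as -bl:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical files: bytes 2 and 6 differ only in case.
printf 'text a\n' > file-1
printf 'tExt A\n' > file-2

# Same as cmp -b -l: position, octal value and printable form for each file.
cmp -bl file-1 file-2
```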

Another cmp option is -n, used like this:

cmp -n 10 file-1 file-2

which compares at most the first 10 bytes of each file. If you are not used to thinking in bytes, you can think in terms of characters: for plain ASCII text, 1 byte = 1 character (accented and other non-ASCII characters can take more than one byte).
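A small sketch of the effect of -n, using two hypothetical files that share their first 10 bytes and differ afterwards (assuming GNU cmp, which supports -n):

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical files: identical for the first 10 bytes, different after that.
printf '0123456789abc\n' > file-1
printf '0123456789xyz\n' > file-2

# Only the first 10 bytes are compared: no output, exit status 0.
cmp -n 10 file-1 file-2
echo "first 10 bytes: exit status $?"

# Whole files: the difference at byte 11 is reported, exit status 1.
cmp file-1 file-2
echo "whole files: exit status $?"
```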

The other command to compare files (or rather, to see how they differ) is diff. Its output is more useful than that of cmp, although it may look a little strange at first, with information you do not need or understand. That is because diff is also used in programming, for things like patches, which I will not deal with for the time being. If we ignore the parts of the output we do not understand, we can use diff to check the differences between two files. Here is a basic example of how to use it:

diff file-1 file-2

The output shows the lines that are different between the two files: lines from the first file are prefixed with a < sign and lines from the second file with a > sign.

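Here is a small sketch with two hypothetical three-line files that differ only on line 2 (the contents are made up for illustration; this assumes GNU diff):

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical files: only the second line differs.
printf 'first line\nthis is a test\nlast line\n' > file-1
printf 'first line\nthis is a tEst\nlast line\n' > file-2

diff file-1 file-2
```

It prints 2c2, then < this is a test, a --- separator, and > this is a tEst. The 2c2 line means that line 2 of the first file has to be changed into line 2 of the second file; when lines were added or deleted you will see an a or a d in that position instead of the c.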
If your files are long, I recommend piping the output of diff into less so that you can read all of it, in this manner:

diff file-1 file-2 | less

The command diff has many useful options; here are some:

  1. -i: ignore case. With my test files, where one contains the word test and the other the word tEst (they differ only in upper/lower case), diff -i reports no difference at all.
  2. -E: ignore differences caused by tab expansion (a TAB versus the equivalent number of spaces)
  3. -b: ignore changes in the amount of white space (e.g. one space versus two)
  4. -w: ignore all white space, including TABS
  5. -B: ignore differences that consist only of blank lines (that is, compare only lines that have characters)
  6. -q: report only whether the files are different (so diff behaves much like cmp)
  7. -y: put the output in two columns, one for each file
  8. --suppress-common-lines: together with -y, leave the common lines out of the output, showing only the lines that differ
  9. -l: pass the output through the program pr, which splits it into pages with headers (similar in spirit to the example with less above, although pr formats the text rather than being an interactive "pager")
  10. -s: explicitly report when two files are identical (normally diff, like cmp, prints nothing in that case)
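A sketch that tries a few of these options on hypothetical files (names and contents made up for illustration; this assumes GNU diff and a POSIX shell):

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical test files.
printf 'this is a test\n'  > file-1
printf 'this is a tEst\n'  > file-2   # differs from file-1 only in case
printf 'this  is a test\n' > file-3   # differs from file-1 only in spacing
cp file-1 file-4                      # identical copy of file-1

# -q: only say whether the files differ.
diff -q file-1 file-2

# -i: ignore case; diff prints nothing and exits with status 0.
diff -i file-1 file-2 && echo "file-1 and file-2 match when ignoring case"

# -b: ignore changes in the amount of white space.
diff -b file-1 file-3 && echo "file-1 and file-3 match when ignoring spacing"

# -s: explicitly report identical files.
diff -s file-1 file-4

# -y with --suppress-common-lines: two columns, differing lines only
# (the | in the middle marks a changed line).
diff -y --suppress-common-lines file-1 file-2
```

The -q line prints Files file-1 and file-2 differ, and the -s line prints Files file-1 and file-4 are identical.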

Your distribution might also have a command called diff3, which can be used to compare three files. If you have the Tk packages installed, your distribution may also include tkdiff, a very nice graphical interface to diff (you need to run it under X Windows).
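A quick sketch of diff3 with three hypothetical versions of a file, where only the second one changed (this assumes GNU diff3). In its output, a hunk header ====2 means that file 2 is the odd one out, while a bare ==== means that all three files differ:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical versions of the same file; only file-2 changed line 2.
printf 'one\ntwo\nthree\n' > file-1
printf 'one\nTWO\nthree\n' > file-2
printf 'one\ntwo\nthree\n' > file-3

diff3 file-1 file-2 file-3
```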
