Tuesday, September 25, 2001

File names with Special Characters

Sometimes, whether by mistake, by downloading something from the Internet, or for some other reason, you end up with files whose names contain non-standard characters, and these can cause problems. Here is a concrete example: suppose you have a file called Important file, that is, the full name of the file is the words Important and file separated by a single space. (I will also assume that this is the only file in the current directory, to avoid special cases where the commands given below do not behave as I explain.) If you want to see the contents of the file, say with less, the following command will not work:

less Important file

Instead of the contents of the file, you will get an error telling you that the files Important and file do not exist. This is because the shell also uses the space character to separate words on the command line. So, in the above example, what the system understands is the command less applied to two files, called Important and file, neither of which exists in your directory (remember, I am assuming that the only file in the current directory is the one with that "funny" name).

How can you get around this problem? Well, what you need to do is to give the space character as part of the file name, and not as a separation character between file names. This is called escaping the space character. One possible way is by enclosing the file name in double quotation marks:

less "Important file"

You can do this with any other command, not only less. Another way is to put a backslash (\) before the space, so that the shell treats the space as part of the name rather than as a separator. The command then looks like this:

less Important\ file
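Here is a small sketch you can run to try the quoting forms safely. It works in a temporary directory, uses cat instead of less so that it does not wait for a key press, and the file contents and variable names are made up for the illustration. Single quotes are not mentioned above, but they protect the space in the same way.

```shell
# Work in a throwaway directory so nothing in your real files is touched.
workdir=$(mktemp -d)
printf 'secret plans\n' > "$workdir/Important file"

# Double quotes: the space is part of the name.
with_double=$(cat "$workdir/Important file")

# Backslash: escapes only the character that follows it.
with_backslash=$(cat "$workdir"/Important\ file)

# Single quotes: everything inside is taken literally.
with_single=$(cat "$workdir"/'Important file')

# A name stored in a variable must be quoted when it is used.
name="$workdir/Important file"
with_variable=$(cat "$name")

echo "$with_double"
rm -r "$workdir"
```

All four forms read the same file; which one you use is mostly a matter of taste, although quoting is easier to read when a name contains several spaces.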

There are other characters in file names that give trouble, for example names starting with a hyphen (-). This is because options to Linux commands usually begin with a hyphen, as in rm -v. For example, a file called -myfile cannot be removed with this command:

rm -myfile

You will get an error message from rm saying invalid option -- 'm'. What can you do? Here is a way out:

rm -- -myfile

The two hyphens tell the command that what comes after them is no longer an option but rather an argument to rm, that is, the name of the file that you want to remove.
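A quick way to convince yourself is the sketch below, which creates two such files in a temporary directory and removes them again. The second file name (-other) is made up for the example, and the ./ form is an extra alternative not discussed above: it works because the name no longer starts with a hyphen.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create two files whose names begin with a hyphen.
touch -- -myfile -other

# "--" marks the end of options; what follows is treated as a file name.
rm -- -myfile

# A leading "./" also works, because the name no longer starts with "-".
rm ./-other

# Count what is left in the directory (should be nothing).
remaining=$(ls -A | wc -l | tr -d ' ')
echo "files left: $remaining"

cd "$OLDPWD" && rm -r "$workdir"
```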

Tuesday, September 18, 2001

Comparing Files

Many times you have two versions of a file (e.g. if you edited it and saved it under different names, or you have one version from your system and another version from another machine) and you need to know whether they are equal or not, and where they differ. There are a couple of commands that can help you with that: cmp and diff. Let’s look at them.

The command cmp compares (thus the name) two files byte by byte and tells you whether they are equal or not. Use it like this:

cmp file-1 file-2

If the files are identical the output will be empty: no message is printed and you simply go back to your (shell) prompt. On the other hand, if the files are different, the output will tell you the position of the first character where they differ, something like this:

file-1 file-2 differ: char 5, line 1

Used without options, cmp stops at the first character that differs between the two files (actually the first byte, so you can also use it on "binary" files). You can give an option to cmp to get all the differences, but the output might be a little harder to understand. Here is an example:

cmp -l file-1 file-2

For the files I used to test this, the output is the following:

2 145 105
6 141 101

which tells me that bytes number 2 and 6 are different; the other two columns are the values of that byte in each file, written in octal. This is not of much use on its own, but you can get a nicer output with this:

cmp -b -l file-1 file-2

The output in my case was the following:

2 145 e    105 E
6 141 a    101 A

That is much better: it says that byte 2 is a lower case e in the first file and an upper case E in the second file, and that byte 6 is a lower case a in the first file and an upper case A in the second.

Another option to cmp is -n, used like this:

cmp -n 10 file-1 file-2

which will compare at most 10 bytes from each file (true, not of much use if you are not used to bytes and all that, but for plain English text you can think in terms of characters, 1 byte = 1 character).
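If you want to use cmp from a script, the exit status is usually more convenient than the printed message. The sketch below assumes GNU cmp (the -n option is not available everywhere) and uses two made-up sample files that differ at bytes 2 and 6, like my test files above. The -s option, not mentioned so far, simply silences the output.

```shell
workdir=$(mktemp -d)
printf 'tea bag\n' > "$workdir/file-1"
printf 'tEa bAg\n' > "$workdir/file-2"
cp "$workdir/file-1" "$workdir/copy-1"

# -s prints nothing; only the exit status says whether the files match.
if cmp -s "$workdir/file-1" "$workdir/copy-1"; then
    same_result="identical"
else
    same_result="different"
fi

# Exit status: 0 = identical, 1 = different, 2 = trouble (e.g. missing file).
diff_status=0
cmp -s "$workdir/file-1" "$workdir/file-2" || diff_status=$?

# -l lists every differing byte, one per line, so counting lines counts bytes.
diff_count=$(cmp -l "$workdir/file-1" "$workdir/file-2" | wc -l | tr -d ' ')

# -n 1 looks only at the first byte, which is "t" in both files.
first_byte_status=0
cmp -s -n 1 "$workdir/file-1" "$workdir/file-2" || first_byte_status=$?

echo "$same_result, status $diff_status, $diff_count differing bytes"
rm -r "$workdir"
```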

The other command to compare files (that is, to see their differences) is diff. The output of this command is more useful than the output of cmp. You might find it a little strange, with information that you do not want or understand, but that is because it can also be used in programming, for things like patches, which I will not deal with for the time being. So, ignoring for now the parts of the output we do not understand, we can use diff to check the differences between two files. Here is a basic example of how to use it:

diff file-1 file-2

The output will show the lines that differ between the two files. It prepends a < sign to lines from the first file and a > sign to those from the second file. For example, here is the output for my test files:

1c1
< test
---
> tEst

If your files are long I recommend you use diff together with less to be able to read the whole output, in this manner:

diff file-1 file-2 | less

The command diff has many useful options; here are some:

  1. -i: ignore case; in the above example you will not get any difference, since one file contains the word test and the other the word tEst, which differ only in upper/lower case.
  2. -E: ignore differences due to tab expansion (tabs versus the equivalent number of spaces)
  3. -b: ignore differences in the amount of white space
  4. -w: ignore all white space, including tabs
  5. -B: ignore differences that consist only of blank lines (that is, compare only lines that have characters)
  6. -q: report only whether the files are different (so diff will behave like cmp)
  7. -y: put the output in two columns, one for each file
  8. --suppress-common-lines: used together with -y, removes common lines from the output, showing only the lines that are different
  9. -l: pass the output through the program pr, similar to the example shown above with less (but a different "pager")
  10. -s: report when two files are identical (again, behaviour similar to cmp)
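Here is a small sketch that tries a few of these options on two one-line files like my test files. It assumes GNU diff; the variable names are made up for the example. As with cmp, the exit status (0 for no differences, 1 for differences) is what a script would normally check.

```shell
workdir=$(mktemp -d)
printf 'test\n' > "$workdir/file-1"
printf 'tEst\n' > "$workdir/file-2"

# Plain diff: exit status 1 means differences were found.
plain_status=0
diff "$workdir/file-1" "$workdir/file-2" > "$workdir/plain.out" || plain_status=$?

# -i ignores case, so the two files now count as equal (status 0).
nocase_status=0
diff -i "$workdir/file-1" "$workdir/file-2" > /dev/null || nocase_status=$?

# -q reports only whether the files differ, without showing the lines.
brief_status=0
diff -q "$workdir/file-1" "$workdir/file-2" > /dev/null || brief_status=$?

plain_output=$(cat "$workdir/plain.out")
echo "plain=$plain_status nocase=$nocase_status brief=$brief_status"
echo "$plain_output"
rm -r "$workdir"
```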

Your distribution might have another command called diff3 that can be used to compare three files. If you have the tk packages, your distribution might also have installed tkdiff, which is a very nice graphical interface to diff (you have to use it under the X Window System).

Wednesday, September 12, 2001

File Settings

Here are a couple of commands, tac and touch.

I have never used the first one, but according to the manual page, and the tests I have done, tac is the reverse of cat, as the name suggests. It is used like this:

tac file-name

It will print the file on the screen (I mean, the terminal from which you are working, known as standard output) with its lines in reverse order, from the last line to the first.

The second command, touch, is more useful. It changes the timestamps of a file. Here is an example:
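A tiny sketch to see the effect, assuming tac is installed (it is part of GNU coreutils); the file name and its three lines are made up for the example:

```shell
workdir=$(mktemp -d)
printf 'first\nsecond\nthird\n' > "$workdir/lines.txt"

# cat prints the lines in their original order...
forward=$(cat "$workdir/lines.txt")
# ...while tac prints the last line first.
backward=$(tac "$workdir/lines.txt")

echo "$backward"
rm -r "$workdir"
```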

touch file-name

This will change the access time and the modification time of the file to the computer's current date and time. If you want to change only the access time, give the option -a; to change only the modification time, give the option -m.

If the file does not exist, then the command will create an empty file with the corresponding times.

You can change to any time (within the limits of the Operating System) with a command like this:

touch -t 200502011323 file-name

The format of the time (the number 200502011323 above) is of this type: YYYYMMDDhhmm. Here is the explanation:

  1. YYYY: the year
  2. MM: the month
  3. DD: the day of the month
  4. hh: the hour
  5. mm: the minute

So the time given in the string 200502011323 is translated to 1st (DD=01) of February (MM=02) of 2005 (YYYY=2005), 1 PM (hh=13), 23 minutes (mm=23).
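The sketch below puts these pieces together in a temporary directory. It assumes GNU date, whose -r option prints the modification time of a file, to check the result; the second timestamp (200601010000) is just an arbitrary value for the example.

```shell
workdir=$(mktemp -d)
file="$workdir/file-name"

# touch creates the file if it does not exist yet; it starts out empty.
touch "$file"
size=$(wc -c < "$file" | tr -d ' ')

# Set both times to 1 February 2005, 13:23 (local time).
touch -t 200502011323 "$file"

# GNU date: -r FILE prints that file's modification time.
mtime=$(date -r "$file" +%Y%m%d%H%M)

# -a changes only the access time, so the modification time stays put.
touch -a -t 200601010000 "$file"
mtime_after=$(date -r "$file" +%Y%m%d%H%M)

echo "size=$size mtime=$mtime"
rm -r "$workdir"
```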

Why would you want to change the date of a file? Well, sometimes programs depend on the (modification or access) date of a file, and they will not behave properly if the file is too new or too old. Using touch you can make the program work again. This happens, for example, if your machine’s clock gets messed up and a file ends up with a modification time in the future. The program make uses a file called Makefile; if the time of that file is in the future, make will complain and may not build things correctly; using touch you can reset the date of the Makefile and get back to business.

Monday, September 03, 2001

Compressing files

Although nowadays computers come with lots of disk space, more than a regular user will ever need, sometimes it is good to be able to compress files to save disk space. That will give you more space for movies and other stuff :-) And many utilities are able to read and operate on compressed files, so it does not make a difference to daily usage. Below are a few ways of compressing files.

  1. gzip, the GNU "zip" facility: use it as gzip file-name. It creates a compressed file with the name file-name.gz (the original file is removed). Uncompress with gunzip file-name.gz
  2. bzip or bzip2: the command is bzip2 file-name; the compressed file will be called file-name.bz or file-name.bz2 (the original file is deleted). Uncompress with bunzip2 file-name.bz2
  3. tar: to archive lots of files, use as

    tar cvf tar-file.tar file1 file2 file3 …

    This will put the files file1 file2 file3 … in a single archive file called tar-file.tar, without removing the original files (although you should delete them since, after all, you are trying to save disk space). The files in the archive can be extracted with the following command:

    tar xkvf tar-file.tar
  4. tar and gzip: you can use gzip to compress a tar file, obtaining a file called something like tar-file.tar.gz. You can extract the individual files in the archive with this command:

    tar zxkvf tar-file.tar.gz
  5. zip: a utility widely used on Windows machines, similar to tar just mentioned. You create zipped files with

    zip file.zip file1 file2 file3 …

    and recover the original files (which, again, you should have deleted) with

    unzip file.zip

    To list the files in a tar archive without extracting them, use tar -tvf tar-file.tar
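To see gzip and tar working together, here is a small round trip in a temporary directory. It is only a sketch: the archive name backup.tar.gz, the directory name restored and the file contents are made up, and it uses the c (create) and z (gzip) options so that tar produces the compressed archive in one step.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\n' > file1
printf 'beta\n'  > file2

# gzip replaces file1 with file1.gz; gunzip brings the original back.
gzip file1
gz_exists=no;  [ -f file1.gz ] && gz_exists=yes
orig_gone=no;  [ ! -e file1 ]  && orig_gone=yes
gunzip file1.gz

# c = create, z = compress with gzip, f = archive file name.
tar czf backup.tar.gz file1 file2

# t = list the archive's contents without extracting anything.
listing=$(tar tzf backup.tar.gz)

# x = extract; -C selects the directory to extract into.
mkdir restored
tar xzf backup.tar.gz -C restored

# The restored copies should be byte-for-byte equal to the originals.
restored_ok=no
cmp -s file1 restored/file1 && cmp -s file2 restored/file2 && restored_ok=yes

echo "$listing"
cd "$OLDPWD" && rm -r "$workdir"
```

Checking the restored copies with cmp before deleting the originals is a cheap safeguard when the goal is to free disk space.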

In a future post I will explain a little more about tar, and perhaps another archiving utility, ar, but now it is time to work on something else.