Friday, August 24, 2001

Split files

Sometimes you need to split a big file into a set of smaller ones, for example to send them by email. If the file is plain "text" and not too large you can always try with an editor. But if the file contains non-standard characters (I mean, anything other than the ones used in daily life) or it is too big, an editor is not a good way of dividing it. There is a UNIX command that does this job: split. Used as in the example below, it divides the file into smaller ones of 1000 lines each, named in a pattern like xaa, xab, etc. Here is the example:


split file-name

split has options to control the way the file is divided:

  1. -b number-of-bytes will put at most that number of bytes on each of the smaller output files;
  2. -l number-of-lines is similar to the previous option, but with lines;
  3. -a number will use number of characters for the names of the output files, for example, if you do -a 3 your files will be called xaaa, xaab, etc;
  4. -d will use numbers for the names of the output files, like x00, x01, etc;
  5. --verbose will show you what output files are being written.

In some sense split is the opposite of cat: cat can put the pieces back together.
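
A short round trip makes the relation between split and cat concrete. This is only a sketch: the scratch directory, the name sample.txt and the prefix part- are made up for the illustration (split accepts an output prefix as an optional second argument, in place of the default x).

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 10-line sample file (the name is only for illustration).
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > sample.txt

# Cut it into pieces of at most 4 lines each: part-aa, part-ab, part-ac.
split -l 4 sample.txt part-

# Count how many pieces were written.
pieces=$(( $(ls part-* | wc -l) ))
echo "pieces written: $pieces"

# The shell expands part-* in alphabetical order, so cat rebuilds the
# file in the original order.
cat part-* > rebuilt.txt

# cmp is silent and succeeds when the two files are identical.
cmp sample.txt rebuilt.txt && echo "round trip OK"
```

The same idea works with -b for binary files, since cat does not care what the pieces contain.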

Saturday, August 18, 2001

Joining files

Suppose you have two files, say file1 and file2, which you want to put together in a single file, joint-file. There are two possible ways you might want to do that.
First, if you want file2 to be appended at the end of file1 you can execute this command:


cat file1 file2 > joint-file

On the other hand, if you want to join the files so that the lines of file2 are appended to the lines of file1, line by line, you can do


paste file1 file2 > joint-file

When might you need either of these options? Suppose each file has a list of users in your system and you want to put them together; then you can use cat. On the other hand, if your first file has the names of your users, and the second one their phone numbers (in the same order!), then you can use paste, which joins corresponding lines with a tab.
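
The two cases can be tried side by side. The sketch below uses invented user names, phone numbers and file names; nothing in it comes from a real system.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two lists of users (invented names).
printf 'alice\nbob\n' > users1
printf 'carol\ndave\n' > users2

# Phone numbers for users1, in the same order.
printf '555-0101\n555-0102\n' > phones

# cat: one list after the other, giving four lines.
cat users1 users2 > all-users

# paste: line N of users1 joined to line N of phones by a tab.
paste users1 phones > directory

cat all-users
cat directory
```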

An interesting option to paste is as follows:


paste -f file1 file2 > joint-file

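The program will stop when one of the two files ends; if they have the same number of lines, then -f will not make any difference, but if they are of different length (in terms of lines), you will get in the output file only as many lines as the shortest file has. Be aware that -f is not among the options documented for POSIX or GNU paste (those are -d and -s); these versions keep going until the longest file ends and leave the missing fields empty, so check the manual page of your paste before relying on it.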
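
Where -f is not available, a similar result can be approximated with wc and head. What follows is only a sketch under that assumption, with invented file names and contents:

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alice\nbob\ncarol\n' > names      # 3 lines
printf '555-0101\n555-0102\n' > phones    # 2 lines

# Count the lines of each file. Reading from standard input keeps the
# file name out of wc's output; $(( )) strips any padding.
n1=$(( $(wc -l < names) ))
n2=$(( $(wc -l < phones) ))

# Pick the smaller of the two counts.
if [ "$n1" -lt "$n2" ]; then shortest=$n1; else shortest=$n2; fi

# Paste as usual, then keep only as many lines as the shortest file has.
paste names phones | head -n "$shortest" > joint-file
cat joint-file
```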

Thursday, August 09, 2001

Beginning and end of a file

Sometimes you might want to see only the first or last few lines of a file. The commands head and tail allow you to do precisely that.

To see the first 10 lines of a file do:


head file-name

If you want any other number of lines, say 15, do as in this example:


head -15 file-name

In case you prefer to see the first 30 bytes of a file you can do


head -c 30 file-name

You can give more than one file name in the command line.

You might wonder why you want to look at the beginning of a file. Let me give you an example: suppose you have several HTML files and you want to know if the first line (in each file) has the DOCTYPE information. Then you can look at the first line of each file with a command similar to this:


head -1 *.html
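
With many files it can be easier to let the shell do the checking. The loop below is a sketch of one way to do it; the two toy pages and their names are invented, and head -n 1 is simply the longer spelling of head -1.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two toy pages: one starts with a DOCTYPE, the other does not.
printf '<!DOCTYPE html>\n<html>\n</html>\n' > good.html
printf '<html>\n</html>\n' > bad.html

# Collect the names of the files whose first line lacks a DOCTYPE.
missing=""
for f in *.html; do
    # head -n 1 prints the first line; grep -qi succeeds silently on a
    # case-insensitive match.
    if ! head -n 1 "$f" | grep -qi '<!DOCTYPE'; then
        missing="$missing $f"
    fi
done

echo "Files without DOCTYPE on line 1:$missing"
```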

The command tail behaves in a similar way, except that it looks at the end of the file. So


tail file-name

will produce the last 10 lines of a file; called as


tail -15 file-name

will give the last 15 lines, while


tail -c 30 file-name

will give you the last 30 bytes of it.

There is one more useful option for tail; for example, if you do


tail -n +3 file-name

you will get all lines beginning at the 3rd line of the file.

An example of usage of tail is to check that all your HTML files end with </html>. You can check the last line of each file with a command like this:


tail -1 *.html
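
The difference between counting from the end and counting from the start is easy to see on a small file. This sketch uses an invented five-line file and the -n spelling of the options, which current versions of tail expect:

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\ntwo\nthree\nfour\nfive\n' > numbers.txt

# Last two lines of the file.
last_two=$(tail -n 2 numbers.txt)

# Everything from line 3 onwards (note the + sign).
from_third=$(tail -n +3 numbers.txt)

printf '%s\n' "$last_two"
printf '%s\n' "$from_third"
```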

Friday, August 03, 2001

Finding Duplicate Lines in file ...

uniq file-name

There are some options you can give to uniq:

  1. -c will count the number of repetitions of each line;
  2. -d will do the "opposite" of the normal behaviour, namely it will print only the lines that are repeated;
  3. -i will ignore case during comparison.

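You can use this command to get unique lines in the output of other commands. Because uniq only compares adjacent lines, it is most often applied to the output of sort, which brings equal lines together.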
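
Here is a small sketch of uniq working on the output of sort. The file name and its contents are invented for the illustration.

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'pear\napple\npear\nfig\napple\npear\n' > fruit.txt

# sort brings equal lines together so that uniq can collapse them.
unique=$(sort fruit.txt | uniq)

# -d keeps only the lines that occur more than once.
repeated=$(sort fruit.txt | uniq -d)

printf '%s\n' "$unique"
printf '%s\n' "$repeated"

# -c prefixes each line with its count; sort -rn then puts the most
# frequent line first.
sort fruit.txt | uniq -c | sort -rn
```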

Wednesday, August 01, 2001

Word Count in a file

Another command related to the contents of files is wc. Used as in this example, it prints the number of lines, words and bytes (characters in a "text" file) of a file:


wc file-name

To get only one of the three counts above, give the corresponding option:

  1. For lines output:

    wc -l file-name
  2. For words output:

    wc -w file-name
  3. For bytes (characters in a "text" file):

    wc -c file-name

    or, to count characters rather than bytes (the two agree for plain ASCII text),

    wc -m file-name

Another interesting option allows you to get the length of the longest line in the file:


wc -L file-name

You can give more than one file name in your command line; the output will have the file name appended to the results of the counting.
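
In scripts it is often handier to have the bare numbers. One way to get them, sketched here with two invented files, is to feed the file to wc on standard input, so that no file name is printed:

```shell
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'hello world\nfoo\n' > a.txt
printf 'one two three\n' > b.txt

# Reading from standard input keeps the file name out of the output;
# $(( )) strips the padding that some versions of wc add.
lines=$(( $(wc -l < a.txt) ))
words=$(( $(wc -w < a.txt) ))
bytes=$(( $(wc -c < a.txt) ))
echo "a.txt: $lines lines, $words words, $bytes bytes"

# With several files wc prints one row per file followed by a row of totals.
wc a.txt b.txt
```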