That’s Awksome!

Have you ever been doing something on the terminal and realised that most of the information being reported is useless to you? Ever thought, “Hey, it would be really cool if it would only show me ‘just that bit’”?

Awk could be the answer. Awk is a DSL, or domain-specific language, aimed at text processing, and it is used by many command line gurus. Although awk has an extremely large number of uses, I will only be detailing its print statement, used with a pipe.

The awk print statement is used, obviously, to print things. That is not spectacular in itself; the real magic comes with awk’s variables. When you pass a stream to awk, it splits each line into columns (awk calls them fields) wherever the input field separator occurs, which by default is whitespace. It then assigns each field to a numbered variable: $1, $2, $3 and so on. So in the following line

This is a line.

$1, $2, $3, $4 = This, is, a, line.
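You can check the splitting yourself by echoing the sample line into awk. This is just a minimal sketch; the variable names are my own, and NF is awk’s built-in count of the fields on the current line.

```shell
# Feed the sample line to awk and pick out individual fields.
line="This is a line."

# $1 is the first field and $4 the fourth; the comma puts a space between them.
first_and_last=$(echo "$line" | awk '{print $1, $4}')

# NF holds the number of fields awk found on the line.
field_count=$(echo "$line" | awk '{print NF}')

echo "$first_and_last"   # This line.
echo "$field_count"      # 4
```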

You may be able to see how this could be useful, but perhaps not yet how practical it is.

Let’s do an example run using ls. If I run

ls -lh

in a terminal I get the following output:

drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Angus,Thongs.And.Perfect.Snogging[2008]DvDrip-aXXo
-rw-r--r-- 1 bod bod 149M 2009-06-22 07:50 archlinux-2009.02-ftp-i686.iso
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Bend It Like Beckham (2002) DVDRip XviD [desidhamal.com]
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Bride Wars[2009]DvDrip[Eng]-FXG
drwxr-xr-x 2 bod bod 4.0K 2009-06-23 13:46 Desktop
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 downloads
drwxr-xr-x 6 bod bod 4.0K 2009-06-22 07:46 Hes.Just.Not.That.Into.You.DVDRip.XviD-DiAMOND
drwxr-xr-x 3 bod bod 4.0K 2009-06-22 07:46 Hitch[2005]DVDrip[Eng]AC3[5.1]-Atlas47
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Ice Road Truckers.S 03e02.rookie.run
-rw-r--r-- 1 bod bod 348M 2009-06-22 08:17 Ice.Road.Truckers.S03E03.Canadian.Invasion.DSR.XviD-KRS.avi
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 livecd-i686-installer-2008.0-r1
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 lose a guy torrent
-rw-r--r-- 1 bod bod  303 2009-05-29 19:03 not_found.html
-rw-r--r-- 1 bod bod  303 2009-05-29 19:03 not_found.html.1
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Ocean's.12[2004]DvDrip[Eng]-aXXo
drwxr-xr-x 2 bod bod 4.0K 2009-06-23 08:46 perl
drwxr-xr-x 2 bod bod 4.0K 2009-06-25 17:51 python
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:52 Relapse
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Spore-RELOADED
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 The.Accidental.Husband[2008]DvDrip-aXXo
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 The.Illusionist[2006]DvDrip[Eng]-aXXo
-rw-r--r-- 1 bod bod 2.4K 2008-12-11 21:54 t_skariah.asc.gpg
-rw-r--r-- 1 bod bod 3.0K 2009-06-23 13:59 tuto.txt
-rw-r--r-- 1 bod bod  575 2009-06-23 08:45 tuto.txt~
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Underworld Rise of the Lycans[2009]DvDrip[Eng]-FXG
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Yes.Man.2008.DvDRip-FxM

Now let’s say I only wanted to know the name and the permissions. I could use awk to give me only those columns. Note: the GNU ls flag --quoting-style=locale makes ls wrap each filename in quotes, so you can see where a name starts and ends. It does not stop awk from splitting on whitespace, though: a filename containing spaces still spans several columns, so $8 holds only its first word.

ls -lh --quoting-style=locale | grep -v "total " | awk '{print $1 "\t" $8}'

That would output this:

drwxr-xr-x	`Angus,Thongs.And.Perfect.Snogging[2008]DvDrip-aXXo'
-rw-r--r--	`archlinux-2009.02-ftp-i686.iso'
drwxr-xr-x	`Bend
drwxr-xr-x	`Bride
drwxr-xr-x	`Desktop'
drwxr-xr-x	`downloads'
drwxr-xr-x	`Hes.Just.Not.That.Into.You.DVDRip.XviD-DiAMOND'
drwxr-xr-x	`Hitch[2005]DVDrip[Eng]AC3[5.1]-Atlas47'
drwxr-xr-x	`Ice
-rw-r--r--	`Ice.Road.Truckers.S03E03.Canadian.Invasion.DSR.XviD-KRS.avi'
drwxr-xr-x	`livecd-i686-installer-2008.0-r1'
drwxr-xr-x	`lose
-rw-r--r--	`not_found.html'
-rw-r--r--	`not_found.html.1'
drwxr-xr-x	`Ocean\'s.12[2004]DvDrip[Eng]-aXXo'
drwxr-xr-x	`perl'
drwxr-xr-x	`python'
drwxr-xr-x	`Relapse'
drwxr-xr-x	`Spore-RELOADED'
drwxr-xr-x	`The.Accidental.Husband[2008]DvDrip-aXXo'
drwxr-xr-x	`The.Illusionist[2006]DvDrip[Eng]-aXXo'
-rw-r--r--	`t_skariah.asc.gpg'
-rw-r--r--	`tuto.txt'
-rw-r--r--	`tuto.txt~'
drwxr-xr-x	`Underworld
drwxr-xr-x	`Yes.Man.2008.DvDRip-FxM'
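The cut-off names in that output (`Bend, `Ice and friends) are the ones containing spaces, because awk splits on whitespace regardless of the quotes. One possible workaround, sketched here on a single sample line rather than a live ls, is to glue everything from column 8 onwards back together. It assumes the name starts at column 8, as it does with the date format shown above, and it will squash runs of several spaces in a name down to one.

```shell
# A sample line in the same format as the ls -lh output above.
sample='drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Bend It Like Beckham (2002) DVDRip XviD [desidhamal.com]'

# Rebuild the filename by joining fields 8..NF with single spaces,
# then print it after the permissions column.
result=$(printf '%s\n' "$sample" | awk '{
    name = $8
    for (i = 9; i <= NF; i++) name = name " " $i
    print $1 "\t" name
}')

printf '%s\n' "$result"
```

To try it on a real directory, swap the printf for ls -lh piped through the same awk program.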

Pretty cool, yeah? That’s all well and good, but I want to add my own spin to the output. The following is a bit different, but shows awk’s brilliance even more.

ls -lh --quoting-style=locale | grep -v "total " | awk '{print "Size is ", $5 "\tfor ", $8}'

This will print out the following:

Size is  4.0K	for  `Angus,Thongs.And.Perfect.Snogging[2008]DvDrip-aXXo'
Size is  149M	for  `archlinux-2009.02-ftp-i686.iso'
Size is  4.0K	for  `Bend
Size is  4.0K	for  `Bride
Size is  4.0K	for  `Desktop'
Size is  4.0K	for  `downloads'
Size is  4.0K	for  `Hes.Just.Not.That.Into.You.DVDRip.XviD-DiAMOND'
Size is  4.0K	for  `Hitch[2005]DVDrip[Eng]AC3[5.1]-Atlas47'
Size is  4.0K	for  `Ice
Size is  348M	for  `Ice.Road.Truckers.S03E03.Canadian.Invasion.DSR.XviD-KRS.avi'
Size is  4.0K	for  `livecd-i686-installer-2008.0-r1'
Size is  4.0K	for  `lose
Size is  303	for  `not_found.html'
Size is  303	for  `not_found.html.1'
Size is  4.0K	for  `Ocean\'s.12[2004]DvDrip[Eng]-aXXo'
Size is  4.0K	for  `perl'
Size is  4.0K	for  `python'
Size is  4.0K	for  `Relapse'
Size is  4.0K	for  `Spore-RELOADED'
Size is  4.0K	for  `The.Accidental.Husband[2008]DvDrip-aXXo'
Size is  4.0K	for  `The.Illusionist[2006]DvDrip[Eng]-aXXo'
Size is  2.4K	for  `t_skariah.asc.gpg'
Size is  3.0K	for  `tuto.txt'
Size is  575	for  `tuto.txt~'
Size is  4.0K	for  `Underworld
Size is  4.0K	for  `Yes.Man.2008.DvDRip-FxM'

Now, let’s pick that command apart. The first section:

ls -lh --quoting-style=locale

gives us a listing of the current working directory in a human-readable long listing style. Human-readable means it prints the file sizes with K/M/G/T on the end; long listing means it gives you loads of extra info, like permissions, user and group ownership, timestamps etc.

The next section:

grep -v "total "

means: before we pass the text to awk, get rid of any line which contains “total ”. ls -lh usually prints a “total” line first, which would muck up our awk command. Be aware that this also drops any file whose listing line happens to contain “total ”.
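As an aside, awk can do this filtering on its own, which saves a process. The sketch below is an alternative to the grep step, not what the command above does: it puts a pattern in front of the action so that only lines not starting with “total ” get printed. The listing is canned sample text, and the figure on its “total” line is made up for illustration.

```shell
# Canned ls -lh style text; the "total" value is invented for this example.
listing='total 497M
-rw-r--r-- 1 bod bod 149M 2009-06-22 07:50 archlinux-2009.02-ftp-i686.iso
drwxr-xr-x 2 bod bod 4.0K 2009-06-23 13:46 Desktop'

# !/^total / matches every line that does NOT begin with "total ";
# the action in braces runs only on those lines.
filtered=$(printf '%s\n' "$listing" | awk '!/^total / {print $1 "\t" $8}')

printf '%s\n' "$filtered"
```

Anchoring the pattern with ^ means a file that merely has “total ” somewhere in its name is kept.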

The final section:

awk '{print "Size is ", $5 "\tfor ", $8}'

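means: print the words “Size is”, followed by column 5, then the word “for”, then column 8. The ‘\t’ is an escape sequence meaning tab; it tells awk to insert a tab there. The commas are not just decoration: each one makes print insert the output field separator (a single space by default), whereas items written side by side are joined with nothing in between. Together with the spaces inside the quotes, that is why the output has two spaces after “Size is” and after “for”. The rest is presentation, so experiment with it yourself.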
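If you want to see what the commas do, compare the variants below on one sample line. In awk, a comma between print items inserts the output field separator (a space by default), while items placed next to each other are simply concatenated. The last variant uses awk’s printf, which is one way to get exact control over spacing; the variable names are mine.

```shell
# One sample line in the format of the ls -lh output above.
sample='-rw-r--r-- 1 bod bod 149M 2009-06-22 07:50 archlinux-2009.02-ftp-i686.iso'

# With a comma, awk puts a space between the two items.
with_comma=$(printf '%s\n' "$sample" | awk '{print "Size is", $5}')

# Without a comma, the items are joined with nothing in between.
no_comma=$(printf '%s\n' "$sample" | awk '{print "Size is" $5}')

# printf spells out the whole layout, including the tab and the newline.
formatted=$(printf '%s\n' "$sample" | awk '{printf "Size is %s\tfor %s\n", $5, $8}')

printf '%s\n' "$with_comma" "$no_comma" "$formatted"
```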

In case you didn’t know, the pipes

|

send the output of the command before the pipe to the input of the command after it.

Hopefully you can now see the power of awk. This is by no means a complete guide; it barely covers the basics. If you want to find out more, check out chapter 6 of the bash beginners guide.

Thanks for reading.

PS: Angus,Thongs.And.Perfect.Snogging[2008]DvDrip-aXXo is not porn, it’s a film my girlfriend bought :)