Have you ever been doing something in the terminal and realised that most of the information being reported is useless to you? Ever thought, “Hey, it would be really cool if it would only show me ‘just that bit’”?
Awk could be the answer. Awk is a DSL, or Domain Specific Language, aimed at text processing, and it is used by many command-line gurus. Although awk has an extremely large number of uses, I will only be detailing its print statement used with a pipe.
The awk print statement is used, obviously, to print things. That’s not spectacular on its own; the real magic comes with awk’s variables. When you pass a stream to awk, it splits each line into columns: the parts of the line separated by the input field separator, which by default is whitespace. It then assigns each column to a variable $num, where num is the column’s position. So in the following line
This is a line.
$1, $2, $3, $4 = This, is, a, line.
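To see the splitting for yourself, here is a minimal sketch that feeds the sample sentence to awk through a pipe. The shell variable names are my own, and NF is awk's built-in count of columns on the current line.

```shell
# Feed the sample sentence to awk and pick out individual columns.
line="This is a line."

# $1 is the first column, $4 the fourth; the comma puts a space between them.
first_and_last=$(printf '%s\n' "$line" | awk '{print $1, $4}')

# NF holds the number of columns awk found on the line.
field_count=$(printf '%s\n' "$line" | awk '{print NF}')

echo "$first_and_last"   # This line.
echo "$field_count"      # 4
```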
So you may be able to see how it is useful, but you may not be able to see how practical this is.
Let’s do an example run using ls. If I run
ls -lh
in a terminal I get the following output:
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Angus,Thongs.And.Perfect.Snogging[2008]DvDrip-aXXo
-rw-r--r-- 1 bod bod 149M 2009-06-22 07:50 archlinux-2009.02-ftp-i686.iso
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Bend It Like Beckham (2002) DVDRip XviD [desidhamal.com]
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Bride Wars[2009]DvDrip[Eng]-FXG
drwxr-xr-x 2 bod bod 4.0K 2009-06-23 13:46 Desktop
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 downloads
drwxr-xr-x 6 bod bod 4.0K 2009-06-22 07:46 Hes.Just.Not.That.Into.You.DVDRip.XviD-DiAMOND
drwxr-xr-x 3 bod bod 4.0K 2009-06-22 07:46 Hitch[2005]DVDrip[Eng]AC3[5.1]-Atlas47
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Ice Road Truckers.S 03e02.rookie.run
-rw-r--r-- 1 bod bod 348M 2009-06-22 08:17 Ice.Road.Truckers.S03E03.Canadian.Invasion.DSR.XviD-KRS.avi
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 livecd-i686-installer-2008.0-r1
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 lose a guy torrent
-rw-r--r-- 1 bod bod 303 2009-05-29 19:03 not_found.html
-rw-r--r-- 1 bod bod 303 2009-05-29 19:03 not_found.html.1
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Ocean's.12[2004]DvDrip[Eng]-aXXo
drwxr-xr-x 2 bod bod 4.0K 2009-06-23 08:46 perl
drwxr-xr-x 2 bod bod 4.0K 2009-06-25 17:51 python
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:52 Relapse
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Spore-RELOADED
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 The.Accidental.Husband[2008]DvDrip-aXXo
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 The.Illusionist[2006]DvDrip[Eng]-aXXo
-rw-r--r-- 1 bod bod 2.4K 2008-12-11 21:54 t_skariah.asc.gpg
-rw-r--r-- 1 bod bod 3.0K 2009-06-23 13:59 tuto.txt
-rw-r--r-- 1 bod bod 575 2009-06-23 08:45 tuto.txt~
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Underworld Rise of the Lycans[2009]DvDrip[Eng]-FXG
drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 Yes.Man.2008.DvDRip-FxM
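Now let’s say I only wanted to know about the name and the permissions. I could use awk to give me only those columns. In this listing the name is column 8; if your ls prints the date in a different format, the name may land in a different column. Note: the ls flag --quoting-style=locale makes ls wrap each filename in quotes. awk still splits on whitespace, though, so a filename containing spaces is cut off after its first word, and the missing closing quote makes those cases easy to spot.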
ls -lh --quoting-style=locale | grep -v "total " | awk '{print $1 "\t" $8}'
That would output this:
drwxr-xr-x `Angus,Thongs.And.Perfect.Snogging[2008]DvDrip-aXXo'
-rw-r--r-- `archlinux-2009.02-ftp-i686.iso'
drwxr-xr-x `Bend
drwxr-xr-x `Bride
drwxr-xr-x `Desktop'
drwxr-xr-x `downloads'
drwxr-xr-x `Hes.Just.Not.That.Into.You.DVDRip.XviD-DiAMOND'
drwxr-xr-x `Hitch[2005]DVDrip[Eng]AC3[5.1]-Atlas47'
drwxr-xr-x `Ice
-rw-r--r-- `Ice.Road.Truckers.S03E03.Canadian.Invasion.DSR.XviD-KRS.avi'
drwxr-xr-x `livecd-i686-installer-2008.0-r1'
drwxr-xr-x `lose
-rw-r--r-- `not_found.html'
-rw-r--r-- `not_found.html.1'
drwxr-xr-x `Ocean\'s.12[2004]DvDrip[Eng]-aXXo'
drwxr-xr-x `perl'
drwxr-xr-x `python'
drwxr-xr-x `Relapse'
drwxr-xr-x `Spore-RELOADED'
drwxr-xr-x `The.Accidental.Husband[2008]DvDrip-aXXo'
drwxr-xr-x `The.Illusionist[2006]DvDrip[Eng]-aXXo'
-rw-r--r-- `t_skariah.asc.gpg'
-rw-r--r-- `tuto.txt'
-rw-r--r-- `tuto.txt~'
drwxr-xr-x `Underworld
drwxr-xr-x `Yes.Man.2008.DvDRip-FxM'
Pretty cool, yeah? So, that’s all well and good, but I want to add my own spin to the output. The following is a bit different, but shows awk’s brilliance even more.
ls -lh --quoting-style=locale | grep -v "total " | awk '{print "Size is ", $5 "\tfor ", $8}'
This will print out the following:
Size is 4.0K for `Angus,Thongs.And.Perfect.Snogging[2008]DvDrip-aXXo'
Size is 149M for `archlinux-2009.02-ftp-i686.iso'
Size is 4.0K for `Bend
Size is 4.0K for `Bride
Size is 4.0K for `Desktop'
Size is 4.0K for `downloads'
Size is 4.0K for `Hes.Just.Not.That.Into.You.DVDRip.XviD-DiAMOND'
Size is 4.0K for `Hitch[2005]DVDrip[Eng]AC3[5.1]-Atlas47'
Size is 4.0K for `Ice
Size is 348M for `Ice.Road.Truckers.S03E03.Canadian.Invasion.DSR.XviD-KRS.avi'
Size is 4.0K for `livecd-i686-installer-2008.0-r1'
Size is 4.0K for `lose
Size is 303 for `not_found.html'
Size is 303 for `not_found.html.1'
Size is 4.0K for `Ocean\'s.12[2004]DvDrip[Eng]-aXXo'
Size is 4.0K for `perl'
Size is 4.0K for `python'
Size is 4.0K for `Relapse'
Size is 4.0K for `Spore-RELOADED'
Size is 4.0K for `The.Accidental.Husband[2008]DvDrip-aXXo'
Size is 4.0K for `The.Illusionist[2006]DvDrip[Eng]-aXXo'
Size is 2.4K for `t_skariah.asc.gpg'
Size is 3.0K for `tuto.txt'
Size is 575 for `tuto.txt~'
Size is 4.0K for `Underworld
Size is 4.0K for `Yes.Man.2008.DvDRip-FxM'
Now, let’s pick that command apart. The first section:
ls -lh --quoting-style=locale
This gives us a listing of the current working directory in human-readable long listing style. Human-readable (-h) means it prints the file sizes with K/M/G/T on the end; long listing (-l) means it gives you loads of extra info, like permissions, user and group ownership, timestamps etc.
The next section:
grep -v "total "
This means: before we pass the text to awk, get rid of any line which contains "total ". ls -lh usually prints a "total" line first, which would muck up our awk command.
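As an aside, awk can do this filtering itself, which saves a process in the pipeline. The sketch below is an alternative, not what the command in this post does. It puts a pattern in front of the action so the action only runs on lines that do not start with "total ". The two file rows come from the listing above, but the "total 501M" line is a made-up stand-in for whatever your ls prints.

```shell
# Hypothetical sample standing in for real `ls -lh` output.
sample='total 501M
drwxr-xr-x 2 bod bod 4.0K 2009-06-23 13:46 Desktop
-rw-r--r-- 1 bod bod 3.0K 2009-06-23 13:59 tuto.txt'

# !/^total / is a pattern: the action in braces runs only for lines
# that do NOT begin with "total ", so no separate grep is needed.
filtered=$(printf '%s\n' "$sample" | awk '!/^total / {print $1 "\t" $8}')

echo "$filtered"
```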
The final section:
awk '{print "Size is ", $5 "\tfor ", $8}'
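This means: print the words “Size is”, followed by column 5, then the word “for”, then column 8. The ‘\t’ is an escape sequence which means tab; it tells awk to insert a tab there. A ‘,’ between two items makes print put the output field separator (a single space by default) between them, while items written side by side are joined with nothing in between. The whitespace inside the quotes is just for presentation; you’ll have to experiment with it yourselves.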
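If you want to experiment with the commas, here is a small sketch of the difference they make, using one row from the listing above. The shell variable names are my own. OFS is awk's built-in output field separator, which is what a comma in print turns into.

```shell
# One row copied from the listing above.
sample='-rw-r--r-- 1 bod bod 3.0K 2009-06-23 13:59 tuto.txt'

# A comma inserts the output field separator (a space by default).
with_comma=$(printf '%s\n' "$sample" | awk '{print "Size is", $5}')

# No comma: the two items are simply glued together.
concatenated=$(printf '%s\n' "$sample" | awk '{print "Size is" $5}')

# Setting OFS in a BEGIN block changes what the comma inserts.
custom_ofs=$(printf '%s\n' "$sample" | awk 'BEGIN {OFS = ": "} {print "Size", $5}')

echo "$with_comma"     # Size is 3.0K
echo "$concatenated"   # Size is3.0K
echo "$custom_ofs"     # Size: 3.0K
```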
In case you didn’t know, the pipes
|
send the output of the command preceding the pipe, to the input of the command succeeding the pipe.
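You may have noticed that names such as `Bend and `lose are cut short in the outputs above. That happens because awk splits on whitespace whether the name is quoted or not. Here is a minimal sketch of one workaround, assuming the name always starts at column 8 as it does in this listing: glue columns 8 through NF (awk's built-in count of columns on the line) back together with single spaces. The sample row is copied from the listing above; a run of several spaces inside a name would be collapsed to one.

```shell
# One row from the listing whose name contains spaces.
sample='drwxr-xr-x 2 bod bod 4.0K 2009-06-22 07:46 lose a guy torrent'

# Start with column 8, then append every later column separated by a space.
perms_and_name=$(printf '%s\n' "$sample" | awk '{
    name = $8
    for (i = 9; i <= NF; i++) name = name " " $i
    print $1 "\t" name
}')

echo "$perms_and_name"
```

With a loop like this the --quoting-style=locale flag is no longer needed to spot truncated names, though it does no harm.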
Hopefully now you can see the power of awk. This is by no means a complete guide; it barely covers the basics. If you want to find out more, check out chapter 6 of the bash beginners guide.
Thanks for reading.
PS: Angus,Thongs.And.Perfect.Snogging[2008]DvDrip-aXXo is not porn, it’s a film my girlfriend bought.