A Short Guide to AWK

AWK is built to process column-oriented text data, such as tables. It treats a file as a sequence of N records (rows), each made up of M fields (columns).

Basic

# awk '{print $2}' < file

# awk '{print $2}' file

file = the input file (read from standard input in the first form, passed as an argument in the second)
print $2 = print the 2nd field of each line; by default awk splits fields on whitespace
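
As a minimal sketch of the default splitting, the snippet below feeds awk two made-up lines (the names and values are hypothetical sample data, not from any real file). It suggests that a run of several spaces is treated as one delimiter, so the second field is the same column on both lines.

```shell
# Hypothetical sample input: three whitespace-separated columns.
# The second line uses extra spaces between the first two columns.
out=$(printf 'alice 30 paris\nbob   25 rome\n' | awk '{print $2}')
echo "$out"
# prints:
# 30
# 25
```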

Delimiter

# awk -F: '{print $2}' file

-F: = use ':' as the delimiter

# awk -F '[:;]' '{print $2}' file

-F '[:;]' = use a character class as the delimiter, so fields are split on EITHER ':' OR ';'

# awk -F ':;' '{print $2}' file

-F ':;' = use the two-character sequence ':;' as THE delimiter

# awk -F ':+' '{print $2}' file

-F ':+' = treat the delimiter as a regular expression, so a run of one or more ':' counts as a single delimiter
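
To compare the delimiter forms side by side, here is a small sketch run against one made-up input line (the string in `line` is a hypothetical example). Each command differs only in its -F argument.

```shell
# Hypothetical one-line input mixing ':' and ';' as separators.
line='a:b;c:;d::e'

# Split on ':' or ';' -> fields: a, b, c, "", d, "", e
either=$(echo "$line" | awk -F '[:;]' '{print $2}')   # b

# Split on the literal sequence ':;' -> fields: "a:b;c", "d::e"
pair=$(echo "$line" | awk -F ':;' '{print $2}')       # d::e

# Split on runs of ':' -> fields: a, "b;c", ";d", e
runs=$(echo "$line" | awk -F ':+' '{print $4}')       # e

echo "$either / $pair / $runs"
```

With a plain -F ':' the fourth field of the same line would be empty, because the two adjacent colons would delimit an empty field.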

Arithmetic

# echo 5 4 | awk '{print $1 + $2}'

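the output is '9': '+' adds the two fields as numbers

# echo 5 4 | awk '{print $1 $2}'

the output is '54': placing two fields side by side concatenates them as strings

# echo 5 4 | awk '{print $1, $2}'

the output is '5 4': a comma prints the 1st and 2nd fields separated by a space (the default output field separator)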

Variables

# awk -F '<FS>' '{print $2}' file

<FS>, aka the field separator variable, can be any single character or a regular expression
e.g. awk -F ':' '{print $3}' /etc/passwd

# awk -F '<FS>' 'BEGIN{OFS="<OFS>"} {print $3, $4}' file

<OFS>, aka the output field separator variable, is the string placed between fields in the output when the print arguments are separated by commas
e.g. awk -F ':' 'BEGIN{OFS="|||"} {print $3, $4}' /etc/passwd
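
The following sketch runs the same idea on an inline record instead of /etc/passwd, so the result does not depend on the machine. The record in `rec` is a made-up passwd-style line. It also illustrates that OFS only appears when print is given comma-separated arguments.

```shell
# Hypothetical passwd-style record: name:password:UID:GID:comment:home:shell
rec='demo:x:1001:2002:Demo User:/home/demo:/bin/sh'

# The comma in print is what inserts OFS between the two fields.
with_comma=$(echo "$rec" | awk -F ':' 'BEGIN{OFS="|||"} {print $3, $4}')
echo "$with_comma"   # 1001|||2002

# Without the comma the fields are concatenated and OFS is not used.
no_comma=$(echo "$rec" | awk -F ':' 'BEGIN{OFS="|||"} {print $3 $4}')
echo "$no_comma"     # 10012002
```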

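# awk 'BEGIN{RS="<RS>"} {print $1}' file

<RS>, aka the record separator variable, works like FS but splits the input into records (rows) instead of fields; the default is a newline

# awk 'BEGIN{ORS="<ORS>"} {print $3, $4}' file

<ORS>, aka the output record separator variable, works like OFS but is printed after each record instead of between fields; the default is a newline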
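
A short sketch of both record separators, using made-up inline input: the first command splits a single line into records on ';', and the second joins three input lines with ',' on output.

```shell
# RS=";" splits the single input line into three records: "a 1", "b 2", "c 3".
first_words=$(printf 'a 1;b 2;c 3' | awk 'BEGIN{RS=";"} {print $1}')
echo "$first_words"
# prints:
# a
# b
# c

# ORS="," is written after every record, including the last one.
joined=$(printf 'a\nb\nc\n' | awk 'BEGIN{ORS=","} {print $1}')
echo "$joined"   # a,b,c,
```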

# awk '{print NR}' file

NR, aka the number of records read so far, is equivalent to the line number when reading a single file
it is helpful when calculating an average, since in an END block it holds the total record count
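
One way the average calculation might look, assuming a hypothetical input with one number per line: the main block accumulates a running total, and the END block divides it by NR.

```shell
# Hypothetical input: one number per line.
# The NR > 0 guard avoids a division by zero on empty input.
avg=$(printf '10\n20\n60\n' | awk '{sum += $1} END {if (NR > 0) print sum / NR}')
echo "$avg"   # 30
```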

# awk '{print NF}' file

NF, aka the number of fields, is the count of fields in the current record, using whitespace as the delimiter by default
the value changes when the delimiter is redefined with '-F'

Note: awk '{print $NF}' file

this prints the last field of the line instead of the number of fields; similarly, $0 prints the whole line
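
The difference between NF, $NF and $0 can be sketched on a made-up line as follows. The $(NF-1) form is a small extension of the same idea, since the expression after $ is evaluated first and then used as a field number.

```shell
# Hypothetical input line with four whitespace-separated fields.
line='one two three four'

count=$(echo "$line" | awk '{print NF}')             # 4
last=$(echo "$line" | awk '{print $NF}')             # four
second_last=$(echo "$line" | awk '{print $(NF-1)}')  # three
whole=$(echo "$line" | awk '{print $0}')             # one two three four

# With -F ':' the same kind of count depends on ':' instead of whitespace.
colon_count=$(echo 'a b:c d' | awk -F ':' '{print NF}')   # 2

echo "$count $last $second_last $colon_count"
```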

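# awk '{print FILENAME}' file

FILENAME holds the name of the current input file, so this prints the file name once per line

# awk '{print FILENAME, FNR}' file1 file2

FNR, aka the number of records read from the current input file, restarts at 1 for each file given, whereas NR keeps counting across all files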
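
To see FNR next to NR, the sketch below creates two throwaway files in a temporary directory (the names file1 and file2 and their contents are placeholders) and prints both counters for every line.

```shell
# Create two small files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/file1"
printf 'c\nd\ne\n' > "$dir/file2"

# Run from inside the directory so FILENAME shows the short names.
out=$(cd "$dir" && awk '{print FILENAME, FNR, NR}' file1 file2)
echo "$out"
# prints:
# file1 1 1
# file1 2 2
# file2 1 3
# file2 2 4
# file2 3 5

rm -r "$dir"
```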
