The awk tool is excellent for processing text files and operating on lines, or sets of lines.
Conditional Statements
Here, we print when one of the identified fields is missing
awk '{ if ($3 == "" || $4 == "" || $5 == "") print "Some field is missing"; }' ./myfile.txt
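A small sketch of the same check on made-up data, assuming a whitespace-separated file that should have five columns. The sample rows and the temp file are hypothetical; the line number is added to the message so the short record is easy to find.

```shell
# Hypothetical sample data: five whitespace-separated columns are expected.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alice 91 x y 120
bob 45 x
carol 72 x y 98
EOF

# Same test as above, with NR added to show which record is short.
missing=$(awk '{ if ($3 == "" || $4 == "" || $5 == "") print "Line " NR ": some field is missing" }' "$sample")
echo "$missing"
rm -f "$sample"
```

With default field splitting only trailing fields can come up empty, so on this kind of input the test amounts to checking NF < 5.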
Compare numeric fields, spew one thing or the other
awk '{ if ($2 >= 50 && $5 >= 100) print $0,"=>","Pass"; else print $0,"=>","Fail"; }' ./myfile.txt
Even More Maths as part of the conditions
awk '{ sum=$2+$5; avg=sum/2; if ( avg >= 90 ) res="Great Client"; else if ( avg >= 80) res="Good Client"; else if (avg >= 70) res="OK Client"; else res="LMIM"; print $0 , " is " , res; }' ./myfile.txt
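The one-liner is dense, so here is the same if/else chain spread over several lines and run against made-up data. This is only a sketch: the client names and numbers are invented, and it prints $1 instead of $0 to keep the output short.

```shell
# Hypothetical sample data: the scores live in columns 2 and 5.
sample=$(mktemp)
cat > "$sample" <<'EOF'
acme 95 a b 91
bolt 80 a b 84
cogs 70 a b 72
dyne 40 a b 50
EOF

graded=$(awk '{
    avg = ($2 + $5) / 2                      # average of the two score columns
    if (avg >= 90)      res = "Great Client"
    else if (avg >= 80) res = "Good Client"
    else if (avg >= 70) res = "OK Client"
    else                res = "LMIM"         # every branch sets the same variable
    print $1 " is " res
}' "$sample")
echo "$graded"
rm -f "$sample"
```

Because every branch assigns res, a value left over from an earlier line can never be printed for the current one.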
Awk also supports the ternary operator ?:, used here to join every three input lines into one comma-separated line.
awk 'ORS=NR%3 ? "," : "\n"' ./myfile.txt
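To see what the ORS trick does, here it is on six made-up input lines. The whole program is a pattern with no action: the assignment sets the output record separator to "," for two lines out of three and to a newline on every third, and since the assigned string is never empty the pattern is true and awk falls back to its default action, print.

```shell
# Six hypothetical input lines, one number per line.
joined=$(printf '%s\n' 1 2 3 4 5 6 | awk 'ORS = NR % 3 ? "," : "\n"')
echo "$joined"
```

This should print 1,2,3 on one line and 4,5,6 on the next. If the input length is not a multiple of three, the last output line ends with a trailing comma and no newline.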
Find Longest Lines
This snippet prints the length and line number of every line in the file, pipes that to a reverse numeric sort and shows the top ten. It takes almost 0.4s to process an 80MiB file.
awk '{ print length, NR }' catalog.sql | sort -nr | head
This variant re-sorts those ten lines by line number, highest first: an extra awk swaps the two columns, then sort runs again.
awk '{ print length, NR }' catalog.sql | sort -nr | head | awk '{ print $2, $1 }' |sort -nr
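If only the single longest line is wanted, the sort can be skipped by tracking the maximum inside awk. This is an alternative sketch, not the pipeline above; the sample lines are made up and the variable names max and line are arbitrary.

```shell
# Hypothetical three-line sample file.
sample=$(mktemp)
printf '%s\n' 'short' 'a much longer line here' 'mid length' > "$sample"

# Remember the greatest length seen so far and the line it was on.
longest=$(awk 'length($0) > max { max = length($0); line = NR } END { print max, line }' "$sample")
echo "$longest"
rm -f "$sample"
```

The output keeps the same "length, line number" order as the pipeline. On a tie, the first line of that length wins because the comparison is strict.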
Code Line Counting
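find . -type f \( -name '*.c' -o -name '*.js' -o -name '*.rb' \) -exec cat {} \; \
| awk '
# The -name tests are joined with -o; chained without it they are ANDed and match nothing.
# \s is a GNU awk extension; use [[:space:]] with other awks.
/^\s*#/    { hash_tick++; next; }   # lines starting with #
/^\s*\/\// { line_tick++; next; }   # lines starting with //
/^\s*$/    { void_tick++; next; }   # blank lines
/\/\*/     { wide_flag=1; }         # a block comment opens
/\*\//     { wide_flag=0; }         # a block comment closes
wide_flag  { wide_tick++; next; }   # still inside a block comment
/./        { code_tick++; next; }   # everything else counts as code
END {
    print "Hash Comment: " hash_tick
    print "Line Comment: " line_tick
    print "Range Comment: " wide_tick
    print "Blank Lines: " void_tick
    print "Code Lines: " code_tick
    print "Total Lines: " NR
}'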
Which will spew something like:
Hash Comment: 4
Line Comment: 5122
Range Comment: 2798
Blank Lines: 11686
Code Lines: 82145
Total Lines: 101755
Number of Changes
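Compare two directories and count the lines added and removed. The patterns also match the --- and +++ file header lines, so each differing file adds one to both totals.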
diff -ruw /code.old /code.new \
| awk '
/^-/ { out++ };
/^\+/ { new++ };
END {
    print "Removed: " out;
    print "Added: " new;
}'
Outputs something like:
diff: /code.old/http/scripts/states.js: No such file or directory
diff: /code.old/lib/barcode.php: No such file or directory
diff: /code.old/lib/inline.php: No such file or directory
Removed: 26131
Added: 19144
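The totals above include the --- and +++ header lines that diff prints for each changed file. A sketch of a variant that skips them, run on two made-up scratch directories standing in for /code.old and /code.new. The header test is a heuristic and an assumption of this sketch: a removed line whose own text starts with "-- " would be skipped too.

```shell
# Hypothetical scratch trees standing in for /code.old and /code.new.
work=$(mktemp -d)
mkdir "$work/old" "$work/new"
printf '%s\n' 'one' 'two' 'three' > "$work/old/a.txt"
printf '%s\n' 'one' 'TWO' 'three' 'four' > "$work/new/a.txt"

# diff exits 1 when the trees differ, so guard it in case errexit or pipefail is on.
counts=$( { diff -ruw "$work/old" "$work/new" || true; } | awk '
    /^(---|\+\+\+) / { next }    # skip the per-file header lines
    /^-/  { out++ }              # a removed line
    /^\+/ { new++ }              # an added line
    END {
        print "Removed: " (out + 0)    # + 0 prints 0 instead of an empty string
        print "Added: " (new + 0)
    }')
echo "$counts"
rm -rf "$work"
```

Here one line is removed (two) and two are added (TWO and four), so the sketch should report Removed: 1 and Added: 2.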