Awk as tool and scripting language
Interpreting Log Files
Many larger network printers and print servers log print jobs in a text file. Logs usually include the originator of each print job, page sizes and counts, and free-format data fields for such things as project cost centers. Each row is a print job record. Listing 2 shows the table headers and some sample records from a printlog.txt file.
Listing 2
Sample printlog.txt File
Document   User  Device  Format  Medium  col  b/w  costctr
C2.sxw     LAGO  pr04    DIN_A4  Normal  1    10   P01
prop.pdf   LEHM  pr03    DIN_A4  Normal  0    10   P01
offer.doc  LOHN  pr01    DIN_A4  Normal  3    0    P02
...
The Awk scripts for processing these kinds of files need only about six to eight lines, depending on the format. They are included in the examples you can download from the Ubuntu User site. You can also copy Listing 3 into a text editor and save it as eval1.awk. You can start by evaluating the total number of pages printed in black and white and in color. Each sum is kept in its own variable.
Listing 3
Print log evaluation 1
# Evaluating the number of printed pages
NR==1 {
  next;
}
{
  sum_color+=$6;
  sum_bw+=$7;
}
END {
  print sum_color " Printed in color";
  print sum_bw " Printed in B&W";
}
Invoke the script with awk -f eval1.awk printlog.txt, which loads the script file and then runs its commands on printlog.txt. The script starts by skipping the row for which NR==1, that is, the table header row. It would also be possible to store this line in a variable and output it at the end.
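The idea of keeping the header row instead of discarding it can be sketched as follows. This is a minimal variant of Listing 3, not one of the article's download files: the file names printlog_sample.txt and eval1_header.awk and the variable name header are assumptions chosen for illustration, and the sample data consists only of the three records shown in Listing 2.

```shell
# Sample data: the header and the three records from Listing 2
cat > printlog_sample.txt <<'EOF'
Document User Device Format Medium col b/w costctr
C2.sxw LAGO pr04 DIN_A4 Normal 1 10 P01
prop.pdf LEHM pr03 DIN_A4 Normal 0 10 P01
offer.doc LOHN pr01 DIN_A4 Normal 3 0 P02
EOF

# Hypothetical variant of eval1.awk that remembers the header row
cat > eval1_header.awk <<'EOF'
NR == 1 { header = $0; next }   # store the header, then skip to the next row
{
    sum_color += $6             # column 6: color pages
    sum_bw    += $7             # column 7: B&W pages
}
END {
    print header                # output the stored header at the end
    print sum_color " Printed in color"
    print sum_bw " Printed in B&W"
}
EOF

awk -f eval1_header.awk printlog_sample.txt
```

For the three sample records, the script prints the header line followed by 4 Printed in color and 20 Printed in B&W.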
For every remaining record, Awk adds the number of color pages ($6) and B&W pages ($7) to the corresponding sum variable; the END block then prints both totals. Breaking the page counts down by cost center is a bit more complicated, but Awk can handle the task quite effectively (see Listing 4 and eval2.awk).
Listing 4
Print log evaluation 2
# Evaluating the number of B&W printed pages
# for a cost center
NR==1 {
  next;
}
{printer[$8]+=$7}
END {
  print "costctr. totals";
  for (F in printer) {print F " " printer[F]}
}
Here again, the next command skips the first row. The remaining rows are evaluated to count the B&W pages for each cost center: the printer[$8]+=$7 command adds the B&W page count ($7) to the element of the printer[] array that belongs to the record's cost center ($8). Once all the records are read, the for loop in the END block walks through the printer[] array and outputs the total for each cost center.
The printer[] array is indexed by the names of the cost centers. In this article's examples, you will also find the eval3.awk script, which sums up the printed pages for each user by using two arrays.
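A per-user evaluation with two arrays could look roughly like the following sketch. It is not the eval3.awk script from the downloads, whose contents are not shown here; the file names, the array names color and bw, and the fourth data record (memo.txt, invented so that one user appears twice) are assumptions for illustration only.

```shell
# Sample data: the records from Listing 2 plus one invented record
# (memo.txt) so that user LAGO has two print jobs
cat > printlog_users.txt <<'EOF'
Document User Device Format Medium col b/w costctr
C2.sxw LAGO pr04 DIN_A4 Normal 1 10 P01
prop.pdf LEHM pr03 DIN_A4 Normal 0 10 P01
offer.doc LOHN pr01 DIN_A4 Normal 3 0 P02
memo.txt LAGO pr04 DIN_A4 Normal 2 5 P01
EOF

# Hypothetical sketch: one array per page type, both indexed by user name
cat > per_user.awk <<'EOF'
NR == 1 { next }                # skip the header row
{
    color[$2] += $6             # color pages per user
    bw[$2]    += $7             # B&W pages per user
}
END {
    # Output columns: user, color pages, B&W pages
    for (u in color) print u, color[u], bw[u]
}
EOF

# The order of a for (... in ...) loop is not defined in Awk,
# so sort the output to get a stable listing
awk -f per_user.awk printlog_users.txt | sort
```

Because both arrays are updated for every record, each user name that appears in color[] is also present in bw[], so a single loop over one array is enough to print both totals.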
Evaluating Number Values
The following example works with simple numeric data: a time column and five columns with floating-point values (Listing 5). Because the values use a period as the decimal separator, be sure to set the LC_ALL=C variable in the shell before you try this example; otherwise, a locale with a decimal comma can cause the numbers to be misread.
Listing 5
Sample Number Values
t         Val1      Val2      Val3      Val4      Val5
0.100000  0.194000  0.166000  0.162000  0.155000  0.194200
0.200000  0.440000  0.388000  0.359000  0.392000  0.400000
...
Listing 5 shows the table headers and the first few rows of sample data from the measureddata1.txt file. Calculating the average of the five values in each row is quite simple in Awk, as the average.awk script in Listing 6 shows.
Listing 6
average.awk
# Averaging the values
NR==1 {next;}
{
  sum = 0;
  for (i=2; i<=NF; i++) {
    sum+=$i;
  }
  average = sum/(NF-1)
  printf("%6s %8.2f\n",$1,average);
}
You also often have to find the minimum and maximum values in a list. One way to do this is to sort the values in each row.
Listing 7 shows the numbervalues1.awk file, which evaluates the minimum and maximum values. The numbers are first stored in an array ($2 through $NF). After sorting, the first element holds the minimum and the last element holds the maximum.
Listing 7
numbervalues1.awk
# Evaluating number values
BEGIN { print(" t MIN MAX"); }
NR==1 { next; }
{
  t=0; n=NF;
  # Store the number values in an array:
  for (i=2; i<=n; i++) {
    y[i-2] = $i;
  }
  # Sort the values:
  for (i = 0; i <= n-2; i++) {
    for (j = i; j > 0 && y[j-1] > y[j]; j--) {
      t = y[j]; y[j] = y[j-1]; y[j-1] = t;
    }
  }
  # Output: t, MIN, MAX
  printf("%6s %8.6f %8.6f ",$1, y[0], y[n-2]);
  printf("\n");
}
The values are sorted with a simple insertion sort, which is adequate for a limited number of values. You can find more efficient algorithms in Jon Bentley's excellent book [5], for example. More complicated sorting algorithms are best written as Awk functions.
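As a sketch of what moving the sorting step into an Awk function might look like, the following variant wraps the same insertion sort in a user-defined function. The function name isort, the file names, and the two-row sample file are assumptions for illustration and are not part of the article's downloads.

```shell
# Sample data: the header and the two rows shown in Listing 5
cat > measured_sample.txt <<'EOF'
t Val1 Val2 Val3 Val4 Val5
0.100000 0.194000 0.166000 0.162000 0.155000 0.194200
0.200000 0.440000 0.388000 0.359000 0.392000 0.400000
EOF

cat > minmax_func.awk <<'EOF'
# Hypothetical helper: sorts arr[0..len-1] in place in ascending order
# using insertion sort. The extra parameters i, j, and tmp serve as
# local variables, which is the usual Awk convention.
function isort(arr, len,    i, j, tmp) {
    for (i = 1; i < len; i++) {
        for (j = i; j > 0 && arr[j-1] > arr[j]; j--) {
            tmp = arr[j]; arr[j] = arr[j-1]; arr[j-1] = tmp
        }
    }
}

NR == 1 { next }                      # skip the header row
{
    n = NF - 1                        # number of values in this row
    for (i = 2; i <= NF; i++)
        y[i-2] = $i + 0               # adding 0 forces a numeric value
    isort(y, n)
    # Output: t, MIN, MAX
    printf("%6s %8.6f %8.6f\n", $1, y[0], y[n-1])
}
EOF

LC_ALL=C awk -f minmax_func.awk measured_sample.txt
```

For the two sample rows, the output is 0.100000 0.155000 0.194200 and 0.200000 0.359000 0.440000. Keeping the sort in a function leaves the main rule short and makes it easier to swap in a different algorithm later.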