Awk is a scripting language designed for editing and evaluating text files, and it is a very useful tool for system administrators. With its help, for example, a changed data path can be updated system-wide in a single step. Many more applications exist, however, including evaluating text files with data recorded in tabular format.
An Awk script consists of individual commands or command blocks that are executed under certain conditions or for all input rows. You can call the Awk commands directly from the command line or save them in a file as a script. You can find all the sample data for this article online [1]; to try the examples, unpack the archive into an empty directory. Some examples combine Awk with other Linux commands.
The first example uses the $USER system variable and takes you on a little tour of potential Awk applications:
$ echo $USER | awk '{print "Hello " $1 "!"}'
Awk takes the content of $USER as a command parameter. In processing the input, Awk stores all data fields in the variables $1 through $NF; the NF variable contains the number of data fields. Awk extracts the first field from its standard input and uses it with print and the word "Hello" to output the message to its own standard output. Multiple commands executed in succession are separated by a semicolon. To output only the third and fourth fields of an input row, use:
$ echo "A B C D" | awk '{print $3 " " $4}'
Of course, a tool like Awk can also handle multiple-row input. Consider the cal Unix program, which prints a calendar month to the terminal. Take a look at Figure 1: Awk prints the row number (NR), the number of fields (NF), and then the whole row ($0) as a check. The fields in this first example are separated by spaces or tabs; later, I'll show how to use other separators.
Figure 1: Awk can handle multiple-row input and separate data into fields.
Awk can use many other variables apart from NF and NR when interpreting input and controlling the output. The Awk manual [2] includes a complete summary, as does the program's man page. Using the --dump-variables option, you can save the important variables from an Awk command in a file for later use.
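For example, with GNU Awk (gawk), the following run leaves a file named awkvars.out in the current directory, which you can then inspect:

$ echo "A B C" | awk --dump-variables '{print NF}'
$ cat awkvars.out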
Analyzing Files
Most practical applications read data from files and process it row by row, often reading from multiple files simultaneously. The awk '{print}' /etc/group command returns all rows in the /etc/group file, which contains rows with the data fields Group_name, Group_password, Group_ID, and User_list (all the users in the group).
In this case, the data fields are separated by colons instead of spaces or tabs. To read group names or other data fields, you must set the field separator variable (FS), as in the following:
$ awk -F':' '{print $1}' /etc/group
You can also use POSIX character classes or regular expressions as separators. If you have a file with fields separated by commas, tabs, or spaces, for example, use the following command:
$ awk -F'[,[:blank:]]' '{print $1 $2 $3}' data2.txt
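If you don't have the data2.txt sample at hand, you can test the separator with inline input; here, $2 is the word between the comma and the space:

$ echo "one,two three" | awk -F'[,[:blank:]]' '{print $2}'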
With Awk, you can change the order of the fields on output. If you want to output the ID first, then the group name and the users, you can do so as follows:
$ awk -F':' '{print $3 " " $1 " " $4}' /etc/group
The print command is suitable only for simple output. You can get a more comprehensively formatted display of data with the printf() command, which works as in C. Formatting options for numbers and character strings are described in detail in the Awk manual [2]. The following command formats the ID and user list:
$ awk -F':' '{printf("%5s %s\n",$3,$4)}' /etc/group
You will often want to exclude certain fields from the output. You can do this by assigning an empty string to the field, as follows:
$ awk -F':' '{$2=""; print}' /etc/group
The output of this example no longer has the fields separated by colons, because Awk uses a separator for the output as well as the input. You can set the output separator with the OFS variable.
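For example, to separate the output fields with a dash, set OFS in a BEGIN block and use commas in print, which joins its arguments with OFS:

$ awk -F':' 'BEGIN{OFS=" - "} {print $1,$3}' /etc/group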
Patterns and Pattern Ranges
So far, you have applied all the Awk commands to all input rows. In most cases, only selected rows need to be processed. The following example looks for group names in the /etc/group file that have no members. The condition is that the last field of the data row must be empty; only the rows that meet it are printed.
$ awk -F':' '$NF=="" {print $0}' /etc/group
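Conditions are not limited to string matches; logical expressions on field values work, too. On many systems, for example, regular users' groups have IDs of 1000 and above:

$ awk -F':' '$3 >= 1000 {print $1 " (GID " $3 ")"}' /etc/group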
An Awk script consists of three main blocks: BEGIN, main, and END, with at least one block present. The main block consists of pairs of conditions and action blocks; each action block runs for every row of the file that satisfies its condition:
condition1 { actionblock1 }
condition2 { actionblock2 }
The conditions are also called patterns in Awk. Patterns can consist of strings, regular expressions, logical expressions, and ranges. For example, you can output all the groups that contain your user name with the following command:
$ awk -F':' '/'"$USER"'/ {print $1}' /etc/group
Here the shell expands the content of the $USER system variable and passes it to Awk as a search pattern. Unlike in the previous examples, Awk processes only those rows that fit the pattern.
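If the nested quoting looks fragile, an alternative is Awk's -v option, which passes the value in as an Awk variable and matches it with the ~ operator:

$ awk -F':' -v user="$USER" '$0 ~ user {print $1}' /etc/group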
Regular expressions are text patterns used in Perl, Awk, and many other programs and programming languages. You can find an introduction to regular expressions in the fifth edition of our Linux Shell Handbook [3], and a standard reference is Jeffrey Friedl's book [4].
Awk uses regular expressions as search patterns in the format /<regex>/. To search for all groups beginning with the word post in /etc/group, use the following command:
$ awk -F':' '/^post/ {print $1}' /etc/group
You can combine multiple regular expressions as search patterns by using the logical operators && (logical AND) and || (logical OR). The following command finds all rows beginning with root or post:
$ awk -F':' '/^post/ || /^root/ {print $1}' /etc/group
Using the combination /^ABC/ && /DEF$/, you can find all rows in a file that begin with ABC and end with DEF. You can define pattern ranges with beginning and ending conditions separated by a comma. The following two commands search rows 5 through 7 and then all groups from root through lp in the /etc/group input file:
$ awk -F':' 'NR==5,NR==7 {print $0}' /etc/group
$ awk -F':' '/root/,/lp/ {print $1}' /etc/group
Pattern ranges are useful for processing data that begins with one expression and ends with another. Awk processes only the data rows determined by the pattern range and ignores all others. Pattern ranges are also handy when data comes in blocks of rows separated by blank rows. The following pattern range handles this case:
awk '/main_criteria/,/^$/ {actions}'
The range begins with the main criterion and ends with a blank row. The end condition uses the caret (^), which translates to "the beginning of the row," followed by the dollar sign ($), which translates to "the end of the row." The beginning of a row followed immediately by the end of a row, with no characters in between, means "look for an empty row."
Built-In Functions
Awk has many useful functions that expand upon those already mentioned. The most important ones process text strings: you can concatenate, split up, search through, and replace text strings as you wish. Figure 2 shows a few of these functions, which are also included in the stringfunctions.sh file. You can copy the examples into the shell and experiment with the functions before implementing them later in scripts.
Figure 2: Awk provides a series of useful functions for processing text strings.
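As a small taste of what Figure 2 demonstrates, here are two one-liners using the built-in length(), toupper(), substr(), and gsub() functions (the exact calls in stringfunctions.sh may differ):

$ echo "Ubuntu User" | awk '{print length($0), toupper($1), substr($2,1,2)}'
$ echo "a-b-c" | awk '{gsub(/-/,":"); print}'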
Another useful function is getline, with which the output of a called Unix command can be piped into an Awk variable (see the systemcalls.awk example in Listing 1). Using getline, you can also read data from files instead of directly from the command line. In this way, Awk can access data for formatting or comparing, for example.
# Integrating Unix command
BEGIN {
  "date +%x" | getline day;
  "date +%T" | getline time;
  printf("\nToday is %s and it is %s o'clock.\n", day, time);
}
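Reading from a file with getline could look like this minimal sketch, which counts the rows of a file (the file name here is only an example):

# Counting rows with getline (a sketch)
BEGIN {
  while ((getline row < "printlog.txt") > 0) {
    count++;
  }
  close("printlog.txt");
  print count " rows read";
}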
Begin and End
Along with the conditions previously mentioned, you have the special patterns BEGIN and END. These instruction blocks serve an important purpose in evaluating files. Awk processes the commands in the BEGIN block before it reads in any data. The commands serve to initialize the script and set variables. For example, if you want to change the delimiters for input and output, you can set the OFS variable within this block. The BEGIN block can also be used to output table headers.
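Both uses fit in one line; here the BEGIN block sets the separators and prints a header before the first row is read:

$ awk 'BEGIN{FS=":"; OFS="\t"; print "Group","ID"} {print $1,$3}' /etc/group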
The commands in the END block are executed after the last row of data has been read. They are often used to evaluate the input data. For example, you can calculate sums and averages or add footers to output tables.
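A minimal example: because NR still holds the total row count in the END block, this one-liner counts the groups defined on the system:

$ awk 'END {print NR " groups defined"}' /etc/group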
If both blocks are in an Awk script, it's often worth saving the script in a separate file. You can create these files with a text editor such as Emacs or Vi. Most editors provide syntax highlighting to help in pairing curly braces for Awk.
Interpreting Log Files
Many larger network printers and print servers log print jobs in a text file. Logs usually include the print job originators, page sizes and counts, and other free-format data fields for such things as project cost centers and other information. Each row is a print job record. In Listing 2, you can see the table headers and some sample records from a printlog.txt file.
Document   User  Device  Format  Medium  col  b/w  costctr
C2.sxw     LAGO  pr04    DIN_A4  Normal  1    10   P01
prop.pdf   LEHM  pr03    DIN_A4  Normal  0    10   P01
offer.doc  LOHN  pr01    DIN_A4  Normal  3    0    P02
...
The Awk scripts for processing these kinds of files take up about six to eight lines, depending on the format. They are included in the examples you can download from the Ubuntu User site. You can also copy Listing 3 into a text editor and save it as eval1.awk. You can start by evaluating the total number of color and black-and-white printed pages; each of the sums is kept in its own variable.
# Evaluating the number of printed pages
NR==1 {
  next;
}
{
  sum_color+=$6;
  sum_bw+=$7;
}
END {
  print sum_color " Printed in color";
  print sum_bw " Printed in B&W";
}
Invoke the script with awk -f eval1.awk printlog.txt, which first loads the script file, then executes the commands in Awk on printlog.txt. The script starts by skipping the row where NR==1, which essentially ignores the table header row. It would also be possible to store this line in a variable and output it at the end.
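That variation might look like this sketch, where the header row is saved before next skips it:

NR==1 {header=$0; next;}
END {print header;}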
Awk increments the sum variables by the number of color ($6) and B&W ($7) pages and prints both totals. Processing the print jobs per cost center is a bit more complicated, but Awk can handle the task quite effectively (see Listing 4 and eval2.awk).
# Evaluating the number of B&W printed pages
# for a cost center
NR==1 {
  next;
}
{printer[$8]+=$7}
END {
  print "costctr. totals";
  for (F in printer) {print F " " printer[F]}
}
Here again, the next command skips the first row. The remaining rows are evaluated to count the number of B&W pages per cost center. The printer[$8]+=$7 command adds each row's page count to the printer[] array entry for its cost center. Once all the records have been read, the loop in the END block walks through the array and outputs the total for each cost center.
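For just the three sample rows shown in Listing 2, the output would be the following (the order of a for-in loop is not guaranteed, so the two totals may appear swapped):

$ awk -f eval2.awk printlog.txt
costctr. totals
P01 20
P02 0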
The indices of the printer[] array are the names of the cost centers. In this article's examples, you will also find the eval3.awk script, which sums up all the print jobs for each user by combining two data fields.
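A plausible shape for such a script (an assumption, not necessarily the downloaded file) adds the color and B&W fields per user name:

# Sketch: total pages (color + B&W) per user
NR==1 {next;}
{pages[$2]+=$6+$7}
END {
  print "user totals";
  for (U in pages) {print U " " pages[U]}
}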
Evaluating Number Values
The following example processes some simple numeric values: a time column and five columns with floating-point values (Listing 5). Because the values use a decimal point, be sure to set the LC_ALL=C variable in the shell before you try this example, so that Awk does not expect a locale-specific decimal separator.
t         Val1      Val2      Val3      Val4      Val5
0.100000  0.194000  0.166000  0.162000  0.155000  0.194200
0.200000  0.440000  0.388000  0.359000  0.392000  0.400000
...
Listing 5 shows the table headers and the first few rows of sample data from the measureddata1.txt file. Calculating the average values is quite simple in Awk with the average.awk script in Listing 6.
# Averaging the values
NR==1 {next;}
{
  sum = 0;
  for (i=2; i<=NF; i++) {
    sum+=$i;
  }
  average = sum/(NF-1);
  printf("%6s %8.2f\n",$1,average);
}
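Run on the sample data, the script averages the two rows shown in Listing 5 like this:

$ LC_ALL=C awk -f average.awk measureddata1.txt
0.100000     0.17
0.200000     0.40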
You also often have to find the minimum and maximum values in a list. You can do this by sorting the values within each row.
Listing 7 shows the numbervalues1.awk file, which determines the minimum and maximum values. The numbers in each row ($2 through $NF) are first stored in an array. After sorting, the first element holds the minimum and the last element the maximum.
# Evaluating number values
BEGIN { print(" t MIN MAX"); }
NR==1 { next; }
{
  t=0; n=NF;
  # Store the number values in an array:
  for (i=2; i<=n; i++) {
    y[i-2] = $i;
  }
  # Sort the values:
  for (i = 0; i <= n-2; i++) {
    for (j = i; j > 0 && y[j-1] > y[j]; j--) {
      t = y[j]; y[j] = y[j-1]; y[j-1] = t;
    }
  }
  # Output: t, MIN, MAX
  printf("%6s %8.6f %8.6f ",$1, y[0], y[n-2]);
  printf("\n");
}
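For the sample rows in Listing 5, the script produces:

$ LC_ALL=C awk -f numbervalues1.awk measureddata1.txt
 t MIN MAX
0.100000 0.155000 0.194200
0.200000 0.359000 0.440000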
Sorting the values uses a simple insertion sort, which is appropriate for a limited number of values. You can find more efficient algorithms in Jon Bentley's excellent book [5], for example. Implementing more sophisticated sorting algorithms requires user-defined Awk functions.
Conclusion
Awk can address many everyday user or administrator tasks quickly and effortlessly, and it is easier to learn than many other programming languages. This article covers the basics and provides a few more complex examples. You can find support and countless other examples in newsgroups and on many websites.
The free documentation [2] makes learning Awk easy and includes many more examples. The book by the three Awk inventors – Alfred Aho, Peter Weinberger, and Brian Kernighan [6] – is recommended for more advanced users.
[1] Code examples from this article: ftp://ftp.linux-magazine.com/pub/listings/ubuntu-user.com/25/AWK/
[2] Robbins, Arnold. The GNU Awk User's Guide, 2003: http://www.gnu.org/software/gawk/manual/gawk.html
[3] Streicher, Martin. "Regular Expressions," Linux Shell Handbook 5: http://www.sparkhaus-shop.com/eu/magazines/special-editions/eh32040.html
[4] Friedl, Jeffrey. Mastering Regular Expressions, O'Reilly, 2006.
[5] Bentley, Jon. Programming Pearls, Addison-Wesley, 2000.
[6] Aho, Alfred, Brian Kernighan, and Peter Weinberger. The AWK Programming Language, Addison-Wesley, 1988.