2 Matching Patterns and Processing Information with awk

This chapter describes the awk command, a tool with the ability to match lines of text in a file and a set of commands that you can use to manipulate the matched lines. In addition to matching text with the full set of extended regular expressions described in Chapter 1, awk treats each line, or record, as a set of elements, or fields, that can be manipulated individually or in combination. Thus, awk can perform more complex operations, such as:

Writing selected fields of a record

Reordering or replacing the contents of a record; for example, to change syntax in a program source file or change system calls when porting from one system to another

Processing input to find numeric counts, sums, or subtotals

Verifying that a given field contains only numeric information

Checking to see that delimiters are balanced in a programming file

Processing data contained in fields within records

Changing data from one program into a form that can be used by a different program

This chapter contains the following sections:

Running the awk program (Section 2.1)

Printing in awk (Section 2.2)

Using variables in awk (Section 2.3)

More about using regular expressions as patterns (Section 2.4)

Using relational expressions and combined expressions as patterns (Section 2.5)

Using pattern ranges (Section 2.6)

Actions in awk (Section 2.7)

Using operators in an action (Section 2.8)

Using functions within an action (Section 2.9)

Using control structures in awk (Section 2.10)

Performing actions before or after processing the input (Section 2.11)

Concatenating strings (Section 2.12)

Redirection and pipes (Section 2.13)

2.1 Running the awk Program

The awk command has the following syntax:

awk [-FERE] [-v var=val ...] {-f prog_file ... | prog_text} [file1 [file2 ...]]

Table 2-1 describes the flags for the awk command.

Table 2-1: Flags for the awk Command

Flag	Description
`-FERE`	Specifies an extended regular expression to be used as a field separator. By default, `awk` uses white space (any number of adjacent tabs or spaces) to separate fields in a record. To specify an alternate separator containing white space or a shell metacharacter, enclose the entire flag in apostrophes. For example: `echo $PATH | awk -F':' '{for (n = 1; n <= NF; n++) print $n}'`
`-v` `var`=`val`	Assigns the value `val` to a variable named `var`; such assignments are available to the `BEGIN` block of a program. The `awk` command accepts multiple `-v` flags.
`-f` `prog_file`	Specifies the name of a file containing an `awk` program. This flag requires a file name as an argument. The `awk` command accepts multiple `-f` flags, concatenating all the program files and treating them as a single program.

You can specify the awk program to be executed either with the -f prog_file flag or as a program on the command line. Enclose a command-line program in apostrophes ( ' ' ) or quotation marks ( " " ) as needed to control file name expansion and variable substitution by the shell. The program is easier to read if you enclose it in apostrophes and use the -v var=val option to pass any shell variables into it.

Usually, you create an awk program file before running awk. The program file is a series of statements that look like the following:

pattern { action }

In this structure, a pattern is one or more expressions that define the text to be matched. Patterns can consist of the following:

BEGIN or END

Boolean combinations of regular expressions using the operators ! (NOT), || (OR), and && (AND), with parentheses for grouping expressions

Boolean combinations of relational operations on strings, numbers, fields, and variables

Ranges of records, specified in this way:
```
pattern1,pattern2
```

An action is one or more steps to be executed, designated with awk commands, operands, and operators. Actions can consist of the following:

Assignment statements

Statements to format and print data

Tests to alter the flow of control

Control structures, such as if-else, while, and for statements

Redirection of output to one or more output streams besides standard output

Piping of output and input

The braces ( { } ) separate the action from the search pattern. Actions can be specified on a single line, or on multiple lines to give a visual structure to the program. If you place an action consisting of several commands on one line, separate the commands with semicolons ( ; ). For example, either of the following two programs finds every record containing either "Gunther" or "gunther". For each matching record, the program prints two lines: first the number of the record on which the match was made, and then the first two fields of the matched record:

Program 1:

/[Gg]unther/ { print "Record:", NR ; print $1, $2 }

Program 2:

/[Gg]unther/ {
  print "Record:", NR
  print $1, $2
}

Output from these programs might look like the following:

Record: 382
Schuller Gunther
Record: 397
schwarz gunther

Both the pattern and the action are optional elements of a program line. If you omit the pattern, awk performs the action on every record in the file; if you omit the action, awk copies each matching record to standard output. An empty program, however, produces no output; to copy the input unchanged, use a program such as { print }.

After you create the program file, enter the awk command on the command line as follows:

$ awk -f progfile infile > outfile

This command uses the program in progfile to process infile, and writes the output to outfile. The input file is not changed.
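As a minimal sketch, assuming a POSIX shell with mktemp available, the following commands save Program 2 in a temporary program file, run it with -f, and then remove the file. The three input records are invented for illustration.

```sh
# Save Program 2 in a temporary program file (the name comes from mktemp).
progfile=$(mktemp)
cat > "$progfile" <<'EOF'
/[Gg]unther/ {
  print "Record:", NR
  print $1, $2
}
EOF

# Only records 2 and 3 contain "Gunther" or "gunther".
printf 'Meyer Hans\nSchuller Gunther\nschwarz gunther\n' | awk -f "$progfile"
# Record: 2
# Schuller Gunther
# Record: 3
# schwarz gunther

rm -f "$progfile"
```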

With a short program, you can accomplish the same job by entering the program on the command line before the name of the input file. For example:


$ awk '/[Gg]unther/ { print $1, $2 }' infile

When you use awk in this way, enclose the program in apostrophes ( ' ' ) and use the -v var=val option to pass in any shell variables.
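For example, the following sketch uses -v to pass a search pattern held in the shell variable name (a hypothetical variable) into the program. The string in pat is used as a dynamic regular expression, and tolower() makes the match case insensitive. The input lines are invented.

```sh
name=gunther
printf 'Schuller Gunther\nMeyer Hans\nschwarz gunther\n' |
  awk -v pat="$name" 'tolower($0) ~ pat { print NR ": " $1, $2 }'
# 1: Schuller Gunther
# 3: schwarz gunther
```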

When you start awk, it reads the program, checking for syntax. It then reads the first record of the input file and tests the record against each of the patterns in the program file, in order of their appearance. When awk finds a pattern that matches the record, it performs the associated action and then continues testing the same record against the remaining patterns. When it has compared the first input record against all patterns in the program file and performed all the actions required for that record, awk reads the next input record and repeats the program with that record. Processing continues in this manner until the end of the input file is reached. Figure 2-1 is a flowchart of this sequence. Compare the operation of awk with the very similar operation of the sed editor, shown in Figure 3-1.

Figure 2-1: Sequence of awk Processing

2.2 Printing in awk

You can use either the print command or the printf command to produce output in awk. The print command syntax allows arguments to be separated by commas or spaces. Arguments separated by commas are printed using the current output field separator (OFS; default is a space). Arguments separated by a space are concatenated as they are printed. For example:

awk 'BEGIN{ x=22; print "ABC" x, "DEF" }'
ABC22 DEF


The printf command has the following syntax:

printf( "format", value1 [, value2, ...] )

This command prints the arguments value1, value2, and so on, formatted as defined by the format string. See awk(1) and printf(3) for information on constructing format specifiers.
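The following command is a short illustration of three common format specifiers. The values are arbitrary.

```sh
# %-6s:  string, left-justified in 6 columns
# %5.2f: floating point, 2 decimal places, 5 columns wide
# %d:    decimal integer
awk 'BEGIN { printf("%-6s|%5.2f|%d\n", "ab", 3.14159, 42) }'
# ab    | 3.14|42
```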

2.3 Using Variables in awk

The awk program uses variables to manipulate information. Variables are of the following types:

Simple variables (Section 2.3.1)

Field variables (Section 2.3.2)

Array variables (Section 2.3.3)

Built-In awk variables (Section 2.3.4)

The awk language supports the set of built-in variables described in Section 2.3.4. You also can create and modify variables of all three types. For example, the following assignment statement creates a variable named var whose value is the sum of the third and fourth field variables in the current record:

var = $3 + $4

You can use variables as part of a pattern, and you can manipulate them in actions. For example, the following program assigns a value to a variable named tst and then uses tst as part of a pattern for further actions:

{ tst = $1 }
tst == $3 { print }

Section 2.3.1, Section 2.3.2, and Section 2.3.3 discuss the three types of variables and how to use them. Some of the examples in these sections illustrate the use of other awk features; beginning with Section 2.4, the remaining sections in the chapter provide more detailed information about these features.

2.3.1 Simple Variables

You can create any number of simple (scalar) variables, assigning values to them as required. If you refer to a variable before explicitly assigning a value to it, awk creates the variable and assigns it an empty string value (""), which is treated as 0 in numeric expressions. Variables can have numeric (floating-point) values or string values depending on their use in the action expression. For example, in the expression x = 1, x is a numeric variable. Similarly, in the expression x = "smith", x is a string variable. However, awk converts freely between strings and numbers when needed. Therefore, in the expression x = "3"+"4", awk assigns a value of 7 (numeric) to x, even though the arguments are literal strings. If you use a variable containing a nonnumeric value (a string that does not begin with a number) in a numeric expression, awk uses a numeric value of 0 for it in that expression; the variable itself is not changed. For example:

y = 0
z = "ABC"
x = y+z
print x, z

This sequence prints "0 ABC": z is treated as 0 in the expression y+z, so x is assigned 0, but z still contains the string "ABC".
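These conversions can be checked with a BEGIN-only program, which needs no input:

```sh
awk 'BEGIN {
  x = "3" + "4"   # string operands are converted to numbers: 7
  y = 0
  z = "ABC"
  w = y + z       # z is used as 0 here, but z itself is unchanged
  print x, w, z
}'
# 7 0 ABC
```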

You can force a variable to be treated as a string by concatenating the null string ( "" ) to it; for example, x = 2 "". (See Section 2.12 for information on concatenating strings.) You can force a variable to be treated numerically by adding zero to it. Forcing variables to be treated as particular types can be useful. For example, if x and y are assigned the input fields "0100" and "1" (as in x = $1 and y = $2), awk treats both values as numbers and considers x greater than y. Forcing both variables to be treated as strings causes x to be less than y, because "0" precedes "1" in standard character collating sequences.
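The following sketch compares the same two input fields both ways. The input line is invented. The default numeric treatment applies because the values come from input fields that look like numbers.

```sh
echo "0100 1" | awk '{
  x = $1; y = $2
  num = (x > y)              # numeric comparison: 100 > 1, so 1
  str = ((x "") > (y ""))    # string comparison: "0100" sorts before "1", so 0
  print num, str
}'
# 1 0
```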

2.3.2 Field Variables

Fields in the current record, also called field variables, share the properties of simple variables. They can be used in arithmetic or string operations and can be assigned numeric or string values. You can modify the current record ($0) explicitly in awk. The following action replaces the first field with the record number and then prints the resulting record:

{ $1 = NR; print }

The next example adds the second and third fields and stores the result in the first field:

{ $1 = $2 + $3; print $0 }

(Printing $0 is identical to printing with no arguments.)

You can use numeric expressions for field references; the following example prints the first, second, and sixth fields:

BEGIN { i = 1; n = 5 }
{ print $i, $(i+1), $(i+n) }
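The following commands run a field-reference program and a field-assignment example against sample lines, which are invented for illustration:

```sh
# i and n are set in BEGIN so that awk does not treat them as patterns.
echo "a b c d e f g" | awk 'BEGIN { i = 1; n = 5 } { print $i, $(i+1), $(i+n) }'
# a b f

# Assigning to a field rebuilds $0, joining the fields with OFS (a space).
echo "x 2 3" | awk '{ $1 = $2 + $3; print }'
# 5 2 3
```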

As described in Section 2.3.1, awk converts between string and numeric values. How you use a field determines whether awk treats it as a string or numeric value. If it cannot tell how a given field is used, awk treats it as a string.

The awk program splits input records into fields as needed.

2.3.3 Array Variables

Like field variables, array variables share the properties of simple variables. They can be used in arithmetic or string operations and can be assigned numeric or string values. You do not need to declare or initialize array elements; awk creates them and initializes them to an empty string ("") upon first reference. You can use the delete statement to remove unwanted array elements; see Table 2-7 for additional information.

Subscripts are indicated by being enclosed in brackets. You can use any value that is not null, including a string value, for a subscript. An example of a numeric subscript follows:

x[NR] = $0

This expression creates the NRth element of the array x and assigns the contents of the current input record to it. The following example illustrates using string subscripts:

/apple/  { x["apple"]++ }
/orange/ { x["orange"]++ }
END      { print x["apple"], x["orange"] }

For each input record containing apple, this program increments the element x["apple"] (and similarly x["orange"] for orange), thereby producing and printing a count of the records containing each of these words. (This is not a count of the number of occurrences, because a word can appear more than once in a record.)
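Running the program on four sample lines, which are invented for illustration, shows that a record containing apple twice is still counted only once:

```sh
printf 'apple pie\norange juice\napple and orange\napple apple\n' | awk '
/apple/  { x["apple"]++ }
/orange/ { x["orange"]++ }
END      { print x["apple"], x["orange"] }'
# 3 2
```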

Problems can occur when you use an if or while statement to test the value of an array element. (See Section 2.10 for information on using these and other control structures.) If the array subscript does not exist, merely referring to the element adds the subscript to the array as a new hash table entry whose value is null. For example, the following statement creates the element exists[i] if it does not already exist:

if (exists[i] == 1) print i

To avoid this type of problem, use code similar to the following. Here, i is printed only if the array element exists and its value is 1:

if (i in exists) {
  if (exists[i] == 1) print i
}

Use the same kind of membership test when a while statement compares an array element with a relational operator.

All the elements of an array can be processed in a for loop as follows:

for (i in exists) {
  print exists[i]
}
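You can observe this side effect directly. In the following sketch, the array name exists is borrowed from the examples in this section:

```sh
awk 'BEGIN {
  if (exists["x"] == 1) print "never printed"
  has_x = ("x" in exists)   # 1: the comparison above created exists["x"]
  has_y = ("y" in exists)   # 0: the in operator does not create elements
  print has_x, has_y
}'
# 1 0
```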

You can split any literal string or string variable into an array by using the split function. For example:

n = split("Thu Mar 18 11:19:40 EST 1999", array1)
m = split(array1[4], array2, ":")

The first line in this example splits the literal string into elements of an array named array1, creating array1[1] to array1[n] where n is the number of fields in the string. The second line splits the variable array1[4] using colon ( ":" ) as the separator into array2 (see Section 2.9).
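The following program prints the return values and the elements of array2 to confirm what the two split calls produce:

```sh
awk 'BEGIN {
  n = split("Thu Mar 18 11:19:40 EST 1999", array1)   # split on white space
  m = split(array1[4], array2, ":")                     # array1[4] is "11:19:40"
  print n, m, array2[1], array2[2], array2[3]
}'
# 6 3 11 19 40
```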

2.3.4 Built-In awk Variables

The awk programs recognize the set of built-in variables listed in Table 2-2.

Table 2-2: Built-In Variables in awk

Variable	Description
`$0`	The contents of the current record.
`$n`	The contents of field `n` of the input record. In `awk` you can modify the entire record ( `$0` ).
`ARGC`	The number of elements in the `ARGV` array, counting `ARGV[0]` (the command name) as well as the arguments given to `awk`. This variable is modifiable. Flags preceded by minus signs, the program text or script file name (if any), and `-v` variable assignments are not counted.
`ARGV`	An array from `ARGV[0]` to `ARGV[ARGC-1]` containing the command name followed by the arguments given to `awk`. The elements of this array are modifiable. Does not include flags preceded by minus signs, the program text or script file name (if any), or `-v` variable assignments.
`CONVFMT`	The conversion format for numbers (by default, `%.6g`).
`ENVIRON`	A modifiable array containing the current set of environment variables; accessible by `ENVIRON["name"]`, where `"name"` is a variable or literal containing the name of the environment variable. Changing an element in this array does not affect the environment passed to commands that `awk` spawns by redirection, piping, or the `system()` function.
`FILENAME`	The name of the current input file. If no input file was named, `FILENAME` contains a single minus sign. Inside a `BEGIN` action, `FILENAME` is undefined. Inside an `END` action, `FILENAME` reflects the last file read.
`FNR`	The number of the current record within the current file. Differs from `NR` if multiple files are being processed and the current file is not the first file read.
`FS`	The character or regular expression used as the field separator. By default, any amount of white space. In `awk`, field separators can be regular expressions, which can contain multibyte characters and alternatives. For example, the following statement defines either a comma followed by any amount of white space, or at least one white-space character, as the field separator: `FS = ",[ \t]*|[ \t]+"`
`NF`	The number of fields in the current record.
`NR`	The number of the current record, counted sequentially from the beginning of the first file read. Differs from `FNR` if multiple files are being processed and the current file is not the first file read.
`OFMT`	The format specification for numbers on output (by default, `%.6g`).
`OFS`	The output field separator: the string inserted between fields when the data is written. By default, a space character.
`ORS`	The output record separator: the string written after each record when the data is written. By default, a newline character.
`RLENGTH`	The length of the string matched by `match()`; set to -1 if no match.
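`RS`	The input record separator. By default, a newline character. If `RS` is set to the null string, records are separated by one or more blank lines.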
`RSTART`	The index (position within the string) of the first character matched by `match()`; set to 0 if no match.
`SUBSEP`	The separator for multiple subscripts in array elements (by default \034, the ASCII FS character).

See awk(1) for more information about these variables.
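The following sketch exercises a few of these variables. The input and the arguments one and two are arbitrary. The second program has only a BEGIN action, so awk does not try to open one or two as input files.

```sh
# Records end at ";" and fields at ","; print joins its arguments with OFS.
printf 'a,b;c,d,e' | awk 'BEGIN { RS = ";"; FS = ","; OFS = "-" } { print NR, NF, $NF }'
# 1-2-b
# 2-3-e

# ARGC counts ARGV[0] (the command name) as well as the operands.
awk 'BEGIN { print ARGC, ARGV[1], ARGV[2] }' one two
# 3 one two
```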

2.4 More About Using Regular Expressions as Patterns

The simplest regular expression is a literal string of characters. Regular expressions in awk must be enclosed in slashes. To include a slash as part of an expression, escape the slash with a backslash. For example, /\/usr\/share/ is an expression that matches the string /usr/share.

Following is an example of an awk program that prints all records containing the string the.

/the/

Because this expression does not specify blanks or other qualifiers, the program displays records containing "the" as a separate word and records containing the string as part of words such as "northern". Regular expressions are case sensitive. To find either "The" or "the", use a bracketed expression as follows:

/[Tt]he/

The awk language supports the full set of extended regular expressions described in Chapter 1. Additionally, in awk the circumflex ( ^ ) and dollar sign ( $ ) can anchor a match to the beginning and end of a specific field or variable, as well as of the entire record. The following example matches a field consisting of the word cat or the word cats, but does not match a field that merely contains one of these strings (such as concatenate). The break statement ensures that a record with several matching fields is printed only once:

{ for (i = 1; i <= NF; i++) if ($i ~ /^cats?$/) { print; break } }
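The following command applies a version of the loop that stops at the first matching field to three sample lines, which are invented for illustration. Each qualifying record is printed once, and the line containing only concatenate is skipped:

```sh
printf 'the cat sat\nconcatenate this\ncats and cat\n' |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^cats?$/) { print; break } }'
# the cat sat
# cats and cat
```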

2.5 Using Relational Expressions and Combined Expressions as Patterns

Relational expressions let you restrict a match to a specific field of a record or make other tests, either numeric or string-related. One example earlier in this chapter (in Section 2.3) illustrates the use of relational expressions in patterns. The awk program defines the following relational operators for use in building patterns:

`==`	Equivalent
`!=`	Not equivalent
`<`	Less than
`>`	Greater than
`<=`	Less than or equal
`>=`	Greater than or equal
`~`	Matches regular expression
`!~`	Does not match regular expression

Use the == (equivalent) and != (not equivalent) operators to test literal strings and numeric values. For example:

str == "literal string"
num != 23
$NF == 1991

The last line in this example uses the $n syntax combined with the built-in variable NF to test the value of the last field of a record. To test against regular expressions, use the ~ (matches regular expression) and !~ (does not match regular expression) operators as follows:

str ~ /[Ll]iteral/

Relational expressions can also compare computed expressions. For example, the following pattern finds all records whose second field ( $2 ) is more than 100 greater than the first field ( $1 ):

$2 > $1 + 100

The following pattern finds records that contain an even number of fields:

NF % 2 == 0
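The following command tries both patterns on two invented records. In the first record, the second field is more than 100 greater than the first. The second record has four fields:

```sh
printf '10 200 x\n10 50 y z\n' | awk '
$2 > $1 + 100 { print "more than 100 greater:", NR }
NF % 2 == 0   { print "even field count:", NR }'
# more than 100 greater: 1
# even field count: 2
```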

Use the operators listed in Section 2.8 to build expressions.

You can use magnitude-comparison operators to test strings. For example, the following pattern finds records that begin with s or any character that appears after it to the end of the character set:

$0 >= "s"

You can combine two or more patterns by using the following Boolean operators:

`&&`	AND
`||`	Logical OR
`!`	NOT

For example, to limit the preceding example to records that begin with a lowercase letter from s through z, and so exclude records that begin with characters that sort after z, combine two expressions as follows:

($0 >= "s" && $0 < "{")

(The left brace is the character immediately following the letter z in the ASCII code.)
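The following command is a quick check of the combined pattern. It sets LC_ALL=C so that the comparison uses plain ASCII order, on the assumption that some awk implementations compare strings according to the current locale:

```sh
printf 'apple\nsun\nzebra\n{brace\n' | LC_ALL=C awk '$0 >= "s" && $0 < "{"'
# sun
# zebra
```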

2.6 Using Pattern Ranges

You can use a pattern range to select a group of records to operate on. A pattern range consists of two patterns separated by a comma; the first pattern specifies the start of the range, and the second pattern specifies the end of the range. The awk program performs the associated action on all records in the range, including the records that match the two patterns. For example:

NR==100,NR==200 { print }

This program prints 101 records from the input file, beginning with record 100 and ending with record 200.
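You can confirm the record count by counting records instead of printing them. In the following sketch, a first awk command generates the numbers 1 through 300 as sample input:

```sh
awk 'BEGIN { for (i = 1; i <= 300; i++) print i }' |
  awk 'NR == 100, NR == 200 { n++ } END { print n }'
# 101
```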

Using a pattern range does not disable other patterns from matching records within the range. However, because the input file is processed record by record, with each record being subject to all the actions appropriate to it before the next record is considered, the actions taken can appear to be out of sequence. For example:

NR==2,NR==4 { print }
/share/ { print "Found share" }

Apply this program to the following input file:

This is a test file
Line two
Try to share things
Line four
Last line of file

When this file is processed by awk, the output is as follows:

Line two
Try to share things
Found share
Line four

The second action is applied to record 3 before record 4 is examined to see if it matches the first pattern.

2.7 Actions in awk

An action can be a single command, such as print, or it can be a series of commands. An action can include tests to select records or parts of records; you also can create a program that has no explicit patterns, relying instead on relational comparisons within its actions. Such a program can bear a strong resemblance to a C program; for example:

{
  if ($1 == 0) {
    print;
    printf("%5.2f\n", $2+$3)
  } else {
    printf("%5.2f\n", $1+$2)
  }
}

Note

The semicolon after the print command, which would be required in a C program, is not required by awk, but it does not cause an error.

2.8 Using Operators in an Action

Use the operators shown in Table 2-3 to build expressions within the action statement.

Table 2-3: Operators for awk Actions

Precedence	Operator	Description	Example
1	`()`	Parentheses	`3+x*4` = `3+(x*4)`
2	`$`	Field reference	$(NF-1) = next to last field
3	`++`	Increment	See the description below
3	`--`	Decrement	See the description below
4	`^`	Exponentiation	2^3 = 8
5	`!`	Logical negation	!x is 1 if x is 0 or null; otherwise 0
6	`+`	Unary plus	+4 = 4
6	`-`	Unary minus	-4 is negative 4
7	`*`	Multiplication	2*4 = 8
7	`/`	Division	6/3 = 2
7	`%`	Modulo (Remaindering)	7%3 = 1
8	`+`	Addition	2+3 = 5
8	`-`	Subtraction	7-3 = 4
9	`space`	Concatenation	"a" "b" = "ab"
10	`<`	Less than	5 < 6
10	`>`	Greater than	"qrs" > "abc"
10	`<=`	Less than or Equal to	3 <= 3
10	`>=`	Greater than or Equal to	4 >= 2
10	`==`	Equal	9 == 9
10	`!=`	Not Equal	"xyz" != "abc"
11	`~`	Match regular expr	"tmp.c" ~ /[a-z]+\.[ch]/
11	`!~`	Not Match regular expr	"tmp.o" !~ /[a-z]+\.[ch]/
12	`in`	Array Membership	for (j in arr) print arr[j]
13	`&&`	Logical AND	x > 0 && y > 0
14	`||`	Logical OR	x < 0 || y < 0
15	`?:`	Conditional Expression	x == -1 ? "error" : "OK"
16	`=`	Assignment	x = 3
16	`^=`	Exponentiation by value	x^=3 is equivalent to x = x^3
16	`*=`	Multiply by value	`x*=y` is equivalent to `x = x*y`
16	`/=`	Divide by value	x/=y is equivalent to x = x/y
16	`%=`	Modulo by value	x%=y is equivalent to x = x%y
16	`+=`	Increment by value	x+=y is equivalent to x = x+y
16	`-=`	Decrement by value	x-=y is equivalent to x = x-y

The following example prints the sum of all the first fields and the sum of all the second fields in the input file:

{ s1 += $1; s2 += $2 }
END { print s1,s2 }
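With three invented input lines, the program prints the two column totals:

```sh
printf '1 10\n2 20\n3 30\n' | awk '
{ s1 += $1; s2 += $2 }
END { print s1, s2 }'
# 6 60
```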

The position of the increment and decrement operators affects their interpretation. The expression i++ evaluates the current contents of i and then increments i. The expression ++i causes awk to increment i before evaluation. For example:

$ echo "3 3" | awk '{
>   print "$1 =", $1 "; $1++ =", $1++ "; new $1 =", $1
>   print "$2 =", $2 "; ++$2 =", ++$2 "; new $2 =", $2
> }'
$1 = 3; $1++ = 3; new $1 = 4
$2 = 3; ++$2 = 4; new $2 = 4
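The example above relies on the arguments of each print statement being evaluated from left to right, which may not hold in every awk implementation. A more portable sketch performs each increment in its own statement before printing:

```sh
awk 'BEGIN {
  i = 3; post = i++   # post gets the old value (3); then i becomes 4
  j = 3; pre = ++j    # j becomes 4 first; pre gets the new value (4)
  print post, i, pre, j
}'
# 3 4 4 4
```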

2.9 Using Functions Within an Action

The awk language includes the built-in mathematical functions listed in Table 2-4.

Table 2-4: Built-In awk Mathematical Functions

Function	Description
`atan2(x,y)`	Returns the arctangent of the value specified by `x/y`.
`cos(expr)`	Returns the cosine of the value (in radians) specified by `expr`.
`exp(arg)`	Returns the natural antilogarithm (base e) of `arg`. For example, `exp(0.693147)` returns 2. See `log(arg)`.
`int(arg)`	Returns the integer part of `arg`.
`log(arg)`	Returns the natural logarithm (base e) of `arg`. For example, `log(2)` returns 0.693147. See `exp(arg)`.
`rand()`	Returns a pseudorandom number `n` such that 0 <= `n` < 1.
`sin(arg)`	Returns the sine of the value (in radians) specified by `arg`.
`sqrt(arg)`	Returns the square root of `arg`.
`srand(seed)`	Uses `seed` as the seed for a pseudorandom number sequence for subsequent calls to `rand()`. If no seed is specified, the time of day is used. The return value is the previous seed.
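The following program checks several of these functions against known values. For example, atan2(0, -1) evaluates to pi:

```sh
awk 'BEGIN {
  printf("%.6f\n", atan2(0, -1))                 # 3.141593
  print int(3.7), int(-3.7)                       # truncates toward zero: 3 -3
  printf("%.4f %.4f\n", exp(log(2)), sqrt(16))    # 2.0000 4.0000
}'
```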

The awk language includes the built-in string functions listed in Table 2-5.

Table 2-5: Built-In awk String Functions

Function	Description
`gsub(expr,s1,s2)`	Replaces every sequence of characters in string `s2` that matches the regular expression `expr` with the string specified by `s1`. If `s2` is not supplied, the current input record is used. Regular expression `expr` is reevaluated for each match. This function returns a value representing the number of replacements. See also `sub(expr,s1,s2)`.
`index(s1,s2)`	Returns the character position in string `s1` where string `s2` occurs. If `s2` is not in `s1`, this function returns a zero.
`length`	Returns the length in characters of the current record.
`length(arg)`	Returns the length in characters of the string specified by `arg`. See `length`.
`match(s,expr)`	Returns the character position in string `s` where a match is found for the regular expression `expr`; sets the variable `RSTART` to the character position at which the match begins and `RLENGTH` to a value representing the length of the matched string. If no match is found, this function returns a zero.
`split(s,array,sep)`	Splits string `s` into consecutive elements of `array[1]...[n]` and returns the number of elements. The optional `sep` argument specifies a field separator other than the one currently in force (the default is the contents of the `FS` variable).
`sprintf(f,e1,e2,...)`	Returns (but does not print) a string containing the arguments `e1` and so on, formatted in the same manner as by the `printf` command.
`sub(expr,s1,s2)`	Replaces the first sequence of characters in string `s2` that matches the regular expression `expr` with the string specified by `s1`. If `s2` is not supplied, the current input record is used. This function returns a value representing the number of replacements (0 or 1). See also `gsub(expr,s1,s2)`.
`substr(s,m,n)`	Returns the substring of `s` that begins at character position `m` and is `n` characters long. The first character in `s` is at position 1. If `n` is omitted or if the string is not long enough to supply `n` characters, the rest of the string is returned.
`tolower(s)`	Translates all uppercase letters in string `s` to lowercase. If there is no argument, the function operates on the current record.
`toupper(s)`	Translates all lowercase letters in string `s` to uppercase. If there is no argument, the function operates on the current record.
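The following sketch calls several of the string functions on arbitrary literal strings. The comments show the expected output:

```sh
awk 'BEGIN {
  s = "foo bar foo"
  n = gsub(/foo/, "baz", s)   # replaces every match in s; returns the count
  print n, s                  # 2 baz bar baz
  print index("awk", "k"), substr("portable", 3, 4), toupper("ok")   # 3 rtab OK
  if (match("xxabcx", /abc/))
    print RSTART, RLENGTH     # 3 3
  print sprintf("%03d", 7)    # 007
}'
```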

The awk language includes the built-in miscellaneous functions listed in Table 2-6.

Table 2-6: Built-In awk Miscellaneous Functions

Function	Description
`close(arg)`	Closes the file or pipe named by `arg`.
`system("command")`	Executes the system command specified and returns its exit status. The entire command must be enclosed in quotation marks to prevent `awk` from attempting to interpret it as one or more variable names.
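As a sketch of both functions, the following program writes a scratch file and closes it so that the data is flushed before the file is read back. It then tests the exit status returned by system(). The file name /tmp/awk_close_demo.txt is a hypothetical choice.

```sh
awk 'BEGIN {
  f = "/tmp/awk_close_demo.txt"   # hypothetical scratch file
  print "hello" > f
  close(f)                         # flush and close before reading the file back
  getline line < f
  close(f)
  print line                       # hello
  if (system("exit 3") != 0)       # the shell command exits with status 3
    print "command failed"
}'
rm -f /tmp/awk_close_demo.txt
```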

The awk language also lets you create functions by using the following syntax:


function name(parameter-list) {
   statements
}

The word func can be used in place of function. For functions that you create, the left parenthesis both in the function's definition and in its use must immediately follow the function name with no intervening space. The names in the function declaration's parameter list are the formal parameters for use within the function. When you call a function, awk replaces these formal parameters with the values you supply in the calling statement. Functions can be recursive.

You can define local variables for a given function by declaring them as extra formal parameters; upon function entry, all local variables are initialized as empty strings or the number 0. To avoid visual confusion between real parameters and local variables, you can separate the local variables with extra spaces in the function declaration. For example:

function foo(input, output,      local1, local2) {
  local1 = "foo"
  local2 = "bar"
  # ...
}
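The following is a small sketch of user-defined functions. The names fact and join are hypothetical. The join function declares its local variables after extra spaces, and the array parts is passed to it as an argument.

```sh
awk '
# Recursive function; note that there is no space before "(" in the call.
function fact(n) {
  return (n <= 1) ? 1 : n * fact(n - 1)
}

# i and s are local variables, declared as extra parameters.
function join(a, n, sep,    i, s) {
  s = a[1]
  for (i = 2; i <= n; i++)
    s = s sep a[i]
  return s
}

BEGIN {
  print fact(5)                 # 120
  k = split("x y z", parts)
  print join(parts, k, "-")     # x-y-z
}'
```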

2.10 Using Control Structures in awk

The awk language provides the control structures listed in Table 2-7. Except where noted, these structures work exactly as they do in the C language. To perform several statements in a single control structure's action, enclose the statements in braces. If only a single statement is to be performed, the braces are optional. Each of the first two if structures in the following example includes a single statement to be executed; these structures are equivalent:

{
  if (x == y) print
  if (x == y) {
    print
  }
  if (x == y) {
    print $3
    printf("Sum = %d\n", x+z)
  }
}

Table 2-7: Control Structures in awk

Structure	Description
`if-else`	The condition in parentheses in an `if-else` structure is evaluated. If true, the statements following the `if` are performed. If false, the statements following the optional `else` keyword are performed. Cascading `if` statements can be written with `else if`; the keywords must appear in that order. For example: if ($1 == "abc") { print "found abc" } else if ($1 == "qrs") { print "found qrs" } else if ($1 == "xyz") { print "found xyz" } else { print "did not find \"abc\", \"qrs\", or \"xyz\"" }
`delete`	Array elements can be deleted using the `delete` statement. For example, { for (j in x) delete x[j] } removes all the elements of the array x.
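`while`	The statements following the `while` statement are performed as long as the tested condition is true. The following example prints all the fields in the input records, one field per line: { i = 1; while (i <= NF) print $(i++) }. The parentheses in `$(i++)` are required: because `$` binds more tightly than `++`, `$i++` would increment the field value instead of `i`.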
`for`	The `for(expr1;expr2;expr3)` `statements` structure is equivalent to the following `while` construct: `expr1; while (expr2) { statements; expr3 }`. The previous `while` example could also be written as follows: { for (i = 1; i <= NF; ++i) print $i }. The `for (i in array)` statement processes all the elements in an array, in no particular order: $2 == "=" { name_value_pairs[$1] = $3 } END { for (i in name_value_pairs) print name_value_pairs[i] }
`break`	The `break` statement causes an immediate exit from an enclosing `while` or `for` loop.
Comments	Include comments in an `awk` program file to explain program logic. Comments begin with the number sign ( `#` ) and continue to the end of the line, so any program text after the `#` on that line is ignored. For example: print x, y   # This is a comment
`continue`	The `continue` statement causes the next iteration of an enclosing loop to begin.
`getline`	The `getline` statement causes `awk` to discard the current input record, read the next input record, and continue scanning patterns from the present location. By using `getline var`, you can assign the `getline` input to a variable; without `var`, the input is assigned to the current record.
`next`	The `next` statement causes `awk` to discard the current input record, read the next input record, and begin scanning patterns from the start of the program file.
`exit`	The `exit` statement causes the program to stop as if the end of the input occurred.
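The following sketch combines several of these structures, using invented input lines. The next statement skips lines beginning with #. An if-else chain classifies the remaining lines, and the END loop uses continue and break.

```sh
printf 'abc 1\n# comment\nqrs 2\nzzz 3\n' | awk '
/^#/ { next }               # skip comment lines; later rules never see them
{
  if ($1 == "abc")
    print "found abc"
  else if ($1 == "qrs")
    print "found qrs"
  else
    print "did not find abc or qrs"
}
END {
  for (i = 1; i <= 10; i++) {
    if (i % 2) continue     # skip odd values of i
    if (i > 6) break        # leave the loop once i passes 6
    sep = (s == "") ? "" : " "
    s = s sep i
  }
  print s
}'
# found abc
# found qrs
# did not find abc or qrs
# 2 4 6
```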

2.11 Performing Actions Before or After Processing the Input

The awk program recognizes two special pattern keywords that define the beginning (BEGIN) and the end (END) of the input file. BEGIN matches the beginning of the input before reading the first record. Therefore, awk performs any actions associated with this pattern once, before processing the input file. For example, to change the field separator to a colon ( : ) for all records in the file, include the following line as the first line of the program file:

BEGIN { FS = ":" }

This example action works the same as using the -F: flag on the command line.

Similarly, END matches the end of the input file after processing the last record. Therefore, awk performs any actions associated with this pattern once, after processing the input file. For example, to print the total number of records in the input file, include the following line in the program file:

END { print NR }
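You can combine the two special patterns in one program. In the following sketch, the colon-separated input lines are invented:

```sh
printf 'root:x:0\nuser:x:1000\n' | awk '
BEGIN { FS = ":" }          # same effect as -F:
{ print $1 }
END   { print NR, "records" }'
# root
# user
# 2 records
```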

2.12 Concatenating Strings

You concatenate strings by placing the expressions to be joined next to each other. For example, the command print $1 $2 prints a string consisting of the first two fields from the current record, with no space between them. You can use variables, numeric operators, and functions when concatenating strings. (See Section 2.3.1 and Section 2.8 for information on variables and numeric operators.) The function length($1 $2 $3) returns the length in characters of the first three fields. (See Section 2.9 for a list of the functions in awk.) If the strings you want to concatenate are field variables (see Section 2.3.2), you are not required to separate the names with white space; the expression $1$2 is identical to $1 $2.
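For example, with an invented three-field record:

```sh
# $1$2 joins two fields; length() measures the concatenation of all three.
echo "ab cd ef" | awk '{ print $1$2, length($1 $2 $3), $1 "-" NR }'
# abcd 6 ab-1
```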

2.13 Redirection and Pipes

Unless otherwise specified, print and printf statements write their output to the standard output file. You can redirect the output of any printing statement by using standard redirection operators. For example:

print $0, $3, amt >> "reportfile"

This example appends its output to a file named reportfile instead of writing to the standard output. (If reportfile does not exist before the first instance of redirection, it is created.) The output file name in this example is enclosed in quotation marks. The quotation marks are required to distinguish the file name from a variable name. You can mix writing to named files with writing to the standard output.

By default, print and printf send their output to standard output. To write to standard error instead, pipe the output to a command that redirects it, as in the following example:

print "oops: did not find expected input" | "cat 1>&2"

You also can pipe printed output through other commands. The following example pipes awk's output through the tr command to convert all uppercase letters to lowercase letters:

print | "tr '[A-Z]' '[a-z]'"

As with redirection, the command to which you pipe the output must be enclosed in quotation marks. In awk you can redirect the input to getline using standard redirection operators, and you can supply the input to getline from a pipe. For example:

expr | getline

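Here, the string value of expr is run as a command, and each execution of getline reads the next line of that command's output into $0; expr | getline var reads the line into the variable var instead. Similarly, getline < file and getline var < file read the next line from the named file.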
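The following sketch reads a temporary file with getline var < file and then reads one line from a command with expr | getline. The file contents are invented, and the file name comes from mktemp.

```sh
tmp=$(mktemp)
printf 'first\nsecond\n' > "$tmp"
awk -v f="$tmp" 'BEGIN {
  while ((getline line < f) > 0)   # 1 for each line read, 0 at end of file
    n++
  close(f)
  cmd = "echo piped input"
  cmd | getline                    # the first output line of cmd becomes $0
  close(cmd)
  print n, line                    # 2 second (line keeps the last line read)
  print $0                         # piped input
}'
rm -f "$tmp"
```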

The following example reads the output of the ps aux command and prints the number of processes that each user owns:

BEGIN {
  cmd = "ps aux"
  while ((cmd | getline) > 0) {
    if ($2 == "PID") continue    # skip the header line
    unique_users[$1]++
  }
  close(cmd)

  for (i in unique_users) {
    printf("%3d %s\n", unique_users[i], i)
  }
}

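Only a limited number of files can be open at one time; awk uses your default open file descriptor limit. To stay within that limit, use the close(arg) statement to close files that you have opened for output and no longer need. For example, the following program writes the second field of each record to a file in /tmp named after the first field, and closes the previous file whenever the first field changes: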

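{
  if (cur_file != "/tmp/" $1) {
    close(cur_file)
    cur_file = "/tmp/" $1
  }
  # ">" truncates a file each time it is opened, so this program assumes
  # that all records with the same first field are adjacent in the input.
  print $2 > cur_file
}
END { close(cur_file) }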
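The following sketch runs the same logic in a temporary directory instead of /tmp, so that it does not touch existing files. The directory comes from mktemp -d and is passed in with -v. The input is invented and is already grouped by its first field.

```sh
dir=$(mktemp -d)
printf 'a 1\na 2\nb 3\n' | awk -v dir="$dir" '
{
  if (cur_file != (dir "/" $1)) {    # first field changed: switch files
    if (cur_file != "") close(cur_file)
    cur_file = dir "/" $1
  }
  print $2 > cur_file
}
END { close(cur_file) }'
cat "$dir/a" "$dir/b"               # prints 1, 2, and 3 on separate lines
rm -r "$dir"
```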