This chapter describes the
awk
command, a tool with the ability to match lines of text in a file
and a set of commands that you can use to manipulate the matched lines.
In
addition to matching text with the full set of extended regular expressions
described in
Chapter 1,
awk
treats each line,
or record,
as a set of
elements, or fields,
that
can be manipulated individually or in combination.
Thus,
awk
can perform more complex operations, such as:
Writing selected fields of a record
Reordering or replacing the contents of a record; for example, to change syntax in a program source file or change system calls when porting from one system to another
Processing input to find numeric counts, sums, or subtotals
Verifying that a given field contains only numeric information
Checking to see that delimiters are balanced in a programming file
Processing data contained in fields within records
Changing data from one program into a form that can be used by a different program
This chapter contains the following sections:
Running the
awk
program (Section 2.1)
Printing in
awk
(Section 2.2)
Using variables in
awk
(Section 2.3)
More about using regular expressions as patterns (Section 2.4)
Using relational expressions and combined expressions as patterns (Section 2.5)
Using pattern ranges (Section 2.6)
Actions in
awk
(Section 2.7)
Using operators in an action (Section 2.8)
Using Functions within an Action (Section 2.9)
Using Control Structures in
awk
(Section 2.10)
Performing actions before or after processing the input (Section 2.11)
Concatenating strings (Section 2.12)
Redirections and pipes (Section 2.13)
2.1 Running the awk Program
The awk command has the following syntax:
awk [-FERE] [-v var=val] {-f prog_file | prog_text} [file1 [file2 ...]]
Table 2-1
describes the flags for the
awk
command.
Table 2-1: Flags for the awk Command
| Flag | Description |
-FERE |
Specifies an extended regular expression
to be used as a field separator.
By default, fields are separated by white space.
For example:
% echo $PATH | awk -F':' '{for(n=1;n<=NF;n++)print $n}'
|
-v var=val |
Assigns the value
val
to a variable named
var; such assignments are available
to the
BEGIN
block of a program.
The
awk
command accepts multiple
-v
flags. |
-f prog_file |
Specifies the name of a file containing an
awk
program.
This flag requires a file name as an argument.
The
awk
command accepts multiple
-f
flags,
concatenating all the program files and treating them as a single program. |
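The -F and -v flags can be combined on one command line. The following is a minimal sketch, assuming a POSIX shell; the colon-separated sample data and the variable name min are invented for illustration:

```sh
# Hypothetical sample data: name:uid pairs, one per record.
out=$(printf 'root:0\nguest:500\n' |
    awk -F: -v min=100 '$2 >= min { print $1 }')
printf '%s\n' "$out"    # prints: guest
```

The value passed with -v is compared numerically with the second field, so only the record whose second field is at least 100 is selected.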
You can specify the
awk
program to be executed either
with the
-f
prog_file
flag or as a program on the command line.
Enclose a command-line program
in apostrophes (') so that the shell does not interpret it.
Use the -v var=val
option to pass any shell variables
into the awk program.
Usually, you create an
awk
program file before running
awk.
The program file is a
series of statements that look like the following:
pattern{action}
In this structure, a pattern is one or more expressions that define the text to be matched. Patterns can consist of the following:
BEGIN
or
END
Boolean combinations of regular expressions using the operators
!
(NOT),
||
(Logical OR),
and
&&
(AND), with
parentheses for grouping expressions
Boolean combinations of relational operations on strings, numbers, fields, and variables
Ranges of records, specified in this way:
pattern1,pattern2
An
action
is one or more steps to be executed, designated with
awk
commands, operands, and operators.
Actions can consist of the
following:
Assignment statements
Statements to format and print data
Tests to alter the flow of control
Control structures, such as
if-else,
while, and
for
statements
Redirection of output to one or more output streams besides standard output
Piping of output and input
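The braces ({}) enclose the action.
Within an action, separate statements with semicolons (;) or newlines.
The following two programs are equivalent: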
Program 1:
/[Gg]unther/ { print "Record:", NR ; print $1, $2 }
Program 2:
/[Gg]unther/ {
print "Record:", NR
print $1, $2
}
Output from these programs might look like the following:
Record: 382
Schuller Gunther
Record: 397
schwarz gunther
Both the pattern
and the action are optional elements of a program line.
If you omit the pattern,
awk
performs the action on every record in the file; if you omit
the action,
awk
copies the record to standard output.
A null program passes its input unmodified to the output.
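To illustrate the optional pattern and action, the following sketch runs one program that has only a pattern and one that has only an action. The two-record sample input is invented for illustration:

```sh
# Pattern only: matching records are copied to standard output.
only_pattern=$(printf 'alpha 1\nbeta 2\n' | awk '/beta/')

# Action only: the action is performed on every record.
only_action=$(printf 'alpha 1\nbeta 2\n' | awk '{ print $1 }')

printf '%s\n' "$only_pattern" "$only_action"
```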
After you create the program file, enter the
awk
command on the command line as follows:
$ awk -f progfile infile > outfile
This command uses the program in
progfile
to process
infile, and writes the output to
outfile.
The
input file is not changed.
With a short program, you can accomplish the same job by entering the program on the command line before the name of the input file. For example:
$ awk '/[Gg]unther/ { print $1, $2 }' infile
When you use awk in this way, enclose the program in apostrophes (') so that the shell does not interpret it.
Use the -v var=val
option
to pass in any shell variables.
When you start
awk,
it
reads the program, checking for syntax.
It then reads the first record of
the input file, testing the record against each of the patterns in the program
file in order of their appearance.
When
awk
finds a pattern
that matches the record, it performs the associated action.
Then
awk
continues to search for matches in the program file.
When it
has compared the first input record against all patterns in the program file
and performed all the actions required for that record,
awk
reads the next input record and repeats the program with that record.
Processing
continues in this manner until the end of the input file is reached.
Figure 2-1
is a flowchart of this sequence.
Compare the operation
of
awk
with the very similar operation of the
sed
editor, shown in
Figure 3-1.
Figure 2-1: Sequence of awk Processing
2.2 Printing in awk
You can use either the
print
command or the
printf
command to produce output in
awk.
The
print
command syntax allows arguments to be separated by commas
or spaces.
Arguments separated by commas are printed using the current output
field separator (OFS; default is a space).
Arguments separated by a space
are concatenated as they are printed.
For example:
awk 'BEGIN{ x=22; print "ABC" x, "DEF" }'
ABC22 DEF
The printf command has the following syntax:
printf(format, value1[, value2, ...])
This command prints the arguments
value1,
value2, and so on, formatted as defined by the
format
string.
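As a sketch of the difference between the two commands, the following program sets OFS to a hyphen so that the separator inserted by the comma is visible, and then formats a string and a number with printf. The values are arbitrary:

```sh
out=$(awk 'BEGIN {
    OFS = "-"
    x = 22
    print "ABC" x, "DEF"                    # space concatenates, comma inserts OFS
    printf("%-5s|%5.2f\n", "pi", 3.14159)   # left-justified string, then a fixed-point number
}')
printf '%s\n' "$out"
```

Unlike print, printf adds no newline of its own, which is why the format string ends with \n.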
See
awk(1) and printf(3).
2.3 Using Variables in awk
The
awk
program uses variables to manipulate information.
Variables are of the following
types:
Simple variables (Section 2.3.1)
Field variables (Section 2.3.2)
Array variables (Section 2.3.3)
Built-In
awk
variables (Section 2.3.4)
The
awk
language supports the set of built-in variables
described in
Section 2.3.4.
You also can create and
modify variables of all three types.
For example, the following assignment
statement creates a variable named
var
whose value is the
sum of the third and fourth field variables in the current record:
var = $3 + $4
You can use variables as part of a pattern, and you can manipulate them
in actions.
For example, the following program assigns a value to a variable
named
tst
and then uses
tst
as part
of a pattern for further actions:
{ tst = $1 }
tst == $3 { print }
Section 2.3.1,
Section 2.3.2,
and
Section 2.3.3
discuss the three types of variables
and how to use them.
Some of the examples in these sections illustrate the
use of other
awk
features; beginning with
Section 2.4,
the remaining sections in the chapter provide more detailed information about
these features.
2.3.1 Simple Variables
You can create any number of simple (scalar) variables,
assigning values to them as required.
If you refer to a variable before explicitly
assigning a value to it,
awk
creates the variable and assigns it an
empty string value ("").
Variables can have numeric (floating-point)
values or string values depending on their use in the action expression.
For example, in the expression
x = 1,
x
is a numeric variable.
Similarly, in the expression
x = "smith",
x
is a string
variable.
However,
awk
converts freely between strings
and numbers when needed.
Therefore, in the expression
x = "3"+"4",
awk
assigns a value of 7 (numeric) to
x, even though the arguments are literal strings.
If you use
a variable containing a nonnumeric value in a numeric expression,
awk
assigns it a numeric value of 0.
For example:
y = 0
z = "ABC"
x = y+z
print x, z
This sequence prints "0 ABC": y is assigned a value of 0 and z assumes a value of 0 when used numerically, so x is 0, but z itself keeps its string value.
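The conversions described above can be tried directly in a BEGIN action. This is a small sketch with arbitrary values:

```sh
out=$(awk 'BEGIN {
    x = "3" + "4"      # both strings are converted to numbers
    z = "ABC"          # a nonnumeric string
    print x, z + 0, z  # z counts as 0 in arithmetic but keeps its string value
}')
printf '%s\n' "$out"   # prints: 7 0 ABC
```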
You can force a variable to be treated as a string by concatenating
the null string ("") to it; for example, x = 2 "".
(See
Section 2.12
for information on concatenating strings.) You can force a variable to be
treated numerically by adding zero to it.
Forcing variables to be treated
as particular types can be useful.
For example, if
x
is "0100" and
y
is "1",
awk
usually treats
both variables as numerics and considers that
x
is greater than
y.
Forcing both variables to be
treated as strings causes
x
to be less than
y
because "0" precedes "1" in standard
character collating sequences.
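The following sketch shows both comparisons for the values discussed above. It assumes that the values arrive as input fields, which awk treats as numbers when they look numeric; the variable names num and str are chosen only for illustration:

```sh
out=$(echo '0100 1' | awk '{
    num = ($1 > $2)                # numeric comparison: 100 > 1
    str = (($1 "") > ($2 ""))      # string comparison: "0100" sorts before "1"
    print num, str
}')
printf '%s\n' "$out"               # prints: 1 0
```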
2.3.2 Field Variables
Fields
in the current record, also called field variables, share the properties of
simple variables.
They can be used in arithmetic or string operations and
can be assigned numeric or string values.
You can modify the current record
($0) explicitly in
awk.
The following
action replaces the first field with the record number and then prints the
resulting record:
{ $1 = NR; print }
The next example adds the second and third fields and stores the result in the first field:
{ $1 = $2 + $3; print $0 }
(Printing
$0
is identical to printing
with no arguments.)
You can use numeric expressions for field references; the following example prints the first, second, and sixth fields:
BEGIN { i = 1; n = 5 }
{ print $i, $(i+1), $(i+n) }
As described in
Section 2.3.1,
awk
converts between string and numeric values.
How you use a field
determines whether
awk
treats it as a string or numeric
value.
If it cannot tell how a given field is used,
awk
treats it as a string.
The
awk
program splits input records into fields
as needed.
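A short sketch of field assignment and computed field references follows; the three-field input record is invented. Assigning to a field causes awk to rebuild the record using the output field separator:

```sh
out=$(echo 'a b c' | awk '{
    $2 = "X"               # replace the second field
    print                  # the record now reads: a X c
    print $NF, $(NF-1)     # last field, then next-to-last field
}')
printf '%s\n' "$out"
```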
2.3.3 Array Variables
Like field variables, array variables share the properties of
simple variables.
They can be used in arithmetic or string operations and
can be assigned numeric or string values.
You do not need to declare or initialize
array elements;
awk
creates them and initializes them to
an empty string ("") upon first reference.
The
delete
statement can be used to remove unwanted array elements; see
Table 2-7
for additional information.
Subscripts are indicated by being enclosed in brackets. You can use any value that is not null, including a string value, for a subscript. An example of a numeric subscript follows:
x[NR] = $0
This expression creates the NRth element of the array x and assigns the contents of the current input record to it. The following example illustrates using string subscripts:
/apple/ { x["apple"]++ }
/orange/ { x["orange"]++ }
END { print x["apple"], x["orange"] }
For each input
record containing
apple, this program increments the
appleth element of array
x
(and similarly
for
orange), thereby producing and printing a count of
the records containing each of these words.
(This is not a count of the number
of occurrences, because a word can appear more than once in a record.)
Problems can occur when you use an
if
or
while
statement to locate an array element.
(See
Section 2.10
for information on using these and other control structures.)
If the array subscript does not exist, the statement adds the subscript as
a new hash table entry with the array element having a null value.
For example:
if (exists[i] == 1) print i
To avoid this type of problem, use code similar to the following, in
which
i
is printed only if the array element exists
and its value is
1:
if (i in exists) {
if (exists[i]== 1) print i
}
All the elements of an array can be processed in a for loop as follows:
for(i in exists) {
print exists[i]
}
Also use this type of coding when
while
is used with
a relational operator.
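The difference between the two kinds of test can be observed by counting the elements of an array after each one. In this sketch, the array name seen and its subscript are arbitrary:

```sh
out=$(awk 'BEGIN {
    if ("a" in seen) print "not reached"      # membership test creates nothing
    n = 0; for (k in seen) n++
    print "after in test:", n

    if (seen["a"] == 1) print "not reached"   # a direct reference creates seen["a"]
    n = 0; for (k in seen) n++
    print "after direct reference:", n
}')
printf '%s\n' "$out"
```

The first count is 0 and the second is 1, because only the direct reference adds an element to the array.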
You can split any literal string or string variable into an array
by using the
split
function.
For
example:
n = split("Thu Mar 18 11:19:40 EST 1999", array1)
m = split(array1[4], array2, ":")
The first line in this example splits the literal string into elements
of an array named
array1, creating
array1[1]
to
array1[n]
where n is the number of fields in the string.
The second line splits the variable
array1[4]
using colon
(
":"
) as the separator into
array2
(see
Section 2.9).
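The two split calls above can be run in a BEGIN action to inspect the values they return:

```sh
out=$(awk 'BEGIN {
    n = split("Thu Mar 18 11:19:40 EST 1999", array1)   # split on white space
    m = split(array1[4], array2, ":")                   # split the time on colons
    print n, m, array2[1], array2[2], array2[3]
}')
printf '%s\n' "$out"    # prints: 6 3 11 19 40
```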
2.3.4 Built-In awk Variables
The
awk
programs recognize the set of built-in variables
listed in
Table 2-2.
Table 2-2: Built-In Variables in awk
| Variable | Description |
$0 |
The contents of the current record. |
$n |
The contents of field
n
of the input record.
In
awk
you can modify the entire
record ($0) or individual fields ($n). |
ARGC |
The number of elements in the
ARGV
array, that is, the command name plus the arguments given to
awk.
This variable is modifiable.
The count does not include
flags preceded by minus signs, the script file name (if any), or variable
assignments. |
ARGV |
An array from
ARGV[0]
to
ARGV[ARGC-1]
containing the command name followed by
the arguments given to
awk.
The elements of this array
are modifiable.
Does not include flags preceded by minus signs, the script
file name (if any), or variable assignments. |
CONVFMT |
The conversion format for numbers (by default,
%.6g). |
ENVIRON |
A modifiable array containing the current
set of environment variables; accessible by
ENVIRON["name"], where
"name"
is a variable or literal containing the name of the environmental
variable.
Changing an element in this array does not affect the environment
passed to commands that
awk
spawns by redirection, piping,
or the
system()
function. |
FILENAME |
The name of the current input file.
If no
input file was named,
FILENAME
contains a single minus
sign.
Inside a
BEGIN
action,
FILENAME
is undefined.
Inside an
END
action,
FILENAME
reflects the last file read. |
FNR |
The number of the current record within the
current file.
Differs from
NR
if multiple files are being
processed and the current file is not the first file read. |
FS |
The character or expression used for
a field separator.
By default, any amount of white space.
In
awk,
FS
can be an extended regular expression; for example:
FS = ",[ \t]*|[ \t]+"
|
NF |
The number of fields in the current record. |
NR |
The number of the current record, counted
sequentially from the beginning of the first file read.
Differs from
FNR
if multiple files are being processed and the current file is
not the first file read. |
OFMT |
The format specification for numbers on output
(by default,
%.6g). |
OFS |
The output field separator, that is, the string inserted between fields when the data is written. By default, a space character. |
ORS |
The character used for the output record separator (the character between records when the data is written). By default, a newline character. |
RLENGTH |
The length of the string matched by
match(); set to -1 if no match. |
RS |
The input character used as the record separator. By default, a newline character. |
RSTART |
The index (position within the string) of
the first character matched by
match(); set to 0 if no
match. |
SUBSEP |
The separator for multiple subscripts in array elements (by default \034, the ASCII FS character). |
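Several of these variables can be watched changing as awk reads more than one file. The following sketch assumes that mktemp is available and uses two throwaway files named one and two:

```sh
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one"
printf 'c\n'    > "$tmpdir/two"

# NR counts records across both files; FNR restarts at 1 for each file.
out=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR, NF }' one two)
rm -r "$tmpdir"

printf '%s\n' "$out"
```

For the single record of the second file, NR is 3 while FNR is 1.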
See
awk(1).
2.4 More About Using Regular Expressions as Patterns
The simplest regular expression is a literal string of characters.
Regular expressions in
awk
must be enclosed in slashes.
To include a slash as part of an expression, escape the slash with a backslash.
For example,
/\/usr\/share/
is an expression that matches the string
/usr/share.
Following is an example of an
awk
program that prints
all records containing the string
the.
/the/
Because this expression does not specify blanks or other qualifiers, the program displays records containing "the" as a separate word and records containing the string as part of words such as "northern". Regular expressions are case sensitive. To find either "The" or "the", use a bracketed expression as follows:
/[Tt]he/
The
awk
language supports the full set of extended
regular expressions described in
Chapter 1.
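Additionally, in
awk
the circumflex (^) and dollar sign ($) can anchor a match to the beginning and end of a field or variable, not only of a record.
For example, the following program prints a record once for each of its fields that is exactly cat or cats: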
{ for (i=1;i<=NF;i++) if ($i ~ /^cats?$/) print }
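A runnable variation on this program prints the record number and the matching field instead of the whole record, which makes it easier to see which fields match. The sample records are invented:

```sh
out=$(printf 'two cats here\nconcatenate this\nmy cat\n' | awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^cats?$/)      # the whole field must be cat or cats
            print NR ": " $i
}')
printf '%s\n' "$out"
```

The second record is not reported, because cat appears there only as part of a longer field.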
2.5 Using Relational Expressions and Combined Expressions as Patterns
Relational expressions let you restrict a match
to a specific field of a record or to make other tests, either numeric or
string-related.
One example earlier in this chapter (in
Section 2.3)
illustrates the use of relational expressions in patterns.
The
awk
program defines the following relational operators for use in
building patterns:
| == | Equivalent |
| != | Not equivalent |
| < | Less than |
| > | Greater than |
| <= | Less than or equal |
| >= | Greater than or equal |
| ~ | Matches regular expression |
| !~ | Does not match regular expression |
Use the
==
(equivalent) and
!=
(not equivalent) operators to test literal strings and numeric values.
For
example:
str == "literal string"
num != 23
$NF == 1991
The last line in this example uses the
$n
syntax combined with the built-in
variable
NF
to test the value of the last field of a record.
To test against regular expressions, use the
~
(matches
regular expression) and
!~
(does not match regular expression)
operators as follows:
str ~ /[Ll]iteral/
You can test relational expressions against built-up expressions.
For
example, the following pattern finds all records whose second field ($2) is more than 100 greater than the first field ($1):
$2 > $1 + 100
The following pattern finds records that contain an even number of fields:
NF % 2 == 0
Use the operators listed in Section 2.8 to build expressions.
You can use magnitude-comparison operators to test strings. For example, the following pattern finds records that begin with s or any character that appears after it to the end of the character set:
$0 >= "s"
You can combine two or more patterns by using the following Boolean operators:
| && | AND |
| || | Logical OR |
| ! | NOT |
For example, to prevent nonalphanumeric matches in the preceding example, you can combine two expressions as follows:
($0 >= "s" && $0 < "{")
(The left brace is the
character immediately following the letter z in the ASCII code.)
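Relational expressions and Boolean operators can be mixed freely in one pattern. This sketch combines two of the numeric patterns shown earlier in this section and prints the numbers of the records that satisfy both; the sample data is invented:

```sh
out=$(printf '1 150\n5 50 9\n10 200\n' |
    awk 'NF % 2 == 0 && $2 > $1 + 100 { print NR }')
printf '%s\n' "$out"
```

The second record is rejected because it has an odd number of fields; records 1 and 3 pass both tests.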
2.6 Using Pattern Ranges
You can use a pattern range to select a group of records to operate
on.
A pattern range consists of two patterns separated by a comma; the first
pattern specifies the start of the range, and the second pattern specifies
the end of the range.
The
awk
program performs the associated
action on all records in the range, including the records that match the two
patterns.
For example:
NR==100,NR==200 { print }
This program prints 101 records from the input file, beginning with record 100 and ending with record 200.
Using a pattern range does not disable other patterns from matching records within the range. However, because the input file is processed record by record, with each record being subject to all the actions appropriate to it before the next record is considered, the actions taken can appear to be out of sequence. For example:
NR==2,NR==4 { print }
/share/ { print "Found share" }
Apply this program to the following input file:
This is a test file
Line two
Try to share things
Line four
Last line of file
When this file is processed by
awk, the output is as follows:
Line two
Try to share things
Found share
Line four
The second action is applied to record 3 before
record 4 is examined to see if it matches the first pattern.
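Range patterns are not limited to record numbers; either end can be a regular expression. In the following sketch, the marker strings START and STOP are arbitrary:

```sh
out=$(printf 'a\nSTART\nb\nSTOP\nc\n' |
    awk '/START/,/STOP/ { print NR ": " $0 }')
printf '%s\n' "$out"
```

The records that match the two ends of the range are included in the output, along with everything between them.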
2.7 Actions in awk
An action can be a single
command, such as
print, or it can be a series of commands.
An action can include tests to select records or parts of records; you also
can create a program that has no explicit patterns, relying instead on relational
comparisons within its actions.
Such a program can bear a strong resemblance
to a C program; for example:
{
if ($1 == 0) {
print;
printf("%5.2f\n", $2+$3)
} else {
printf("%5.2f\n", $1+$2)
}
}
Note
The semicolon after the
print
command is not required in
awk, but it does not cause an error.
2.8 Using Operators in an Action
Use the operators shown in
Table 2-3
to build
expressions within the action statement.
Table 2-3: Operators for awk Actions
| Precedence | Operator | Description | Example |
| 1 | () | Parentheses | 3+x*4 = 3+(x*4) |
| 2 | $ | Field reference | $(NF-1) = next to last field |
| 3 | ++ | Increment | See the description below |
| 3 | -- | Decrement | See the description below |
| 4 | ^ | Exponentiation | 2^3 = 8 |
| 5 | ! | Logical negation | !x is not equal to x |
| 6 | + | Unary plus | +4 = 4 |
| 6 | - | Unary minus | -4 is negative 4 |
| 7 | * | Multiplication | 2*4 = 8 |
| 7 | / | Division | 6/3 = 2 |
| 7 | % | Modulo (remainder) | 7%3 = 1 |
| 8 | + | Addition | 2+3 = 5 |
| 8 | - | Subtraction | 7-3 = 4 |
| 9 | space | Concatenation | "a" "b" = "ab" |
| 10 | < | Less than | 5 < 6 |
| 10 | > | Greater than | "qrs" > "abc" |
| 10 | <= | Less than or equal to | 3 <= 3 |
| 10 | >= | Greater than or equal to | 4 >= 2 |
| 10 | == | Equal | 9 == 9 |
| 10 | != | Not equal | "xyz" != "abc" |
| 11 | ~ | Match regular expression | "tmp.c" ~ /[a-z]+\.[ch]/ |
| 11 | !~ | Not match regular expression | "tmp.o" !~ /[a-z]+\.[ch]/ |
| 12 | in | Array membership | for (j in arr) print arr[j] |
| 13 | && | Logical AND | X |
| 14 | || | Logical OR | X |
| 15 | ?: | Conditional expression | x == -1 ? "error" : "OK" |
| 16 | = | Assignment | x = 3 |
| 16 | ^= | Exponentiation by value | x^=3 is equivalent to x = x^3 |
| 16 | *= | Multiply by value | x*=y is equivalent to x = x*y |
| 16 | /= | Divide by value | x/=y is equivalent to x = x/y |
| 16 | %= | Modulo by value | x%=y is equivalent to x = x%y |
| 16 | += | Increment by value | x+=y is equivalent to x = x+y |
| 16 | -= | Decrement by value | x-=y is equivalent to x = x-y |
The following example prints the sum of all the first fields and the sum of all the second fields in the input file:
{ s1 += $1; s2 += $2 }
END { print s1,s2 }
The position of the increment and decrement operators affects their
interpretation.
The expression
i++
evaluates the current
contents of
i
and then increments
i.
The expression
++i
causes
awk
to increment
i
before evaluation.
For example:
$ echo "3 3" | awk '{
> print "$1 =", $1 "; $1++ =", $1++ "; new $1 =", $1
> print "$2 =", $2 "; ++$2 =", ++$2 "; new $2 =", $2
> }'
$1 = 3; $1++ = 3; new $1 = 4
$2 = 3; ++$2 = 4; new $2 = 4
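The following sketch runs the summing program on three invented records and then evaluates several operators in a BEGIN action to show precedence and the two forms of increment:

```sh
sums=$(printf '1 10\n2 20\n3 30\n' |
    awk '{ s1 += $1; s2 += $2 } END { print s1, s2 }')

prec=$(awk 'BEGIN {
    i = 5
    a = i++      # a receives 5, then i becomes 6
    b = ++i      # i becomes 7, then b receives 7
    print 3+2*4, 2^3, 7%3, a, b, i
}')

printf '%s\n' "$sums" "$prec"
```

The first command prints 6 60; the second prints 11 8 1 5 7 7, because multiplication binds more tightly than addition.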
2.9 Using Functions Within an Action
The
awk
language includes the built-in
mathematical functions listed in
Table 2-4.
Table 2-4: Built-In awk Mathematical Functions
| Function | Description |
atan2(x,y) |
Returns the arctangent of the value specified
by
x/y. |
cos(expr) |
Returns the cosine of the value (in radians) specified by expr. |
exp(arg) |
Returns the natural antilogarithm (base
e)
of
arg.
For example,
exp(0.693147)
returns
2.
See
log(arg). |
int(arg) |
Returns the integer part of arg. |
log(arg) |
Returns the natural logarithm (base
e)
of
arg.
For example,
log(2)
returns 0.693147.
See
exp(arg). |
rand() |
Returns a pseudorandom number (0 <= n < 1). |
sin(arg) |
Returns the sine of the value (in radians) specified by arg. |
sqrt(arg) |
Returns the square root of arg. |
srand(seed) |
Uses
seed
as the
seed for a pseudorandom number sequence for subsequent calls to
rand.
If no seed is specified, the time of day is used.
The return
value is the previous seed. |
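The following sketch calls several of these functions with arguments whose results are easy to check by hand. The seed passed to srand is arbitrary:

```sh
out=$(awk 'BEGIN {
    # atan2(0, -1) is pi; exp and log are inverses of each other
    printf("%.4f %d %.1f %.4f\n", atan2(0, -1), int(3.9), sqrt(16), exp(log(2)))
    srand(1)                   # fixed seed so the sequence is repeatable
    r = rand()
    print (r >= 0 && r < 1)    # 1 if r lies in the documented range
}')
printf '%s\n' "$out"
```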
The
awk
language includes the built-in string functions
listed in
Table 2-5.
Table 2-5: Built-In awk String Functions
| Function | Description |
gsub(expr,s1,s2) |
Replaces every sequence of characters in
string
s2
that matches the regular expression
expr
with the string specified by
s1.
If
s2
is not supplied, the current input record
is used.
Regular expression
expr
is reevaluated
for each match.
This function returns a value representing the number of
replacements.
See also
sub(expr,s1,s2). |
index(s1,s2) |
Returns the character position in string s1 where string s2 occurs. If s2 is not in s1, this function returns a zero. |
length |
Returns the length in characters of the current record. |
length(arg) |
Returns the length in characters of the string
specified by
arg.
See
length. |
match(s,expr) |
Returns the character position in string
s
where a match is found for the regular expression
expr; sets the variable
RSTART
to the character
position at which the match begins and
RLENGTH
to a value
representing the length of the matched string.
If no match is found, this
function returns a zero. |
split(s,array,sep) |
Splits string
s
into consecutive elements of
array[1]...[n]
and returns the
number of elements.
The optional
sep
argument
specifies a field separator other than the one currently in force (the default
is the contents of the
FS
variable). |
sprintf(f,e1,e2
,...) |
Returns (but does not print) a string containing
the arguments
e1
and so on, formatted in the same
manner as by the
printf
command. |
sub(expr,s1,s2) |
Replaces the first sequence of characters
in string
s2
that matches the regular expression
expr
with the string specified by
s1.
If
s2
is not supplied, the current input record
is used.
This function returns a value representing the number of replacements
(0 or 1).
See also
gsub(expr,s1,s2). |
substr(s,m,n) |
Returns the substring of s that begins at character position m and is n characters long. The first character in s is at position 1. If n is omitted or if the string is not long enough to supply n characters, the rest of the string is returned. |
tolower(s) |
Translates all uppercase letters in string s to lowercase. If there is no argument, the function operates on the current record. |
toupper(s) |
Translates all lowercase letters in string s to uppercase. If there is no argument, the function operates on the current record. |
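The following sketch applies several of the string functions to one arbitrary string:

```sh
out=$(awk 'BEGIN {
    s = "hello world"
    pos = match(s, /o w/)          # sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    n = gsub(/o/, "0", s)          # replaces every o in s, returns the count
    print n, s
    print substr(s, 1, 5), index(s, "w"), length(s), toupper(substr(s, 7))
}')
printf '%s\n' "$out"
```

Because gsub changes s in place, the last line operates on the modified string.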
The
awk
language includes the built-in miscellaneous
functions listed in
Table 2-6.
Table 2-6: Built-In awk Miscellaneous Functions
| Function | Description |
close(arg) |
Closes the file or pipe named by
arg. |
system("command") |
Executes the system command specified and
returns its exit status.
The entire command must be enclosed in quotation
marks to prevent
awk
from attempting to interpret it as
one or more variable names. |
The
awk
language also lets you create functions by
using the following syntax:
function name(parameter-list) {statements}
The word
func
can be used
in place of
function.
For functions that you create, the
left parenthesis both in the function's definition and in its use must immediately
follow the function name with no intervening space.
The names in the function
declaration's parameter list are the formal parameters for use within the
function.
When you call a function,
awk
replaces these
formal parameters with the values you supply in the calling statement.
Functions
can be recursive.
You can define local variables for a given function by declaring them as extra formal parameters; upon function entry, all local variables are initialized as empty strings or the number 0. To avoid visual confusion between real parameters and local variables, you can separate the local variables with extra spaces in the function declaration. For example:
function foo(input, output,    local1, local2) {
local1 = "foo"
local2 = "bar"
.
.
.
}
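As a sketch of a user-defined function with a local variable, the following program defines a recursive factorial function. The function name fact and the variable name result are chosen for illustration, and the function hands its value back with the return statement of standard awk, which this section does not otherwise describe:

```sh
out=$(awk '
function fact(n,    result) {    # extra spaces set off the local variable
    if (n <= 1) return 1
    result = n * fact(n - 1)     # recursive call
    return result
}
BEGIN { print fact(5) }')
printf '%s\n' "$out"             # prints: 120
```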
2.10 Using Control Structures in awk
The
awk
language provides the control structures listed in
Table 2-7.
Except where noted, these structures work exactly as they do in the C language.
To perform several statements in a single control structure's action, enclose
the statements in braces.
If only a single statement is to be performed,
the braces are optional.
Each of the first two
if
structures
in the following example includes a single statement to be executed; these
structures are equivalent:
{
if (x == y) print
if (x == y) {
print
}
if (x == y) {
print $3
printf("Sum = %d\n", x+z)
}
}
Table 2-7: Control Structures in awk
| Structure | Description |
if-else |
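The condition in parentheses in an
if
statement is evaluated; if it is true, the statement following it is executed, and otherwise the statement following
else
(if present) is executed.
To chain tests, write else if; the order in which "else" and "if" appear is important. For example: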
if ( $1 == "abc" ) {
print("found abc\n");
}
else if ( $1 == "qrs" ) {
print("found qrs\n");
}
else if ( $1 == "xyz" ) {
print("found xyz\n");
}
else {
print("did not find \"abc\", \"qrs\", or \"xyz\"\n");
}
|
delete |
Array elements may be deleted using the delete statement. For example, the following action:
{
for(j in x)
delete x[j]
}
will remove all the elements of the array x. |
while |
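The statements following the
while
condition are executed repeatedly for as long as the condition is true.
For example, the following action prints each field of the record on a separate line:
{
i = 1
while(i<=NF) print $(i++)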
}
|
for |
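The
for
statement has the form for (expr1; condition; expr2) statement, which has the same effect as the following:
{
expr1
while (condition) {
statement
expr2
}
}
The previous
while
example can be written with
for
as follows:
{
for(i=1;i<=NF;++i) print $i
}
The
for
statement also has a form that steps through the subscripts of an array, for (i in array) statement. For example:
$2=="="{name_value_pairs[$1]=$3}
END{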
for (i in name_value_pairs)
print name_value_pairs[i]
}
|
break |
The
break
statement causes
an immediate exit from an enclosing
while
or
for
loop. |
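Comments |
Include comments in an
awk
program by beginning them with a number sign (#); a comment continues to the end of the line.
For example: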
{
print x,y # This is a comment
}
|
continue |
The
continue
statement
causes the next iteration of an enclosing loop to begin. |
getline |
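The
getline
statement reads the next input record.
By using
redirection or a pipe, you can have
getline
read from a file or from the output of a command (see Section 2.13).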
|
next |
The
next
statement causes
awk
to discard the current input record, read the next input record,
and begin scanning patterns from the start of the program file. |
exit |
The
exit
statement causes
the program to stop as if the end of the input occurred. |
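The following sketch combines next, continue, and exit in one program. The input records and the stop marker are invented for illustration:

```sh
out=$(printf '# comment\n1 2 3\n4 x 6\nstop\n7 8 9\n' | awk '
/^#/ { next }               # skip comment records
$1 == "stop" { exit }       # stop reading input; END actions still run
{
    sum = 0
    for (i = 1; i <= NF; i++) {
        if ($i !~ /^[0-9]+$/) continue    # ignore nonnumeric fields
        sum += $i
    }
    print sum
}
END { print "done" }')
printf '%s\n' "$out"
```

The record after the stop marker is never read, so the output is 6, 10, and done.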
2.11 Performing Actions Before or After Processing the Input
The
awk
program recognizes two special pattern keywords
that define the beginning
(BEGIN) and the end (END) of the input file.
BEGIN
matches the beginning
of the input before reading the first record.
Therefore,
awk
performs any actions associated with this pattern once, before processing
the input file.
For example, to change the field separator to a colon (:), include the following line in the program file:
BEGIN { FS = ":" }
This example action works the same as using the
-F:
flag on the command line.
Similarly,
END
matches the end of the input file
after processing the last record.
Therefore,
awk
performs
any actions associated with this pattern once, after processing the input
file.
For example, to print the total number of records in the input file,
include the following line in the program file:
END { print NR }
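Both special patterns can appear in the same program, as in this sketch with invented colon-separated input:

```sh
out=$(printf 'a:b\nc:d\n' | awk '
BEGIN { FS = ":" }     # runs once, before the first record is read
{ print $2 }
END { print NR }       # runs once, after the last record')
printf '%s\n' "$out"
```

The program prints the second field of each record and then the record count, 2.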
2.12 Concatenating Strings
You concatenate strings by placing their variable
names together in an expression.
For example, the command
print $1 $2
prints a string consisting of the first two fields from the current
record, with no space between them.
You can use variables, numeric operators,
and functions when concatenating strings.
(See
Section 2.3.1
and
Section 2.8
for information on variables and numeric
operators.) The function
length($1 $2 $3)
returns
the length in characters of the first three fields.
(See
Section 2.9
for a list of the functions in
awk.) If the strings you
want to concatenate are field variables (see
Section 2.3.2),
you are not required to separate the names with white space; the expression
$1$2
is identical to
$1 $2.
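A brief sketch of concatenation, using an invented two-field record:

```sh
out=$(echo 'foo bar' | awk '{
    s = $1 $2                           # fields joined with nothing between them
    print s, length($1 $2), $1 "-" NR   # strings, literals, and numbers can be mixed
}')
printf '%s\n' "$out"                    # prints: foobar 6 foo-1
```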
2.13 Redirection and Pipes
Unless otherwise
specified,
print
and
printf
statements
write their output to the standard output file.
You can redirect the output
of any printing statement by using standard redirection operators.
For example:
print $0, $3, amt >> "reportfile"
This example appends its output to a file named
reportfile
instead of writing to the standard output.
(If
reportfile
does not exist before the first instance of redirection, it is
created.) The output file name in this example is enclosed in quotation marks.
The quotation marks are required to distinguish the file name from a variable
name.
You can mix writing to named files with writing to the standard output.
By default, the
print
and
printf
statements
send their output to stdout.
The following example sends output to
stderr:
print "oops: did not find expected input" | " cat 1>&2"
You also can pipe printed output through other commands.
The following
example pipes
awk's output through the
tr
command to convert all uppercase letters to lowercase letters:
print | "tr '[A-Z]' '[a-z]'"
As with redirection, the command to which you pipe the output must be
enclosed in quotation marks.
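The following sketch pipes each record through the sort command and closes the pipe in an END action, so that the sorted output is complete before the final message is printed. The sample data is invented:

```sh
out=$(printf 'pear\napple\nfig\n' | awk '
{ print | "sort" }         # every record goes to one sort process
END {
    close("sort")          # wait for sort to finish writing
    print "sorted above"
}')
printf '%s\n' "$out"
```

The string passed to close must match the command string used in the pipe.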
In
awk
you can redirect the
input to
getline
using standard redirection operators,
and you can supply the input to
getline
from a pipe.
For
example:
expr | getline
Here, expr is interpreted as a system command.
The following example reads the output from a system command:
BEGIN {
cmd = "ps aux"
while( cmd | getline > 0 ) {
if ( $2 == "PID" ) continue
unique_users[$1]++
}
close(cmd)
for(i in unique_users) {
printf("%3d %s\n", unique_users[i], i)
}
}
Only a limited number of files can be open for output.
The
awk
program uses your default open file descriptor limit.
For efficiency,
however, you can use the
close(arg)
statement to close files that you have opened for output and no
longer need.
For example:
{
if ( cur_file != "/tmp/" $1 ) {
close(cur_file)
cur_file = "/tmp/" $1
}
print $2 >cur_file
}
END { close(cur_file) }
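The following sketch writes two records to a file, closes it, and reads it back with getline. It assumes that mktemp is available; the variable names f, line, and n are arbitrary:

```sh
tmpfile=$(mktemp)
out=$(awk -v f="$tmpfile" 'BEGIN {
    print "first" > f
    print "second" > f     # the file stays open, so this adds a second record
    close(f)               # finish writing before reading the file back
    while ((getline line < f) > 0) n++
    close(f)
    print n, line
}')
rm -f "$tmpfile"
printf '%s\n' "$out"       # prints: 2 second
```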