AWK(1P) POSIX Programmer's Manual AWK(1P)
PROLOG
This manual page is part of the POSIX Programmer's
Manual. The Linux implementation of this interface may
differ (consult the corresponding Linux manual page for
details of Linux behavior), or the interface may not be
implemented on Linux.
NAME
awk - pattern scanning and processing language
SYNOPSIS
awk [-F ERE] [-v assignment] ... program [argument ...]
awk [-F ERE] -f progfile ... [-v assignment] ... [argument ...]
DESCRIPTION
The awk utility shall execute programs written in the
awk programming language, which is specialized for
textual data manipulation. An awk program is a sequence of
patterns and corresponding actions. When input is read
that matches a pattern, the action associated with that
pattern is carried out.
Input shall be interpreted as a sequence of records. By
default, a record is a line, less its terminating
<newline>, but this can be changed by using the RS built-in
variable. Each record of input shall be matched in turn
against each pattern in the program. For each pattern
matched, the associated action shall be executed.
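As an illustration of the record model described above, the following minimal sketch reads the same kind of data first with the default <newline> record terminator and then with RS set to a semicolon. The sample input strings and the shell variable names lines and recs are assumptions made for this example only.

```shell
# Default: each line is one record; NR is the number of records read so far.
lines=$(printf 'alpha\nbeta\n' | awk '{ print NR ": " $0 }')
echo "$lines"

# Assumed sample input: records terminated by ';' instead of <newline>.
# RS is set in a BEGIN action so that it takes effect before any input is read.
recs=$(printf 'alpha;beta;gamma;' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
echo "$recs"
```

The first command should print two numbered records and the second three, one per output line.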
The awk utility shall interpret each input record as a
sequence of fields where, by default, a field is a
string of non-<blank>s. This default white-space field
delimiter can be changed by using the FS built-in
variable or -F ERE. The awk utility shall denote the first
field in a record $1, the second $2, and so on. The
symbol $0 shall refer to the entire record; setting any
other field causes the re-evaluation of $0. Assigning to
$0 shall reset the values of all other fields and the NF
built-in variable.
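The relationship between $0, the numbered fields, and NF can be sketched as follows. The input line is an assumed sample, and the shell variable names fields, rebuilt, and reset exist only for this illustration. The second command relies on the output field separator OFS, which is a single space unless changed.

```shell
# Default splitting on <blank>s: the run of two spaces is one delimiter.
fields=$(printf 'one  two three\n' | awk '{ print NF, $1, $3 }')
echo "$fields"

# Assigning to a field causes $0 to be re-evaluated; the fields are
# rejoined with OFS, so the doubled space in the input disappears.
rebuilt=$(printf 'one  two three\n' | awk '{ $2 = "TWO"; print $0 }')
echo "$rebuilt"

# Assigning to $0 resets the other fields and NF.
reset=$(printf 'one two three\n' | awk '{ $0 = "a b"; print NF, $2 }')
echo "$reset"
```

The three commands should print "3 one three", "one TWO three", and "2 b" respectively.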
OPTIONS
The awk utility shall conform to the Base Definitions
volume of IEEE Std 1003.1-2001, Section 12.2, Utility
Syntax Guidelines.
The following options shall be supported:
-F ERE
Define the input field separator to be the
extended regular expression ERE, before any input
is read; see Regular Expressions.
-f progfile
Specify the pathname of the file progfile con-
taining an awk program. If multiple instances of
this option are specified, the concatenation of
the files specified as progfile in the order
specified shall be the awk program. The awk pro-
gram can alternatively be specified in the com-
mand line as a single argument.
-v assignment
The application shall ensure that the assignment
argument is in the same form as an assignment
operand. The specified variable assignment shall
occur prior to executing the awk program, includ-
ing the actions associated with BEGIN patterns
(if any). Multiple occurrences of this option can
be specified.
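A brief sketch of the -F and -v options follows. The colon-separated input line, the awk variable who, and the shell variable names name and greet are assumptions chosen for the example.

```shell
# -F defines the input field separator (here a colon) before input is read.
name=$(printf 'root:x:0:0\n' | awk -F : '{ print $1 }')
echo "$name"

# -v assigns the variable before the program starts, so it is already
# visible inside a BEGIN action.
greet=$(awk -v who=world 'BEGIN { print "hello, " who }')
echo "$greet"
```

The first command should print "root" and the second "hello, world".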
OPERANDS
The following operands shall be supported:
program
If no -f option is specified, the first operand
to awk shall be the text of the awk program. The
application shall supply the program operand as a
single argument to awk. If the text does not end
in a <newline>, awk shall interpret the text as
if it did.
argument
Either of the following two types of argument can
be intermixed:
file
A pathname of a file that contains the input to
be read, which is matched against the set of pat-
terns in the program. If no file operands are
specified, or if a file operand is '-', the stan-
dard input shall be used.
assignment
An operand that begins with an underscore or
alphabetic character from the portable character
set (see the table in the Base Definitions volume
of IEEE Std 1003.1-2001, Section 6.1, Portable
Character Set), followed by a sequence of under-
scores, digits, and alphabetics from the portable
character set, followed by the '=' character,
shall specify a variable assignment rather than a
pathname. The characters before the '=' represent
the name of an awk variable; if that name is an
awk reserved word (see Grammar), the behavior is
undefined. The characters following the equal
sign shall be interpreted as if they appeared in
the awk program preceded and followed by a
double-quote ( '"' ) character, as a STRING token
(see Grammar), except that if the last character
is an unescaped backslash, it shall be interpreted
as a literal backslash rather than as the
first character of the sequence "\"". The variable
shall be assigned the value of that STRING
token and, if that value is considered a
numeric string (see Expressions in awk), the
variable shall also be assigned its numeric
value. Each such variable assignment shall occur
just prior to the processing of the following
file, if any. Thus, an assignment before the
first file argument shall be executed after the
BEGIN actions (if any), while an assignment after
the last file argument shall occur before the END
actions (if any). If there are no file arguments,
assignments shall be executed before processing
the standard input.
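The timing of assignment operands can be illustrated with a minimal sketch that places one assignment before and one after a file operand. Here the file operand is '-', so the standard input is read. The variable name v, its values, and the shell variable name timing are assumptions for this example.

```shell
# v=1 precedes the file operand '-', so it is applied after the BEGIN action
# but before the input is processed; v=2 follows the last file operand, so it
# is applied before the END action.
timing=$(printf 'x\n' | awk '
BEGIN { print "BEGIN:" v }
      { print "main:" v }
END   { print "END:" v }
' v=1 - v=2)
echo "$timing"
```

The BEGIN line should show an empty value, the main line 1, and the END line 2. By contrast, a variable set with -v would already be visible in BEGIN.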
STDIN
The standard input shall be used only if no file
operands are specified, or if a file operand is '-';
see the INPUT FILES section. If the awk program contains
no actions and no patterns, but is otherwise a valid awk
program, standard input and any file operands shall not
be read and awk shall exit with a return status of zero.
INPUT FILES
Input files to the awk program from any of the following
sources shall be text files:
* Any file operands or their equivalents, achieved by
modifying the awk variables ARGV and ARGC
* Standard input in the absence of any file operands
* Arguments to the getline function
Whether the variable RS is set to a value other than a
<newline> or not, for these files, implementations shall
support records terminated with the specified separator
up to {LINE_MAX} bytes and may support longer records.
If -f progfile is specified, the application shall
ensure that the files named by each of the progfile
option-arguments are text files and their concatenation,
in the same order as they appear in the arguments, is an
awk program.
ENVIRONMENT VARIABLES
The following environment variables shall affect the
execution of awk:
LANG Provide a default value for the internationaliza-
tion variables that are unset or null. (See the
Base Definitions volume of IEEE Std 1003.1-2001,
Section 8.2, Internationalization Variables for
the precedence of internationalization variables
used to determine the values of locale cate-
gories.)
LC_ALL If set to a non-empty string value, override the
values of all the other internationalization
variables.
LC_COLLATE
Determine the locale for the behavior of ranges,
equivalence classes, and multi-character collat-
ing elements within regular expressions and in
comparisons of string values.
LC_CTYPE
Determine the locale for the interpretation of
sequences of bytes of text data as characters
(for example, single-byte as opposed to multi-
byte characters in arguments and input files),
the behavior of character classes within regular
expressions, the identification of characters as
letters, and the mapping of uppercase and lower-
case characters for the toupper and tolower func-
tions.
LC_MESSAGES
Determine the locale that should be used to
affect the format and contents of diagnostic mes-
sages written to standard error.
LC_NUMERIC
Determine the radix character used when inter-
preting numeric input, performing conversions
between numeric and string values, and formatting
numeric output. Regardless of locale, the period
character (the decimal-point character of the
POSIX locale) is the decimal-point character rec-
ognized in processing awk programs (including
assignments in command line arguments).
NLSPATH
Determine the location of message catalogs for
the processing of LC_MESSAGES.
PATH Determine the search path when looking for com-
mands executed by system(expr), or input and out-
put pipes; see the Base Definitions volume of
IEEE Std 1003.1-2001, Chapter 8, Environment
Variables.
In addition, all environment variables shall be visible
via the awk variable ENVIRON.
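As a small illustration of ENVIRON, the sketch below exports a variable from the shell and reads it back inside awk. The environment variable name GREETING and the shell variable name env_out are hypothetical and used only for this example.

```shell
# GREETING is a hypothetical environment variable created for this example.
GREETING=hello
export GREETING

# ENVIRON is indexed by environment variable name.
env_out=$(awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$env_out"
```

This should print "hello".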
ASYNCHRONOUS EVENTS
Default.
STDOUT
The nature of the output files depends on the awk pro-
gram.
STDERR
The standard error shall be used only for diagnostic
messages.
OUTPUT FILES
The nature of the output files depends on the awk pro-
gram.
EXTENDED DESCRIPTION
Overall Program Structure
An awk program is composed of pairs of the form:
pattern { action }
Either the pattern or the action (including the enclos-
ing brace characters) can be omitted.
A missing pattern shall match any record of input, and a
missing action shall be equivalent to:
{ print }
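The two omissions can be sketched side by side. The input lines and the shell variable names only_pattern and only_action are assumptions for this example.

```shell
# Pattern with no action: each matching record is printed, as if by { print }.
only_pattern=$(printf 'apple\nbanana\ncherry\n' | awk '/an/')
echo "$only_pattern"

# Action with no pattern: the action is executed for every record.
only_action=$(printf 'apple\nbanana\n' | awk '{ print "saw " $0 }')
echo "$only_action"
```

The first command should print only "banana"; the second should print "saw apple" and "saw banana".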
Execution of the awk program shall start by first exe-
cuting the actions associated with all BEGIN patterns in
the order they occur in the program. Then each file
operand (or standard input if no files were specified)
shall be processed in turn by reading data from the file
until a record separator is seen (<newline> by
default). Before the first reference to a field in the
record is evaluated, the record shall be split into
fields, according to the rules in Regular Expressions,
using the value of FS that was current at the time the
record was read. Each pattern in the program then shall
be evaluated in the order of occurrence, and the action
associated with each pattern that matches the current
record executed. The action for a matching pattern shall
be executed before evaluating subsequent patterns.
Finally, the actions associated with all END patterns
shall be executed in the order they occur in the pro-
gram.
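The execution order described above can be sketched with a program that scatters its BEGIN and END patterns among ordinary rules. The input, the "seen:" prefix, and the shell variable name order are assumptions for this example. The first ordinary rule modifies $0, and the second rule's pattern is evaluated only after that action has run.

```shell
order=$(printf 'a\nb\n' | awk '
BEGIN    { print "begin 1" }
         { $0 = "seen:" $0 }    # runs first for every record
/^seen:/ { print }              # pattern sees the modified record
BEGIN    { print "begin 2" }    # still runs before any input is read
END      { print "end, records read: " NR }
')
echo "$order"
```

Both BEGIN actions should run first, in program order, followed by "seen:a" and "seen:b", and finally the END action reporting 2 records.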
Expressions in awk
Expressions describe computations used in patterns and
actions. In the following table, valid expression
operations are given in groups from highest precedence
first to lowest precedence last, with equal-precedence
operators grouped between horizontal lines. In expres-
sion evaluation, where the grammar is formally ambigu-
ous, higher precedence operators shall be evaluated
before lower precedence operators. In this table expr,
expr1, expr2, and expr3 represent any expression, while
lvalue represents any entity that can be assigned to
(that is, on the left side of an assignment operator).
The precise syntax of expressions is given in Grammar.
Table: Expressions in Decreasing Precedence in awk
IEEE/The Open Group 2003 AWK(1P)