Diffstat (limited to 'coreutils-5.3.0-bin/man/cat1/gawk.1.txt')
-rw-r--r-- coreutils-5.3.0-bin/man/cat1/gawk.1.txt 1972
1 files changed, 1972 insertions, 0 deletions
diff --git a/coreutils-5.3.0-bin/man/cat1/gawk.1.txt b/coreutils-5.3.0-bin/man/cat1/gawk.1.txt
new file mode 100644
index 0000000..a431e7b
--- /dev/null
+++ b/coreutils-5.3.0-bin/man/cat1/gawk.1.txt
@@ -0,0 +1,1972 @@
+GAWK(1) Utility Commands GAWK(1)
+
+
+
+NAME
+ gawk - pattern scanning and processing language
+
+SYNOPSIS
+ gawk [ POSIX or GNU style options ] -f program-file [ --
+ ] file ...
+ gawk [ POSIX or GNU style options ] [ -- ] program-text
+ file ...
+
+ pgawk [ POSIX or GNU style options ] -f program-file [
+ -- ] file ...
+ pgawk [ POSIX or GNU style options ] [ -- ] program-text
+ file ...
+
+DESCRIPTION
+ Gawk is the GNU Project's implementation of the AWK pro-
+ gramming language. It conforms to the definition of the
+ language in the POSIX 1003.1 Standard. This version in
+ turn is based on the description in The AWK Programming
+ Language, by Aho, Kernighan, and Weinberger, with the
+ additional features found in the System V Release 4 ver-
+ sion of UNIX awk. Gawk also provides more recent Bell
+ Laboratories awk extensions, and a number of GNU-spe-
+ cific extensions.
+
+ Pgawk is the profiling version of gawk. It is identical
+ in every way to gawk, except that programs run more
+ slowly, and it automatically produces an execution pro-
+ file in the file awkprof.out when done. See the --pro-
+ file option, below.
+
+ The command line consists of options to gawk itself, the
+ AWK program text (if not supplied via the -f or --file
+ options), and values to be made available in the ARGC
+ and ARGV pre-defined AWK variables.
+
+OPTION FORMAT
+ Gawk options may be either traditional POSIX one letter
+ options, or GNU-style long options. POSIX options start
+ with a single "-", while long options start with "--".
+ Long options are provided for both GNU-specific features
+ and for POSIX-mandated features.
+
+ Following the POSIX standard, gawk-specific options are
+ supplied via arguments to the -W option. Multiple -W
+ options may be supplied. Each -W option has a correspond-
+ ing long option, as detailed below. Arguments to long
+ options are either joined with the option by an = sign,
+ with no intervening spaces, or they may be provided in
+ the next command line argument. Long options may be
+ abbreviated, as long as the abbreviation remains unique.
+
+OPTIONS
+ Gawk accepts the following options, listed by frequency.
+
+ -F fs
+ --field-separator fs
+ Use fs for the input field separator (the value
+ of the FS predefined variable).
+
+ -v var=val
+ --assign var=val
+ Assign the value val to the variable var, before
+ execution of the program begins. Such variable
+ values are available to the BEGIN block of an AWK
+ program.
+
+ -f program-file
+ --file program-file
+ Read the AWK program source from the file pro-
+ gram-file, instead of from the first command line
+ argument. Multiple -f (or --file) options may be
+ used.
+
+ -mf NNN
+ -mr NNN
+ Set various memory limits to the value NNN. The
+ f flag sets the maximum number of fields, and the
+ r flag sets the maximum record size. These two
+ flags and the -m option are from an earlier ver-
+ sion of the Bell Laboratories research version of
+ UNIX awk. They are ignored by gawk, since gawk
+ has no pre-defined limits.
+
+ -W compat
+ -W traditional
+ --compat
+ --traditional
+ Run in compatibility mode. In compatibility
+ mode, gawk behaves identically to UNIX awk; none
+ of the GNU-specific extensions are recognized.
+ The use of --traditional is preferred over the
+ other forms of this option. See GNU EXTENSIONS,
+ below, for more information.
+
+ -W copyleft
+ -W copyright
+ --copyleft
+ --copyright
+ Print the short version of the GNU copyright
+ information message on the standard output and
+ exit successfully.
+
+ -W dump-variables[=file]
+ --dump-variables[=file]
+ Print a sorted list of global variables, their
+ types and final values to file. If no file is
+ provided, gawk uses a file named awkvars.out in
+ the current directory.
+ Having a list of all the global variables is a
+ good way to look for typographical errors in your
+ programs. You would also use this option if you
+ have a large program with a lot of functions, and
+ you want to be sure that your functions don't
+ inadvertently use global variables that you meant
+ to be local. (This is a particularly easy mis-
+ take to make with simple variable names like i,
+ j, and so on.)
+
+ -W exec file
+ --exec file
+ Similar to -f; however, this option is the
+ last one processed. This should be used with #!
+ scripts, particularly for CGI applications, to
+ avoid passing in options or source code (!) on
+ the command line from a URL. This option dis-
+ ables command-line variable assignments.
+
+ -W gen-po
+ --gen-po
+ Scan and parse the AWK program, and generate a
+ GNU .po format file on standard output with
+ entries for all localizable strings in the pro-
+ gram. The program itself is not executed. See
+ the GNU gettext distribution for more information
+ on .po files.
+
+ -W help
+ -W usage
+ --help
+ --usage
+ Print a relatively short summary of the available
+ options on the standard output. (Per the GNU
+ Coding Standards, these options cause an immedi-
+ ate, successful exit.)
+
+ -W lint[=value]
+ --lint[=value]
+ Provide warnings about constructs that are dubi-
+ ous or non-portable to other AWK implementations.
+ With an optional argument of fatal, lint warnings
+ become fatal errors. This may be drastic, but
+ its use will certainly encourage the development
+ of cleaner AWK programs. With an optional argu-
+ ment of invalid, only warnings about things that
+ are actually invalid are issued. (This is not
+ fully implemented yet.)
+
+ -W lint-old
+ --lint-old
+ Provide warnings about constructs that are not
+ portable to the original version of Unix awk.
+
+ -W non-decimal-data
+ --non-decimal-data
+ Recognize octal and hexadecimal values in input
+ data. Use this option with great caution!
+
+ -W posix
+ --posix
+ This turns on compatibility mode, with the fol-
+ lowing additional restrictions:
+
+ · \x escape sequences are not recognized.
+
+ · Only space and tab act as field separators when
+ FS is set to a single space; newline does not.
+
+ · You cannot continue lines after ? and :.
+
+ · The synonym func for the keyword function is
+ not recognized.
+
+ · The operators ** and **= cannot be used in
+ place of ^ and ^=.
+
+ · The fflush() function is not available.
+
+ -W profile[=prof_file]
+ --profile[=prof_file]
+ Send profiling data to prof_file. The default is
+ awkprof.out. When run with gawk, the profile is
+ just a "pretty printed" version of the program.
+ When run with pgawk, the profile contains execu-
+ tion counts of each statement in the program in
+ the left margin and function call counts for each
+ user-defined function.
+
+ -W re-interval
+ --re-interval
+ Enable the use of interval expressions in regular
+ expression matching (see Regular Expressions,
+ below). Interval expressions were not tradition-
+ ally available in the AWK language. The POSIX
+ standard added them, to make awk and egrep con-
+ sistent with each other. However, their use is
+ likely to break old AWK programs, so gawk only
+ provides them if they are requested with this
+ option, or when --posix is specified.
+
+ -W source program-text
+ --source program-text
+ Use program-text as AWK program source code.
+ This option allows the easy intermixing of
+ library functions (used via the -f and --file
+ options) with source code entered on the command
+ line. It is intended primarily for medium to
+ large AWK programs used in shell scripts.
+
+ -W use-lc-numeric
+ --use-lc-numeric
+ This forces gawk to use the locale's decimal
+ point character when parsing input data.
+ Although the POSIX standard requires this behav-
+ ior, and gawk does so when --posix is in effect,
+ the default is to follow traditional behavior and
+ use a period as the decimal point, even in
+ locales where the period is not the decimal point
+ character. This option overrides the default
+ behavior, without the full draconian strictness
+ of the --posix option.
+
+ -W version
+ --version
+ Print version information for this particular
+ copy of gawk on the standard output. This is
+ useful mainly for knowing if the current copy of
+ gawk on your system is up to date with respect to
+ whatever the Free Software Foundation is dis-
+ tributing. This is also useful when reporting
+ bugs. (Per the GNU Coding Standards, these
+ options cause an immediate, successful exit.)
+
+ -- Signal the end of options. This is useful to
+ allow further arguments to the AWK program itself
+ to start with a "-". This provides consistency
+ with the argument parsing convention used by most
+ other POSIX programs.
+
+ In compatibility mode, any other options are flagged as
+ invalid, but are otherwise ignored. In normal opera-
+ tion, as long as program text has been supplied, unknown
+ options are passed on to the AWK program in the ARGV
+ array for processing. This is particularly useful for
+ running AWK programs via the "#!" executable interpreter
+ mechanism.
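+
+ As a minimal sketch of the -F and -v options described
+ above: the command below splits a passwd-style line on
+ ":" and passes a value in with -v so that it is already
+ set in the BEGIN block. The variable name label, the
+ shell variable out, and the sample input are made up for
+ illustration; awk is assumed to be gawk or another POSIX
+ awk found on the PATH.
+

```shell
# Hypothetical example: "label" and the sample line are illustrative only.
# -F: sets FS to ":"; -v assigns label before BEGIN runs.
out=$(printf 'root:x:0:0\n' |
  awk -F: -v label=user '
    BEGIN { print "label is " label }   # -v values are visible here
          { print label "=" $1 }        # $1 is the text before the first ":"
  ')
printf '%s\n' "$out"
```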
+
+AWK PROGRAM EXECUTION
+ An AWK program consists of a sequence of pattern-action
+ statements and optional function definitions.
+
+        pattern   { action statements }
+        function name(parameter list) { statements }
+
+ Gawk first reads the program source from the program-
+ file(s) if specified, from arguments to --source, or
+ from the first non-option argument on the command line.
+ The -f and --source options may be used multiple times
+ on the command line. Gawk reads the program text as if
+ all the program-files and command line source texts had
+ been concatenated together. This is useful for building
+ libraries of AWK functions, without having to include
+ them in each new AWK program that uses them. It also
+ provides the ability to mix library functions with com-
+ mand line programs.
+ The environment variable AWKPATH specifies a search path
+ to use when finding source files named with the -f
+ option. If this variable does not exist, the default
+ path is ".:/usr/local/share/awk". (The actual directory
+ may vary, depending upon how gawk was built and
+ installed.) If a file name given to the -f option con-
+ tains a "/" character, no path search is performed.
+ Gawk executes AWK programs in the following order.
+ First, all variable assignments specified via the -v
+ option are performed. Next, gawk compiles the program
+ into an internal form. Then, gawk executes the code in
+ the BEGIN block(s) (if any), and then proceeds to read
+ each file named in the ARGV array. If there are no
+ files named on the command line, gawk reads the standard
+ input.
+ If a filename on the command line has the form var=val
+ it is treated as a variable assignment. The variable
+ var will be assigned the value val. (This happens after
+ any BEGIN block(s) have been run.) Command line vari-
+ able assignment is most useful for dynamically assigning
+ values to the variables AWK uses to control how input is
+ broken into fields and records. It is also useful for
+ controlling state if multiple passes are needed over a
+ single data file.
+ If the value of a particular element of ARGV is empty
+ (""), gawk skips over it.
+ For each record in the input, gawk tests to see if it
+ matches any pattern in the AWK program. For each pat-
+ tern that the record matches, the associated action is
+ executed. The patterns are tested in the order they
+ occur in the program.
+ Finally, after all the input is exhausted, gawk executes
+ the code in the END block(s) (if any).
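+
+ The execution order described above can be observed with
+ a short sketch. It assumes an awk on the PATH (gawk or
+ another POSIX awk); the variable names early and late are
+ hypothetical. The -v assignment is made before BEGIN,
+ while the var=val operand is only processed when awk
+ reaches it in ARGV, after BEGIN has run. The trailing "-"
+ operand names the standard input.
+

```shell
# Hypothetical variable names: "early" (set with -v) and "late" (set as an operand).
out=$(printf 'line\n' | awk -v early=1 '
  BEGIN { print "BEGIN: early=" early " late=[" late "]" }  # late is still unset
        { print "main: early=" early " late=[" late "]" }   # late was assigned before input was read
  END   { print "END: records=" NR }                        # runs after all input is exhausted
' late=2 -)
printf '%s\n' "$out"
```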
+
+VARIABLES, RECORDS AND FIELDS
+ AWK variables are dynamic; they come into existence when
+ they are first used. Their values are either floating-
+ point numbers or strings, or both, depending upon how
+ they are used. AWK also has one dimensional arrays;
+ arrays with multiple dimensions may be simulated. Sev-
+ eral pre-defined variables are set as a program runs;
+ these are described as needed and summarized below.
+
+ Records
+ Normally, records are separated by newline characters.
+ You can control how records are separated by assigning
+ values to the built-in variable RS. If RS is any single
+ character, that character separates records. Otherwise,
+ RS is a regular expression. Text in the input that
+ matches this regular expression separates the record.
+ However, in compatibility mode, only the first character
+ of its string value is used for separating records. If
+ RS is set to the null string, then records are separated
+ by blank lines. When RS is set to the null string, the
+ newline character always acts as a field separator, in
+ addition to whatever value FS may have.
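+
+ A small sketch of the null-RS ("paragraph") mode just
+ described, assuming an awk on the PATH and made-up sample
+ input: with RS set to the null string, blank lines
+ separate records and the newline inside the first record
+ acts as a field separator.
+

```shell
# Two blank-line-separated records; the first one spans two input lines.
out=$(printf 'a b\nc\n\nd e\n' |
  awk 'BEGIN { RS = "" } { print NR ": NF=" NF ", last=" $NF }')
printf '%s\n' "$out"
```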
+
+ Fields
+ As each input record is read, gawk splits the record
+ into fields, using the value of the FS variable as the
+ field separator. If FS is a single character, fields
+ are separated by that character. If FS is the null
+ string, then each individual character becomes a sepa-
+ rate field. Otherwise, FS is expected to be a full reg-
+ ular expression. In the special case that FS is a sin-
+ gle space, fields are separated by runs of spaces and/or
+ tabs and/or newlines. (But see the section POSIX COM-
+ PATIBILITY, below). NOTE: The value of IGNORECASE (see
+ below) also affects how fields are split when FS is a
+ regular expression, and how records are separated when
+ RS is a regular expression.
+ If the FIELDWIDTHS variable is set to a space separated
+ list of numbers, each field is expected to have fixed
+ width, and gawk splits up the record using the specified
+ widths. The value of FS is ignored. Assigning a new
+ value to FS overrides the use of FIELDWIDTHS, and
+ restores the default behavior.
+ Each field in the input record may be referenced by its
+ position, $1, $2, and so on. $0 is the whole record.
+ Fields need not be referenced by constants:
+
+        n = 5
+        print $n
+
+ prints the fifth field in the input record.
+ The variable NF is set to the total number of fields in
+ the input record.
+ References to non-existent fields (i.e., fields after
+ $NF) produce the null string. However, assigning to a
+ non-existent field (e.g., $(NF+2) = 5) increases the
+ value of NF, creates any intervening fields with the
+ null string as their value, and causes the value of $0
+ to be recomputed, with the fields being separated by the
+ value of OFS. References to negative numbered fields
+ cause a fatal error. Decrementing NF causes the values
+ of fields past the new value to be lost, and the value
+ of $0 to be recomputed, with the fields being separated
+ by the value of OFS.
+ Assigning a value to an existing field causes the whole
+ record to be rebuilt when $0 is referenced. Similarly,
+ assigning a value to $0 causes the record to be resplit,
+ creating new values for the fields.
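+
+ As an illustration of assigning to a non-existent field,
+ the following sketch (sample input and the "-" value of
+ OFS are arbitrary choices) assigns to $(NF+2). NF grows
+ from 3 to 5, the intervening field $4 is created empty,
+ and $0 is rebuilt with OFS between the fields.
+

```shell
# Assigning past the last field extends the record and rebuilds $0 using OFS.
out=$(echo 'a b c' |
  awk 'BEGIN { OFS = "-" } { $(NF + 2) = "e"; print NF ": " $0 }')
printf '%s\n' "$out"
```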
+
+ Built-in Variables
+ Gawk's built-in variables are:
+ ARGC The number of command line arguments (does
+ not include options to gawk, or the program
+ source).
+ ARGIND The index in ARGV of the current file being
+ processed.
+ ARGV Array of command line arguments. The array
+ is indexed from 0 to ARGC - 1. Dynamically
+ changing the contents of ARGV can control
+ the files used for data.
+ BINMODE On non-POSIX systems, specifies use of
+ "binary" mode for all file I/O. Numeric
+ values of 1, 2, or 3, specify that input
+ files, output files, or all files, respec-
+ tively, should use binary I/O. String val-
+ ues of "r", or "w" specify that input files,
+ or output files, respectively, should use
+ binary I/O. String values of "rw" or "wr"
+ specify that all files should use binary
+ I/O. Any other string value is treated as
+ "rw", but generates a warning message.
+ CONVFMT The conversion format for numbers, "%.6g",
+ by default.
+ ENVIRON An array containing the values of the cur-
+ rent environment. The array is indexed by
+ the environment variables, each element
+ being the value of that variable (e.g., ENV-
+ IRON["HOME"] might be /home/arnold). Chang-
+ ing this array does not affect the environ-
+ ment seen by programs which gawk spawns via
+ redirection or the system() function.
+ ERRNO If a system error occurs either doing a
+ redirection for getline, during a read for
+ getline, or during a close(), then ERRNO
+ will contain a string describing the error.
+ The value is subject to translation in non-
+ English locales.
+ FIELDWIDTHS A white-space separated list of field widths.
+ When set, gawk parses the input into fields
+ of fixed width, instead of using the value
+ of the FS variable as the field separator.
+ FILENAME The name of the current input file. If no
+ files are specified on the command line, the
+ value of FILENAME is "-". However, FILENAME
+ is undefined inside the BEGIN block (unless
+ set by getline).
+ FNR The input record number in the current input
+ file.
+ FS The input field separator, a space by
+ default. See Fields, above.
+ IGNORECASE Controls the case-sensitivity of all regular
+ expression and string operations. If
+ IGNORECASE has a non-zero value, then string
+ comparisons and pattern matching in rules,
+ field splitting with FS, record separating
+ with RS, regular expression matching with ~
+ and !~, and the gensub(), gsub(), index(),
+ match(), split(), and sub() built-in func-
+ tions all ignore case when doing regular
+ expression operations. NOTE: Array sub-
+ scripting is not affected. However, the
+ asort() and asorti() functions are affected.
+ Thus, if IGNORECASE is not equal to zero,
+ /aB/ matches all of the strings "ab", "aB",
+ "Ab", and "AB". As with all AWK variables,
+ the initial value of IGNORECASE is zero, so
+ all regular expression and string operations
+ are normally case-sensitive. Under Unix,
+ the full ISO 8859-1 Latin-1 character set is
+ used when ignoring case. As of gawk 3.1.4,
+ the case equivalencies are fully locale-
+ aware, based on the C <ctype.h> facilities
+ such as isalpha(), and toupper().
+ LINT Provides dynamic control of the --lint
+ option from within an AWK program. When
+ true, gawk prints lint warnings. When false,
+ it does not. When assigned the string value
+ "fatal", lint warnings become fatal errors,
+ exactly like --lint=fatal. Any other true
+ value just prints warnings.
+ NF The number of fields in the current input
+ record.
+ NR The total number of input records seen so
+ far.
+ OFMT The output format for numbers, "%.6g", by
+ default.
+ OFS The output field separator, a space by
+ default.
+ ORS The output record separator, by default a
+ newline.
+ PROCINFO The elements of this array provide access to
+ information about the running AWK program.
+ On some systems, there may be elements in
+ the array, "group1" through "groupn" for
+ some n, which is the number of supplementary
+ groups that the process has. Use the in
+ operator to test for these elements. The
+ following elements are guaranteed to be
+ available:
+ PROCINFO["egid"] the value of the gete-
+ gid(2) system call.
+ PROCINFO["euid"] the value of the
+ geteuid(2) system call.
+ PROCINFO["FS"] "FS" if field splitting
+ with FS is in effect, or
+ "FIELDWIDTHS" if field
+ splitting with FIELD-
+ WIDTHS is in effect.
+ PROCINFO["gid"] the value of the get-
+ gid(2) system call.
+ PROCINFO["pgrpid"] the process group ID of
+ the current process.
+ PROCINFO["pid"] the process ID of the
+ current process.
+ PROCINFO["ppid"] the parent process ID of
+ the current process.
+ PROCINFO["uid"] the value of the
+ getuid(2) system call.
+ PROCINFO["version"]
+ The version of gawk.
+ This is available from
+ version 3.1.4 and later.
+ RS The input record separator, by default a
+ newline.
+ RT The record terminator. Gawk sets RT to the
+ input text that matched the character or
+ regular expression specified by RS.
+ RSTART The index of the first character matched by
+ match(); 0 if no match. (This implies that
+ character indices start at one.)
+ RLENGTH The length of the string matched by match();
+ -1 if no match.
+ SUBSEP The character used to separate multiple sub-
+ scripts in array elements, by default
+ "\034".
+ TEXTDOMAIN The text domain of the AWK program; used to
+ find the localized translations for the pro-
+ gram's strings.
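+
+ The RSTART and RLENGTH entries above can be checked with
+ a brief sketch (the strings are arbitrary examples, and
+ awk is assumed to be gawk or another POSIX awk): the
+ first call to match() succeeds, the second does not.
+

```shell
# match() sets RSTART and RLENGTH as a side effect.
out=$(awk 'BEGIN {
  match("foobar", /o+/); print RSTART, RLENGTH   # "oo" starts at index 2, length 2
  match("foobar", /z/);  print RSTART, RLENGTH   # no match: 0 and -1
}')
printf '%s\n' "$out"
```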
+
+ Arrays
+ Arrays are subscripted with an expression between square
+ brackets ([ and ]). If the expression is an expression
+ list (expr, expr ...) then the array subscript is a
+ string consisting of the concatenation of the (string)
+ value of each expression, separated by the value of the
+ SUBSEP variable. This facility is used to simulate mul-
+ tiply dimensioned arrays. For example:
+
+        i = "A"; j = "B"; k = "C"
+        x[i, j, k] = "hello, world\n"
+
+ assigns the string "hello, world\n" to the element of
+ the array x which is indexed by the string
+ "A\034B\034C". All arrays in AWK are associative, i.e.
+ indexed by string values.
+ The special operator in may be used to test if an array
+ has an index consisting of a particular value.
+
+        if (val in array)
+             print array[val]
+
+ If the array has multiple subscripts, use (i, j) in
+ array.
+ The in construct may also be used in a for loop to iter-
+ ate over all the elements of an array.
+ An element may be deleted from an array using the delete
+ statement. The delete statement may also be used to
+ delete the entire contents of an array, just by specify-
+ ing the array name without a subscript.
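+
+ The subscript handling, the in operator, and the delete
+ statement can be sketched together as follows. The array
+ name x and the helper names key, part, and n are chosen
+ for illustration only. Splitting the stored key on SUBSEP
+ recovers the individual subscripts.
+

```shell
# A two-subscript element is stored under the single key "A" SUBSEP "B".
out=$(awk 'BEGIN {
  x["A", "B"] = "hello"
  for (key in x) {                      # iterate over all subscripts
    n = split(key, part, SUBSEP)        # recover the individual subscripts
    print n, part[1], part[2]
  }
  if (("A", "B") in x) print "present"  # membership test, multiple subscripts
  delete x["A", "B"]
  if (("A", "B") in x)
    print "still there"
  else
    print "deleted"
}')
printf '%s\n' "$out"
```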
+
+ Variable Typing And Conversion
+ Variables and fields may be (floating point) numbers, or
+ strings, or both. How the value of a variable is inter-
+ preted depends upon its context. If used in a numeric
+ expression, it will be treated as a number; if used as a
+ string it will be treated as a string.
+ To force a variable to be treated as a number, add 0 to
+ it; to force it to be treated as a string, concatenate
+ it with the null string.
+ When a string must be converted to a number, the conver-
+ sion is accomplished using strtod(3). A number is con-
+ verted to a string by using the value of CONVFMT as a
+ format string for sprintf(3), with the numeric value of
+ the variable as the argument. However, even though all
+ numbers in AWK are floating-point, integral values are
+ always converted as integers. Thus, given
+
+        CONVFMT = "%2.2f"
+        a = 12
+        b = a ""
+
+ the variable b has a string value of "12" and not
+ "12.00".
+ When operating in POSIX mode (such as with the --posix
+ command line option), beware that locale settings may
+ interfere with the way decimal numbers are treated: the
+ decimal separator of the numbers you are feeding to gawk
+ must conform to what your locale would expect, be it a
+ comma (,) or a period (.).
+ Gawk performs comparisons as follows: If two variables
+ are numeric, they are compared numerically. If one
+ value is numeric and the other has a string value that
+ is a "numeric string," then comparisons are also done
+ numerically. Otherwise, the numeric value is converted
+ to a string and a string comparison is performed. Two
+ strings are compared, of course, as strings.
+ Note that string constants, such as "57", are not
+ numeric strings, they are string constants. The idea of
+ "numeric string" only applies to fields, getline input,
+ FILENAME, ARGV elements, ENVIRON elements and the ele-
+ ments of an array created by split() that are numeric
+ strings. The basic idea is that user input, and only
+ user input, that looks numeric, should be treated that
+ way.
+ Uninitialized variables have the numeric value 0 and the
+ string value "" (the null, or empty, string).
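+
+ The conversion and comparison rules above can be tried
+ with a short sketch. The variable names are arbitrary,
+ and the labels "numeric" and "string" merely record
+ whether 10 compared greater than 9, which is only true
+ under numeric comparison. The output assumes a locale
+ whose decimal point is a period.
+

```shell
# Fields that look numeric compare as numbers; string constants compare as strings.
out=$(echo '10 9' | awk '{
  CONVFMT = "%2.2f"
  a = 12;   b = a ""     # integral value: converted as an integer
  c = 12.5; d = c ""     # non-integral value: formatted with CONVFMT
  print b, d
  f = ($1 > $2)    ? "numeric" : "string"   # numeric strings from input
  s = ("10" > "9") ? "numeric" : "string"   # string constants
  print f, s
}')
printf '%s\n' "$out"
```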
+
+ Octal and Hexadecimal Constants
+ Starting with version 3.1 of gawk, you may use C-style
+ octal and hexadecimal constants in your AWK program
+ source code. For example, the octal value 011 is equal
+ to decimal 9, and the hexadecimal value 0x11 is equal to
+ decimal 17.
+
+ String Constants
+ String constants in AWK are sequences of characters
+ enclosed between double quotes ("). Within strings,
+ certain escape sequences are recognized, as in C. These
+ are:
+ \\ A literal backslash.
+ \a The "alert" character; usually the ASCII BEL char-
+ acter.
+ \b backspace.
+ \f form-feed.
+ \n newline.
+ \r carriage return.
+ \t horizontal tab.
+ \v vertical tab.
+ \xhex digits
+ The character represented by the string of hexadec-
+ imal digits following the \x. As in ANSI C, all
+ following hexadecimal digits are considered part of
+ the escape sequence. (This feature should tell us
+ something about language design by committee.)
+ E.g., "\x1B" is the ASCII ESC (escape) character.
+ \ddd The character represented by the 1-, 2-, or 3-digit
+ sequence of octal digits. E.g., "\033" is the
+ ASCII ESC (escape) character.
+ \c The literal character c.
+ The escape sequences may also be used inside constant
+ regular expressions (e.g., /[ \t\f\n\r\v]/ matches
+ whitespace characters).
+ In compatibility mode, the characters represented by
+ octal and hexadecimal escape sequences are treated lit-
+ erally when used in regular expression constants. Thus,
+ /a\52b/ is equivalent to /a\*b/.
+
+PATTERNS AND ACTIONS
+ AWK is a line-oriented language. The pattern comes
+ first, and then the action. Action statements are
+ enclosed in { and }. Either the pattern may be missing,
+ or the action may be missing, but, of course, not both.
+ If the pattern is missing, the action is executed for
+ every single record of input. A missing action is
+ equivalent to
+
+        { print }
+
+ which prints the entire record.
+ Comments begin with the "#" character, and continue
+ until the end of the line. Blank lines may be used to
+ separate statements. Normally, a statement ends with a
+ newline; however, this is not the case for lines ending
+ in a ",", {, ?, :, &&, or ||. Lines ending in do or
+ else also have their statements automatically continued
+ on the following line. In other cases, a line can be
+ continued by ending it with a "\", in which case the
+ newline will be ignored.
+ Multiple statements may be put on one line by separating
+ them with a ";". This applies to both the statements
+ within the action part of a pattern-action pair (the
+ usual case), and to the pattern-action statements them-
+ selves.
+
+ Patterns
+ AWK patterns may be one of the following:
+
+        BEGIN
+        END
+        /regular expression/
+        relational expression
+        pattern && pattern
+        pattern || pattern
+        pattern ? pattern : pattern
+        (pattern)
+        ! pattern
+        pattern1, pattern2
+
+ BEGIN and END are two special kinds of patterns which
+ are not tested against the input. The action parts of
+ all BEGIN patterns are merged as if all the statements
+ had been written in a single BEGIN block. They are exe-
+ cuted before any of the input is read. Similarly, all
+ the END blocks are merged, and executed when all the
+ input is exhausted (or when an exit statement is exe-
+ cuted). BEGIN and END patterns cannot be combined with
+ other patterns in pattern expressions. BEGIN and END
+ patterns cannot have missing action parts.
+ For /regular expression/ patterns, the associated state-
+ ment is executed for each input record that matches the
+ regular expression. Regular expressions are the same as
+ those in egrep(1), and are summarized below.
+ A relational expression may use any of the operators
+ defined below in the section on actions. These gener-
+ ally test whether certain fields match certain regular
+ expressions.
+ The &&, ||, and ! operators are logical AND, logical
+ OR, and logical NOT, respectively, as in C. They do
+ short-circuit evaluation, also as in C, and are used for
+ combining more primitive pattern expressions. As in
+ most languages, parentheses may be used to change the
+ order of evaluation.
+ The ?: operator is like the same operator in C. If the
+ first pattern is true then the pattern used for testing
+ is the second pattern, otherwise it is the third. Only
+ one of the second and third patterns is evaluated.
+ The pattern1, pattern2 form of an expression is called a
+ range pattern. It matches all input records starting
+ with a record that matches pattern1, and continuing
+ until a record that matches pattern2, inclusive. It
+ does not combine with any other sort of pattern expres-
+ sion.
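+
+ A range pattern, together with BEGIN and END, can be
+ sketched as below. The marker words start and end and the
+ sample input are invented for the example. The range rule
+ has no action, so the default { print } applies to every
+ record from the first match of pattern1 through the
+ matching pattern2, inclusive.
+

```shell
# Print only the records between "start" and "end", inclusive.
out=$(printf '1\nstart\n2\nend\n3\n' | awk '
  BEGIN { print "header" }   # runs before any input is read
  /start/, /end/             # range pattern with the default action
  END   { print "footer" }   # runs after the input is exhausted
')
printf '%s\n' "$out"
```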
+
+ Regular Expressions
+ Regular expressions are the extended kind found in
+ egrep. They are composed of characters as follows:
+ c matches the non-metacharacter c.
+ \c matches the literal character c.
+ . matches any character including newline.
+ ^ matches the beginning of a string.
+ $ matches the end of a string.
+ [abc...] character list, matches any of the characters
+ abc....
+ [^abc...] negated character list, matches any character
+ except abc....
+ r1|r2 alternation: matches either r1 or r2.
+ r1r2 concatenation: matches r1, and then r2.
+ r+ matches one or more r's.
+ r* matches zero or more r's.
+ r? matches zero or one r's.
+ (r) grouping: matches r.
+ r{n}
+ r{n,}
+ r{n,m} One or two numbers inside braces denote an
+ interval expression. If there is one number
+ in the braces, the preceding regular expres-
+ sion r is repeated n times. If there are two
+ numbers separated by a comma, r is repeated n
+ to m times. If there is one number followed
+ by a comma, then r is repeated at least n
+ times.
+ Interval expressions are only available if
+ either --posix or --re-interval is specified
+ on the command line.
+
+ \y matches the empty string at either the begin-
+ ning or the end of a word.
+
+ \B matches the empty string within a word.
+
+ \< matches the empty string at the beginning of
+ a word.
+
+ \> matches the empty string at the end of a
+ word.
+
+ \w matches any word-constituent character (let-
+ ter, digit, or underscore).
+
+ \W matches any character that is not word-con-
+ stituent.
+
+ \` matches the empty string at the beginning of
+ a buffer (string).
+
+ \' matches the empty string at the end of a
+ buffer.
+
+ The escape sequences that are valid in string constants
+ (see String Constants, above) are also valid in regular
+ expressions.
+
+ Character classes are a feature introduced in the POSIX
+ standard. A character class is a special notation for
+ describing lists of characters that have a specific
+ attribute, but where the actual characters themselves
+ can vary from country to country and/or from character
+ set to character set. For example, the notion of what
+ is an alphabetic character differs in the USA and in
+ France.
+
+ A character class is only valid in a regular expression
+ inside the brackets of a character list. Character
+ classes consist of [:, a keyword denoting the class, and
+ :]. The character classes defined by the POSIX standard
+ are:
+
+ [:alnum:] Alphanumeric characters.
+
+ [:alpha:] Alphabetic characters.
+
+ [:blank:] Space or tab characters.
+
+ [:cntrl:] Control characters.
+
+ [:digit:] Numeric characters.
+
+ [:graph:] Characters that are both printable and visi-
+ ble. (A space is printable, but not visible,
+ while an a is both.)
+
+ [:lower:] Lower-case alphabetic characters.
+
+ [:print:] Printable characters (characters that are not
+ control characters.)
+
+ [:punct:] Punctuation characters (characters that are
+ not letters, digits, control characters, or
+ space characters).
+
+ [:space:] Space characters (such as space, tab, and
+ formfeed, to name a few).
+
+ [:upper:] Upper-case alphabetic characters.
+
+ [:xdigit:] Characters that are hexadecimal digits.
+
+ For example, before the POSIX standard, to match
+ alphanumeric characters, you would have had to write
+ /[A-Za-z0-9]/. If your character set had other alpha-
+ betic characters in it, this would not match them, and
+ if your character set collated differently from ASCII,
+ this might not even match the ASCII alphanumeric charac-
+ ters. With the POSIX character classes, you can write
+ /[[:alnum:]]/, and this matches the alphabetic and
+ numeric characters in your character set, no matter what
+ it is.
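+
+ As an illustrative sketch (not part of the original
+ manual text), the following shell fragment counts the
+ characters matched by a character class. The input
+ string and the shell variable names are made up for the
+ example, and awk is assumed to resolve to gawk or to
+ another awk that supports POSIX character classes:
+

```shell
line='a1-b2_c3'

# gsub() returns the number of substitutions it made, so replacing every
# match with the empty string counts the matching characters.
alnum=$(printf '%s\n' "$line" | awk '{ print gsub(/[[:alnum:]]/, "") }')

# A class can share its bracket list with ordinary characters (here "_").
word=$(printf '%s\n' "$line" | awk '{ print gsub(/[[:alnum:]_]/, "") }')

echo "$alnum $word"
```

+
+ The class name always sits inside an enclosing character
+ list, which is why the brackets are doubled.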
+
+ Two additional special sequences can appear in character
+ lists. These apply to non-ASCII character sets, which
+ can have single symbols (called collating elements) that
+ are represented with more than one character, as well as
+ several characters that are equivalent for collating, or
+ sorting, purposes. (E.g., in French, a plain "e" and a
+ grave-accented "è" are equivalent.)
+
+ Collating Symbols
+ A collating symbol is a multi-character collating
+ element enclosed in [. and .]. For example, if
+ ch is a collating element, then [[.ch.]] is a
+ regular expression that matches this collating
+ element, while [ch] is a regular expression that
+ matches either c or h.
+
+ Equivalence Classes
+ An equivalence class is a locale-specific name
+ for a list of characters that are equivalent.
+ The name is enclosed in [= and =]. For example,
+ the name e might be used to represent all of "e,"
+ "é," and "è." In this case, [[=e=]] is a regular
+ expression that matches any of e, é, or è.
+
+ These features are very valuable in non-English speaking
+ locales. The library functions that gawk uses for regu-
+ lar expression matching currently only recognize POSIX
+ character classes; they do not recognize collating sym-
+ bols or equivalence classes.
+
+ The \y, \B, \<, \>, \w, \W, \`, and \' operators are
+ specific to gawk; they are extensions based on facili-
+ ties in the GNU regular expression libraries.
+
+ The various command line options control how gawk inter-
+ prets characters in regular expressions.
+
+ No options
+ In the default case, gawk provides all the facili-
+ ties of POSIX regular expressions and the GNU
+ regular expression operators described above.
+ However, interval expressions are not supported.
+
+ --posix
+ Only POSIX regular expressions are supported; the
+ GNU operators are not special (e.g., \w matches a
+ literal w). Interval expressions are allowed.
+
+ --traditional
+ Traditional Unix awk regular expressions are
+ matched. The GNU operators are not special,
+ interval expressions are not available, and nei-
+ ther are the POSIX character classes ([[:alnum:]]
+ and so on). Characters described by octal and
+ hexadecimal escape sequences are treated liter-
+ ally, even if they represent regular expression
+ metacharacters.
+
+ --re-interval
+ Allow interval expressions in regular expres-
+ sions, even if --traditional has been provided.
+
+ Actions
+ Action statements are enclosed in braces, { and }.
+ Action statements consist of the usual assignment, con-
+ ditional, and looping statements found in most lan-
+ guages. The operators, control statements, and
+ input/output statements available are patterned after
+ those in C.
+
+ Operators
+ The operators in AWK, in order of decreasing precedence,
+ are
+
+
+ (...) Grouping
+
+ $ Field reference.
+
+ ++ -- Increment and decrement, both prefix and
+ postfix.
+
+ ^ Exponentiation (** may also be used, and **=
+ for the assignment operator).
+
+ + - ! Unary plus, unary minus, and logical nega-
+ tion.
+
+ * / % Multiplication, division, and modulus.
+
+ + - Addition and subtraction.
+
+ space String concatenation.
+
+ | |& Piped I/O for getline, print, and printf.
+
+ < >
+ <= >=
+ != == The regular relational operators.
+
+ ~ !~ Regular expression match, negated match.
+ NOTE: Do not use a constant regular expres-
+ sion (/foo/) on the left-hand side of a ~ or
+ !~. Only use one on the right-hand side.
+ The expression /foo/ ~ exp has the same
+ meaning as (($0 ~ /foo/) ~ exp). This is
+ usually not what was intended.
+
+ in Array membership.
+
+ && Logical AND.
+
+ || Logical OR.
+
+ ?: The C conditional expression. This has the
+ form expr1 ? expr2 : expr3. If expr1 is
+ true, the value of the expression is expr2,
+ otherwise it is expr3. Only one of expr2
+ and expr3 is evaluated.
+
+ = += -=
+ *= /= %= ^= Assignment. Both absolute assignment (var =
+ value) and operator-assignment (the other
+ forms) are supported.
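+
+ The following sketch exercises a few rows of the prece-
+ dence table. The values are arbitrary, and awk is
+ assumed to resolve to gawk or another POSIX-conforming
+ awk:
+

```shell
out=$(awk 'BEGIN {
    print 1 " " 2 + 3            # + binds tighter than concatenation
    print 2 * 3 ^ 2              # ^ binds tighter than *
    x = 5
    print (x > 3 ? "big" : "small")
    print ("foobar" ~ /^foo/)    # constant regexp on the right-hand side
}')
printf '%s\n' "$out"
```

+
+ Parenthesizing the conditional and match expressions
+ keeps them from being confused with output redirection
+ or concatenation in a print statement.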
+
+ Control Statements
+ The control statements are as follows:
+
+ if (condition) statement [ else statement ]
+ while (condition) statement
+ do statement while (condition)
+ for (expr1; expr2; expr3) statement
+ for (var in array) statement
+ break
+ continue
+ delete array[index]
+ delete array
+ exit [ expression ]
+ { statements }
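+
+ A minimal sketch combining several of these statements
+ follows. The sample input and the array name count are
+ invented for illustration, and awk is assumed to resolve
+ to gawk or another POSIX-conforming awk:
+

```shell
out=$(printf 'a b a\nc a b\n' | awk '
{
    for (i = 1; i <= NF; i++)        # C-style loop over the fields
        count[$i]++
}
END {
    n = 0
    for (w in count)                 # visits each index, in no fixed order
        n++
    total = count["a"] + count["b"] + count["c"]
    delete count["c"]                # remove a single element
    has_c = ("c" in count) ? "yes" : "no"
    print n, count["a"], total, has_c
}')
echo "$out"
```

+
+ Because the traversal order of for (var in array) is not
+ specified, the example only counts the indices rather
+ than printing them in sequence.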
+
+ I/O Statements
+ The input/output statements are as follows:
+
+
+ close(file [, how]) Close file, pipe or co-process.
+ The optional how should only be
+ used when closing one end of a
+ two-way pipe to a co-process. It
+ must be a string value, either
+ "to" or "from".
+
+ getline Set $0 from next input record; set
+ NF, NR, FNR.
+
+ getline <file Set $0 from next record of file;
+ set NF.
+
+ getline var Set var from next input record;
+ set NR, FNR.
+
+ getline var <file Set var from next record of file.
+
+ command | getline [var]
+ Run command piping the output
+ either into $0 or var, as above.
+
+ command |& getline [var]
+ Run command as a co-process piping
+ the output either into $0 or var,
+ as above. Co-processes are a gawk
+ extension. (command can also be a
+ socket. See the subsection Spe-
+ cial File Names, below.)
+
+ next Stop processing the current input
+ record. The next input record is
+ read and processing starts over
+ with the first pattern in the AWK
+ program. If the end of the input
+ data is reached, the END block(s),
+ if any, are executed.
+
+ nextfile Stop processing the current input
+ file. The next input record read
+ comes from the next input file.
+ FILENAME and ARGIND are updated,
+ FNR is reset to 1, and processing
+ starts over with the first pattern
+ in the AWK program. If the end of
+ the input data is reached, the END
+ block(s), if any, are executed.
+
+ print Prints the current record. The
+ output record is terminated with
+ the value of the ORS variable.
+
+ print expr-list Prints expressions. Each expres-
+ sion is separated by the value of
+ the OFS variable. The output
+ record is terminated with the
+ value of the ORS variable.
+
+ print expr-list >file Prints expressions on file. Each
+ expression is separated by the
+ value of the OFS variable. The
+ output record is terminated with
+ the value of the ORS variable.
+
+ printf fmt, expr-list Format and print.
+
+ printf fmt, expr-list >file
+ Format and print on file.
+
+ system(cmd-line) Execute the command cmd-line, and
+ return the exit status. (This may
+ not be available on non-POSIX sys-
+ tems.)
+
+ fflush([file]) Flush any buffers associated with
+ the open output file or pipe file.
+ If file is missing, then standard
+ output is flushed. If file is the
+ null string, then all open output
+ files and pipes have their buffers
+ flushed.
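+
+ As a small illustration of how print uses OFS and ORS,
+ consider the sketch below. The separator values are
+ arbitrary choices for the example, and awk is assumed to
+ resolve to gawk or another POSIX-conforming awk:
+

```shell
out=$(printf 'a b c\n' | awk '
BEGIN { OFS = "-"; ORS = "!\n" }
{
    print $1, $2, $3     # commas: expressions are joined with OFS
    print $1 $2 $3       # no commas: plain string concatenation
}')
printf '%s\n' "$out"
```

+
+ Both output records end with the value of ORS, but only
+ the comma-separated form inserts OFS between the values.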
+
+ Additional output redirections are allowed for print and
+ printf.
+
+ print ... >> file
+ Appends output to the file.
+
+ print ... | command
+ Writes on a pipe.
+
+ print ... |& command
+ Sends data to a co-process or socket. (See also
+ the subsection Special File Names, below.)
+
+ The getline command returns 1 on success, 0 on end of
+ file, and -1 on an error. Upon an error, ERRNO contains
+ a string describing the problem.
+
+ NOTE: If using a pipe, co-process, or socket to getline,
+ or from print or printf within a loop, you must use
+ close() to create new instances of the command or
+ socket. AWK does not automatically close pipes, sock-
+ ets, or co-processes when they return EOF.
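+
+ The sketch below shows the usual idiom for reading from
+ a command inside a loop. The command string is a stand-
+ in chosen for the example, and awk is assumed to resolve
+ to gawk or another POSIX-conforming awk:
+

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    for (pass = 1; pass <= 2; pass++) {
        n = 0
        # Testing for > 0 ends the loop on end of file (0) and on error (-1).
        while ((cmd | getline line) > 0)
            n++
        close(cmd)       # without this, the second pass would read nothing
        printf "pass %d read %d lines\n", pass, n
    }
}')
printf '%s\n' "$out"
```

+
+ Writing the loop as while ((cmd | getline line) > 0)
+ rather than while (cmd | getline line) avoids an endless
+ loop when getline returns -1.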
+
+ The printf Statement
+ The AWK versions of the printf statement and sprintf()
+ function (see below) accept the following conversion
+ specification formats:
+
+ %c An ASCII character. If the argument used for %c
+ is numeric, it is treated as a character and
+ printed. Otherwise, the argument is assumed to
+ be a string, and only the first character of
+ that string is printed.
+
+ %d, %i A decimal number (the integer part).
+
+ %e, %E A floating point number of the form
+ [-]d.dddddde[+-]dd. The %E format uses E
+ instead of e.
+
+ %f, %F A floating point number of the form
+ [-]ddd.dddddd. If the system library supports
+ it, %F is available as well. This is like %f,
+ but uses capital letters for special "not a num-
+ ber" and "infinity" values. If %F is not avail-
+ able, gawk uses %f.
+
+ %g, %G Use %e or %f conversion, whichever is shorter,
+ with nonsignificant zeros suppressed. The %G
+ format uses %E instead of %e.
+
+ %o An unsigned octal number (also an integer).
+
+ %u An unsigned decimal number (again, an integer).
+
+ %s A character string.
+
+ %x, %X An unsigned hexadecimal number (an integer).
+ The %X format uses ABCDEF instead of abcdef.
+
+ %% A single % character; no argument is converted.
+
+ NOTE: When using the integer format-control letters for
+ values that are outside the range of a C long integer,
+ gawk switches to the %0f format specifier. If --lint is
+ provided on the command line, gawk warns about this.
+ Other versions of awk may print invalid values or do
+ something else entirely.
+
+ Optional, additional parameters may lie between the %
+ and the control letter:
+
+ count$ Use the count'th argument at this point in the
+ formatting. This is called a positional speci-
+ fier and is intended primarily for use in trans-
+ lated versions of format strings, not in the
+ original text of an AWK program. It is a gawk
+ extension.
+
+ - The expression should be left-justified within
+ its field.
+
+ space For numeric conversions, prefix positive values
+ with a space, and negative values with a minus
+ sign.
+
+ + The plus sign, used before the width modifier
+ (see below), says to always supply a sign for
+ numeric conversions, even if the data to be for-
+ matted is positive. The + overrides the space
+ modifier.
+
+ # Use an "alternate form" for certain control let-
+ ters. For %o, supply a leading zero. For %x
+ and %X, supply a leading 0x or 0X for a nonzero
+ result. For %e, %E, %f and %F, the result always
+ contains a decimal point. For %g and %G, trail-
+ ing zeros are not removed from the result.
+
+ 0 A leading 0 (zero) acts as a flag that indicates
+ output should be padded with zeroes instead of
+ spaces. This applies even to non-numeric output
+ formats. This flag only has an effect when the
+ field width is wider than the value to be
+ printed.
+
+ width The field should be padded to this width. The
+ field is normally padded with spaces. If the 0
+ flag has been used, it is padded with zeroes.
+
+ .prec A number that specifies the precision to use when
+ printing. For the %e, %E, %f and %F formats,
+ this specifies the number of digits you want
+ printed to the right of the decimal point. For
+ the %g and %G formats, it specifies the maximum
+ number of significant digits. For the %d, %o,
+ %i, %u, %x, and %X formats, it specifies the min-
+ imum number of digits to print. For %s, it spec-
+ ifies the maximum number of characters from the
+ string that should be printed.
+
+ The dynamic width and prec capabilities of the ANSI C
+ printf() routines are supported. A * in place of either
+ the width or prec specifications causes their values to
+ be taken from the argument list to printf or sprintf().
+ To use a positional specifier with a dynamic width or
+ precision, supply the count$ after the * in the format
+ string. For example, "%3$*2$.*1$s".
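+
+ A short sketch of several conversions and modifiers fol-
+ lows. The values are arbitrary, the brackets only make
+ the padding visible, and awk is assumed to resolve to
+ gawk or another POSIX-conforming awk. Only the portable
+ modifiers are shown; positional specifiers and dynamic
+ widths are left out:
+

```shell
out=$(awk 'BEGIN {
    # width.prec, left-justify, zero-pad, and forced sign
    printf "[%5.2f] [%-5d] [%05d] [%+d]\n", 3.14159, 42, 42, 7
    # hexadecimal, octal, and exponential notation
    printf "[%x] [%o] [%e]\n", 255, 8, 1234.5
    # first character, truncated string, padded string, literal percent
    printf "[%c] [%.3s] [%5s] [%%]\n", "hello", "abcdef", "ab"
}')
printf '%s\n' "$out"
```
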
+
+ Special File Names
+ When doing I/O redirection from either print or printf
+ into a file, or via getline from a file, gawk recognizes
+ certain special filenames internally. These filenames
+ allow access to open file descriptors inherited from
+ gawk's parent process (usually the shell). These file
+ names may also be used on the command line to name data
+ files. The filenames are:
+
+ /dev/stdin The standard input.
+
+ /dev/stdout The standard output.
+
+ /dev/stderr The standard error output.
+
+ /dev/fd/n The file associated with the open file
+ descriptor n.
+
+ These are particularly useful for error messages. For
+ example:
+
+ print "You blew it!" > "/dev/stderr"
+
+ whereas you would otherwise have to use
+
+ print "You blew it!" | "cat 1>&2"
+
+ The following special filenames may be used with the |&
+ co-process operator for creating TCP/IP network connec-
+ tions.
+
+ /inet/tcp/lport/rhost/rport File for TCP/IP connection
+ on local port lport to
+ remote host rhost on remote
+ port rport. Use a port of
+ 0 to have the system pick a
+ port.
+
+ /inet/udp/lport/rhost/rport Similar, but use UDP/IP
+ instead of TCP/IP.
+
+ /inet/raw/lport/rhost/rport Reserved for future use.
+
+ Other special filenames provide access to information
+ about the running gawk process. These filenames are now
+ obsolete. Use the PROCINFO array to obtain the informa-
+ tion they provide. The filenames are:
+
+ /dev/pid Reading this file returns the process ID of
+ the current process, in decimal, terminated
+ with a newline.
+
+ /dev/ppid Reading this file returns the parent process
+ ID of the current process, in decimal, ter-
+ minated with a newline.
+
+ /dev/pgrpid Reading this file returns the process group
+ ID of the current process, in decimal, ter-
+ minated with a newline.
+
+ /dev/user Reading this file returns a single record
+ terminated with a newline. The fields are
+ separated with spaces. $1 is the value of
+ the getuid(2) system call, $2 is the value
+ of the geteuid(2) system call, $3 is the
+ value of the getgid(2) system call, and $4
+ is the value of the getegid(2) system call.
+ If there are any additional fields, they are
+ the group IDs returned by getgroups(2).
+ Multiple groups may not be supported on all
+ systems.
+
+ Numeric Functions
+ AWK has the following built-in arithmetic functions:
+
+
+ atan2(y, x) Returns the arctangent of y/x in radians.
+
+ cos(expr) Returns the cosine of expr, which is in
+ radians.
+
+ exp(expr) The exponential function.
+
+ int(expr) Truncates to integer.
+
+ log(expr) The natural logarithm function.
+
+ rand() Returns a random number N, between 0 and
+ 1, such that 0 <= N < 1.
+
+ sin(expr) Returns the sine of expr, which is in
+ radians.
+
+ sqrt(expr) The square root function.
+
+ srand([expr]) Uses expr as a new seed for the random
+ number generator. If no expr is provided,
+ the time of day is used. The return value
+ is the previous seed for the random number
+ generator.
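+
+ The following sketch calls a few of these functions.
+ The seed value 42 is arbitrary, and awk is assumed to
+ resolve to gawk or another POSIX-conforming awk:
+

```shell
out=$(awk 'BEGIN {
    printf "%.4f\n", atan2(0, -1)       # the angle of the point (-1, 0) is pi
    print int(3.7), int(-3.7), sqrt(16) # int() drops the fractional part
    srand(42); a = rand()
    prev = srand(42); b = rand()        # srand() returns the previous seed
    print prev, (a == b)                # same seed, same first random number
}')
printf '%s\n' "$out"
```

+
+ Reseeding with a fixed value, as here, makes a run
+ reproducible; calling srand() with no argument seeds
+ from the time of day instead.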
+
+ String Functions
+ Gawk has the following built-in string functions:
+
+
+ asort(s [, d]) Returns the number of elements
+ in the source array s. The con-
+ tents of s are sorted using
+ gawk's normal rules for compar-
+ ing values, and the indices of
+ the sorted values of s are
+ replaced with sequential inte-
+ gers starting with 1. If the
+ optional destination array d is
+ specified, then s is first
+ duplicated into d, and then d is
+ sorted, leaving the indices of
+ the source array s unchanged.
+
+ asorti(s [, d]) Returns the number of elements
+ in the source array s. The
+ behavior is the same as that of
+ asort(), except that the array
+ indices are used for sorting,
+ not the array values. When
+ done, the array is indexed
+ numerically, and the values are
+ those of the original indices.
+ The original values are lost;
+ thus provide a second array if
+ you wish to preserve the origi-
+ nal.
+
+ gensub(r, s, h [, t]) Search the target string t for
+ matches of the regular expres-
+ sion r. If h is a string begin-
+ ning with g or G, then replace
+ all matches of r with s. Other-
+ wise, h is a number indicating
+ which match of r to replace. If
+ t is not supplied, $0 is used
+ instead. Within the replacement
+ text s, the sequence \n, where n
+ is a digit from 1 to 9, may be
+ used to indicate just the text
+ that matched the n'th parenthe-
+ sized subexpression. The
+ sequence \0 represents the
+ entire matched text, as does the
+ character &. Unlike sub() and
+ gsub(), the modified string is
+ returned as the result of the
+ function, and the original tar-
+ get string is not changed.
+
+ gsub(r, s [, t]) For each substring matching the
+ regular expression r in the
+ string t, substitute the string
+ s, and return the number of sub-
+ stitutions. If t is not sup-
+ plied, use $0. An & in the
+ replacement text is replaced
+ with the text that was actually
+ matched. Use \& to get a lit-
+ eral &. (This must be typed as
+ "\\&"; see GAWK: Effective AWK
+ Programming for a fuller discus-
+ sion of the rules for &'s and
+ backslashes in the replacement
+ text of sub(), gsub(), and gen-
+ sub().)
+
+ index(s, t) Returns the index of the string
+ t in the string s, or 0 if t is
+ not present. (This implies that
+ character indices start at one.)
+
+ length([s]) Returns the length of the string
+ s, or the length of $0 if s is
+ not supplied. Starting with
+ version 3.1.5, as a non-standard
+ extension, with an array argu-
+ ment, length() returns the num-
+ ber of elements in the array.
+
+ match(s, r [, a]) Returns the position in s where
+ the regular expression r occurs,
+ or 0 if r is not present, and
+ sets the values of RSTART and
+ RLENGTH. Note that the argument
+ order is the same as for the ~
+ operator: str ~ re. If array a
+ is provided, a is cleared and
+ then elements 1 through n are
+ filled with the portions of s
+ that match the corresponding
+ parenthesized subexpression in
+ r. The 0'th element of a con-
+ tains the portion of s matched
+ by the entire regular expression
+ r. Subscripts a[n, "start"]
+ and a[n, "length"] provide the
+ starting index in the string and
+ length respectively, of each
+ matching substring.
+
+ split(s, a [, r]) Splits the string s into the
+ array a on the regular expres-
+ sion r, and returns the number
+ of fields. If r is omitted, FS
+ is used instead. The array a is
+ cleared first. Splitting
+ behaves identically to field
+ splitting, described above.
+
+ sprintf(fmt, expr-list) Prints expr-list according to
+ fmt, and returns the resulting
+ string.
+
+ strtonum(str) Examines str, and returns its
+ numeric value. If str begins
+ with a leading 0, strtonum()
+ assumes that str is an octal
+ number. If str begins with a
+ leading 0x or 0X, strtonum()
+ assumes that str is a hexadeci-
+ mal number.
+
+ sub(r, s [, t]) Just like gsub(), but only the
+ first matching substring is
+ replaced.
+
+ substr(s, i [, n]) Returns the at most n-character
+ substring of s starting at i.
+ If n is omitted, the rest of s
+ is used.
+
+ tolower(str) Returns a copy of the string
+ str, with all the upper-case
+ characters in str translated to
+ their corresponding lower-case
+ counterparts. Non-alphabetic
+ characters are left unchanged.
+
+ toupper(str) Returns a copy of the string
+ str, with all the lower-case
+ characters in str translated to
+ their corresponding upper-case
+ counterparts. Non-alphabetic
+ characters are left unchanged.
+
+ As of version 3.1.5, gawk is multibyte aware. This
+ means that index(), length(), substr() and match() all
+ work in terms of characters, not bytes.
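+
+ As a sketch of how the portable string functions fit
+ together, the fragment below picks apart a made-up log
+ line. It avoids the gawk-only functions (asort(), gen-
+ sub(), strtonum(), and the third argument to match()),
+ so awk may be gawk or any other POSIX-conforming awk:
+

```shell
out=$(awk 'BEGIN {
    s = "2024-03-15 error: disk full"
    n = split(s, part, /[ :-]+/)         # regular expression as separator
    print n, part[1], part[4]
    print index(s, "disk"), length(s)    # character indices start at one
    if (match(s, /[a-z]+:/))             # sets RSTART and RLENGTH
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH - 1)
    t = s
    c = gsub(/[0-9]+/, "<&>", t)         # & stands for the matched text
    print c, t
    print toupper(part[4])
}')
printf '%s\n' "$out"
```

+
+ The gsub() call works on the copy t, so s itself is left
+ unchanged.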
+
+ Time Functions
+ Since one of the primary uses of AWK programs is pro-
+ cessing log files that contain time stamp information,
+ gawk provides the following functions for obtaining time
+ stamps and formatting them.
+
+
+ mktime(datespec)
+ Turns datespec into a time stamp of the same
+ form as returned by systime(). The datespec
+ is a string of the form YYYY MM DD HH MM SS[
+ DST]. The contents of the string are six or
+ seven numbers representing respectively the
+ full year including century, the month from 1
+ to 12, the day of the month from 1 to 31, the
+ hour of the day from 0 to 23, the minute from
+ 0 to 59, and the second from 0 to 60, and an
+ optional daylight saving flag. The values of
+ these numbers need not be within the ranges
+ specified; for example, an hour of -1 means 1
+ hour before midnight. The origin-zero Grego-
+ rian calendar is assumed, with year 0 preced-
+ ing year 1 and year -1 preceding year 0. The
+ time is assumed to be in the local timezone.
+ If the daylight saving flag is positive, the
+ time is assumed to be daylight saving time; if
+ zero, the time is assumed to be standard time;
+ and if negative (the default), mktime()
+ attempts to determine whether daylight saving
+ time is in effect for the specified time. If
+ datespec does not contain enough elements or
+ if the resulting time is out of range,
+ mktime() returns -1.
+
+ strftime([format [, timestamp[, utc-flag]]])
+ Formats timestamp according to the specifica-
+ tion in format. If utc-flag is present and is
+ non-zero or non-null, the result is in UTC,
+ otherwise the result is in local time. The
+ timestamp should be of the same form as
+ returned by systime(). If timestamp is miss-
+ ing, the current time of day is used. If for-
+ mat is missing, a default format equivalent to
+ the output of date(1) is used. See the speci-
+ fication for the strftime() function in ANSI C
+ for the format conversions that are guaranteed
+ to be available.
+
+ systime() Returns the current time of day as the number
+ of seconds since the Epoch (1970-01-01
+ 00:00:00 UTC on POSIX systems).
+
+ Bit Manipulation Functions
+ Starting with version 3.1 of gawk, the following bit
+ manipulation functions are available. They work by con-
+ verting double-precision floating point values to
+ uintmax_t integers, doing the operation, and then con-
+ verting the result back to floating point. The func-
+ tions are:
+
+ and(v1, v2) Return the bitwise AND of the values
+ provided by v1 and v2.
+
+ compl(val) Return the bitwise complement of
+ val.
+
+ lshift(val, count) Return the value of val, shifted
+ left by count bits.
+
+ or(v1, v2) Return the bitwise OR of the values
+ provided by v1 and v2.
+
+ rshift(val, count) Return the value of val, shifted
+ right by count bits.
+
+ xor(v1, v2) Return the bitwise XOR of the values
+ provided by v1 and v2.
+
+
+ Internationalization Functions
+ Starting with version 3.1 of gawk, the following func-
+ tions may be used from within your AWK program for
+ translating strings at run-time. For full details, see
+ GAWK: Effective AWK Programming.
+
+ bindtextdomain(directory [, domain])
+ Specifies the directory where gawk looks for the
+ .mo files, in case they will not or cannot be
+ placed in the "standard" locations (e.g., dur-
+ ing testing). It returns the directory where
+ domain is "bound."
+ The default domain is the value of TEXTDOMAIN.
+ If directory is the null string (""), then bind-
+ textdomain() returns the current binding for the
+ given domain.
+
+ dcgettext(string [, domain [, category]])
+ Returns the translation of string in text domain
+ domain for locale category category. The default
+ value for domain is the current value of TEXTDO-
+ MAIN. The default value for category is "LC_MES-
+ SAGES".
+ If you supply a value for category, it must be a
+ string equal to one of the known locale cate-
+ gories described in GAWK: Effective AWK Program-
+ ming. You must also supply a text domain. Use
+ TEXTDOMAIN if you want to use the current domain.
+
+ dcngettext(string1, string2, number [, domain [, category]])
+ Returns the plural form used for number of the
+ translation of string1 and string2 in text domain
+ domain for locale category category. The default
+ value for domain is the current value of TEXTDO-
+ MAIN. The default value for category is "LC_MES-
+ SAGES".
+ If you supply a value for category, it must be a
+ string equal to one of the known locale cate-
+ gories described in GAWK: Effective AWK Program-
+ ming. You must also supply a text domain. Use
+ TEXTDOMAIN if you want to use the current domain.
+
+USER-DEFINED FUNCTIONS
+ Functions in AWK are defined as follows:
+
+ function name(parameter list) { statements }
+
+ Functions are executed when they are called from within
+ expressions in either patterns or actions. Actual
+ parameters supplied in the function call are used to
+ instantiate the formal parameters declared in the func-
+ tion. Arrays are passed by reference, other variables
+ are passed by value.
+
+ Since functions were not originally part of the AWK lan-
+ guage, the provision for local variables is rather
+ clumsy: They are declared as extra parameters in the
+ parameter list. The convention is to separate local
+ variables from real parameters by extra spaces in the
+ parameter list. For example:
+
+ function f(p, q,     a, b)   # a and b are local
+ {
+ ...
+ }
+
+ /abc/ { ... ; f(1, 2) ; ... }
+
+ The left parenthesis in a function call is required to
+ immediately follow the function name, without any inter-
+ vening white space. This avoids a syntactic ambiguity
+ with the concatenation operator. This restriction does
+ not apply to the built-in functions listed above.
+
+ Functions may call each other and may be recursive.
+ Function parameters used as local variables are initial-
+ ized to the null string and the number zero upon func-
+ tion invocation.
+
+ Use return expr to return a value from a function. The
+ return value is undefined if no value is provided, or if
+ the function returns by "falling off" the end.
+
+ If --lint has been provided, gawk warns about calls to
+ undefined functions at parse time, instead of at run
+ time. Calling an undefined function at run time is a
+ fatal error.
+
+ The word func may be used in place of function.
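+
+ The points above can be sketched as follows. The func-
+ tion names fact and fill are hypothetical, chosen only
+ for this example, and awk is assumed to resolve to gawk
+ or another POSIX-conforming awk:
+

```shell
out=$(awk '
# Recursive function; the scalar parameter n is passed by value.
function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }

# arr is passed by reference; i, after the extra spaces, is a local variable.
function fill(arr, n,    i) {
    for (i = 1; i <= n; i++)
        arr[i] = i * i
}

BEGIN {
    i = "global"            # not disturbed by the local i inside fill()
    fill(sq, 3)
    print fact(5), sq[3], i
}')
echo "$out"
```

+
+ Because i is declared as an extra parameter of fill(),
+ the loop inside the function does not overwrite the
+ global variable of the same name.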
+
+DYNAMICALLY LOADING NEW FUNCTIONS
+ Beginning with version 3.1 of gawk, you can dynamically
+ add new built-in functions to the running gawk inter-
+ preter. The full details are beyond the scope of this
+ manual page; see GAWK: Effective AWK Programming for the
+ details.
+
+
+ extension(object, function)
+ Dynamically link the shared object file named by
+ object, and invoke function in that object, to
+ perform initialization. These should both be
+ provided as strings. Returns the value returned
+ by function.
+
+ This function is provided and documented in GAWK: Effec-
+ tive AWK Programming, but everything about this feature
+ is likely to change eventually. We STRONGLY recommend
+ that you do not use this feature for anything that you
+ aren't willing to redo.
+
+SIGNALS
+ pgawk accepts two signals. SIGUSR1 causes it to dump a
+ profile and function call stack to the profile file,
+ which is either awkprof.out, or whatever file was named
+ with the --profile option. It then continues to run.
+ SIGHUP causes pgawk to dump the profile and function
+ call stack and then exit.
+
+EXAMPLES
+ Print and sort the login names of all users:
+
+ BEGIN { FS = ":" }
+ { print $1 | "sort" }
+
+ Count lines in a file:
+
+ { nlines++ }
+ END { print nlines }
+
+ Precede each line by its number in the file:
+
+ { print FNR, $0 }
+
+ Concatenate and line number (a variation on a theme):
+
+ { print NR, $0 }
+
+ Run an external command for particular lines of data:
+
+ tail -f access_log |
+ awk '/myhome.html/ { system("nmap " $1 ">> logdir/myhome.html") }'
+
+INTERNATIONALIZATION
+ String constants are sequences of characters enclosed in
+ double quotes. In non-English speaking environments, it
+ is possible to mark strings in the AWK program as
+ requiring translation to the native natural language.
+ Such strings are marked in the AWK program with a lead-
+ ing underscore ("_"). For example,
+
+ gawk 'BEGIN { print "hello, world" }'
+
+ always prints hello, world. But,
+
+ gawk 'BEGIN { print _"hello, world" }'
+
+ might print bonjour, monde in France.
+
+ There are several steps involved in producing and run-
+ ning a localizable AWK program.
+
+ 1. Add a BEGIN action to assign a value to the TEXTDO-
+ MAIN variable to set the text domain to a name asso-
+ ciated with your program.
+
+ BEGIN { TEXTDOMAIN = "myprog" }
+
+ This allows gawk to find the .mo file associated with
+ your program. Without this step, gawk uses the messages
+ text domain, which likely does not contain translations
+ for your program.
+
+ 2. Mark all strings that should be translated with
+ leading underscores.
+
+ 3. If necessary, use the dcgettext() and/or bindtextdo-
+ main() functions in your program, as appropriate.
+
+ 4. Run gawk --gen-po -f myprog.awk > myprog.po to gen-
+ erate a .po file for your program.
+
+ 5. Provide appropriate translations, and build and
+ install the corresponding .mo files.
+
+ The internationalization features are described in full
+ detail in GAWK: Effective AWK Programming.
+
+POSIX COMPATIBILITY
+ A primary goal for gawk is compatibility with the POSIX
+ standard, as well as with the latest version of UNIX
+ awk. To this end, gawk incorporates the following user
+ visible features which are not described in the AWK
+ book, but are part of the Bell Laboratories version of
+ awk, and are in the POSIX standard.
+
+ The book indicates that command line variable assignment
+ happens when awk would otherwise open the argument as a
+ file, which is after the BEGIN block is executed. How-
+ ever, in earlier implementations, when such an assign-
+ ment appeared before any file names, the assignment
+ would happen before the BEGIN block was run. Applica-
+ tions came to depend on this "feature." When awk was
+ changed to match its documentation, the -v option for
+ assigning variables before program execution was added
+ to accommodate applications that depended upon the old
+ behavior. (This feature was agreed upon by both the
+ Bell Laboratories and the GNU developers.)
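+
+ The difference between the two kinds of assignment can
+ be seen in the sketch below. The variable names early
+ and late are invented for the example, /dev/null stands
+ in for a data file, and awk is assumed to resolve to
+ gawk or another POSIX-conforming awk:
+

```shell
out=$(awk -v early=1 '
BEGIN { print "BEGIN: early=" early ", late=" late }
END   { print "END:   early=" early ", late=" late }
' late=2 /dev/null)
printf '%s\n' "$out"
```

+
+ The -v assignment is visible in the BEGIN block, while
+ the assignment given among the file arguments takes
+ effect only when awk reaches it, after BEGIN has run.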
+
+ The -W option for implementation specific features is
+ from the POSIX standard.
+
+ When processing arguments, gawk uses the special option
+ "--" to signal the end of arguments. In compatibility
+ mode, it warns about but otherwise ignores undefined
+ options. In normal operation, such arguments are passed
+ on to the AWK program for it to process.
+
+ The AWK book does not define the return value of
+ srand(). The POSIX standard has it return the seed it
+ was using, to allow keeping track of random number
+ sequences. Therefore srand() in gawk also returns its
+ current seed.
+
+ Other new features are: The use of multiple -f options
+ (from MKS awk); the ENVIRON array; the \a and \v escape
+ sequences (done originally in gawk and fed back into the
+ Bell Laboratories version); the tolower() and toupper()
+ built-in functions (from the Bell Laboratories version);
+ and the ANSI C conversion specifications in printf (done
+ first in the Bell Laboratories version).
+
+HISTORICAL FEATURES
+ There are two features of historical AWK implementations
+ that gawk supports. First, it is possible to call the
+ length() built-in function not only with no argument,
+ but even without parentheses! Thus,
+
+ a = length # Holy Algol 60, Batman!
+
+ is the same as either of
+
+ a = length()
+ a = length($0)
+
+ This feature is marked as "deprecated" in the POSIX
+ standard, and gawk issues a warning about its use if
+ --lint is specified on the command line.
+
+ The other feature is the use of either the continue or
+ the break statements outside the body of a while, for,
+ or do loop. Traditional AWK implementations have
+ treated such usage as equivalent to the next statement.
+ Gawk supports this usage if --traditional has been spec-
+ ified.
+
+GNU EXTENSIONS
+ Gawk has a number of extensions to POSIX awk. They are
+ described in this section. All the extensions described
+ here can be disabled by invoking gawk with the --tradi-
+ tional or --posix options.
+
+ The following features of gawk are not available in
+ POSIX awk.
+
+ · Path search for files named via the -f option,
+ controlled by the AWKPATH environment variable.
+
+ · The \x escape sequence. (Disabled with --posix.)
+
+ · The fflush() function. (Disabled with --posix.)
+
+ · The ability to continue lines after ? and :. (Dis-
+ abled with --posix.)
+
+ · Octal and hexadecimal constants in AWK programs.
+
+ · The ARGIND, BINMODE, ERRNO, LINT, RT and TEXTDOMAIN
+ variables.
+
+ · The IGNORECASE variable and its side-effects.
+
+ · The FIELDWIDTHS variable and fixed-width field split-
+ ting.
+
+ · The PROCINFO array.
+
+ · The use of RS as a regular expression.
+
+ · The special file names available for I/O redirection.
+
+ · The |& operator for creating co-processes.
+
+ · The ability to split out individual characters using
+ the null string as the value of FS, and as the third
+ argument to split().
+
+ · The optional second argument to the close() function.
+
+ · The optional third argument to the match() function.
+
+ · The ability to use positional specifiers with printf
+ and sprintf().
+
+ · The ability to pass an array to length().
+
+ · The use of delete array to delete the entire contents
+ of an array.
+
+ · The use of nextfile to abandon processing of the cur-
+ rent input file.
+
+ · The and(), asort(), asorti(), bindtextdomain(),
+ compl(), dcgettext(), dcngettext(), gensub(),
+ lshift(), mktime(), or(), rshift(), strftime(), str-
+ tonum(), systime() and xor() functions.
+
+ · Localizable strings.
+
+ · Adding new built-in functions dynamically with the
+ extension() function.
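+
+ As a short illustration, the following sketch exercises
+ three of the extensions listed above (splitting with a
+ null separator, gensub(), and strtonum()). The sample
+ strings are arbitrary and chosen only for this example:
+
+     gawk 'BEGIN {
+         # Null string as separator: one element per character.
+         n = split("abc", chars, "")
+         print n, chars[1], chars[3]            # prints: 3 a c
+
+         # gensub() returns the result; the target is unchanged.
+         print gensub(/o/, "0", "g", "foo")     # prints: f00
+
+         # strtonum() recognizes a leading 0x as hexadecimal.
+         print strtonum("0x1F")                 # prints: 31
+     }'
+
+ With --traditional or --posix these extensions are
+ disabled, so the program is not expected to work as shown.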
+
+ The AWK book does not define the return value of the
+ close() function. Gawk's close() returns the value from
+ fclose(3), or pclose(3), when closing an output file or
+ pipe, respectively. It returns the process's exit sta-
+ tus when closing an input pipe. The return value is -1
+ if the named file, pipe or co-process was not opened
+ with a redirection.
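+
+ A minimal sketch of checking these return values (the
+ sort command and the name "no-such-file" are placeholders
+ chosen for illustration):
+
+     gawk 'BEGIN {
+         cmd = "sort"
+         print "b\na" | cmd
+         # Output pipe: close() returns the value from pclose(3).
+         status = close(cmd)
+         print "status:", status
+
+         # Never opened with a redirection: close() returns -1.
+         print close("no-such-file")            # prints: -1
+     }'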
+
+ When gawk is invoked with the --traditional option, if
+ the fs argument to the -F option is "t", then FS is set
+ to the tab character. Note that typing gawk -F\t ...
+ simply causes the shell to quote the "t", and does not
+ pass "\t" to the -F option. Since this is a rather ugly
+ special case, it is not the default behavior. This
+ behavior also does not occur if --posix has been speci-
+ fied. To really get a tab character as the field sepa-
+ rator, it is best to use single quotes: gawk -F'\t' ....
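+
+ The difference can be seen with two one-line inputs (a
+ sketch; printf(1) is used only to generate sample data):
+
+     # Default mode: -Ft splits on the letter "t".
+     printf 'atb\n' | gawk -Ft '{ print $2 }'       # prints: b
+
+     # Quoted '\t': gawk receives a backslash and "t", and
+     # splits on the tab character.
+     printf 'a\tb\n' | gawk -F'\t' '{ print $2 }'   # prints: b
+
+ With --traditional, the first command would instead treat
+ the tab character as the separator, leaving $2 empty.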
+
+ If gawk is configured with the --enable-switch option to
+ the configure command, then it accepts an additional
+ control-flow statement:
+ switch (expression) {
+ case value|regex : statement
+ ...
+ [ default: statement ]
+ }
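+
+ For example, a sketch that classifies each record by its
+ first field (the case labels are arbitrary, and this runs
+ only if gawk was built with --enable-switch):
+
+     {
+         switch ($1) {
+         case "start":
+             print "starting"
+             break
+         case /^st/:
+             print "another word starting with st"
+             break
+         default:
+             print "unrecognized"
+         }
+     }
+
+ The break statements keep control from falling through
+ to the next case, as in C.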
+
+ If gawk is configured with the --disable-directories-
+ fatal option, then it will silently skip directories
+ named on the command line. Otherwise, it will do so
+ only if invoked with the --traditional option.
+
+ENVIRONMENT VARIABLES
+ The AWKPATH environment variable can be used to provide
+ a list of directories that gawk searches when looking
+ for files named via the -f and --file options.
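+
+ For instance, assuming a hypothetical library directory
+ $HOME/awklib that contains a file util.awk, the program
+ can be found without giving its full path. The list is
+ colon-separated on POSIX systems; the directory and file
+ names here are placeholders:
+
+     AWKPATH=$HOME/awklib:/usr/local/share/awk \
+         gawk -f util.awk data.txt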
+
+ If POSIXLY_CORRECT exists in the environment, then gawk
+ behaves exactly as if --posix had been specified on the
+ command line. If --lint has been specified, gawk issues
+ a warning message to this effect.
+
+SEE ALSO
+ egrep(1), getpid(2), getppid(2), getpgrp(2), getuid(2),
+ geteuid(2), getgid(2), getegid(2), getgroups(2)
+
+ The AWK Programming Language, Alfred V. Aho, Brian W.
+ Kernighan, Peter J. Weinberger, Addison-Wesley, 1988.
+ ISBN 0-201-07981-X.
+
+ GAWK: Effective AWK Programming, Edition 3.0, published
+ by the Free Software Foundation, 2001. The current ver-
+ sion of this document is available online at
+ http://www.gnu.org/software/gawk/manual.
+
+BUGS
+ The -F option is not necessary given the command line
+ variable assignment feature; it remains only for back-
+ wards compatibility.
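+
+ For example, the following two commands should behave
+ identically (/etc/passwd is used only as a familiar
+ colon-separated file):
+
+     gawk -F: '{ print $1 }' /etc/passwd
+     gawk -v FS=: '{ print $1 }' /etc/passwd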
+
+ Syntactically invalid single character programs tend to
+ overflow the parse stack, generating a rather unhelpful
+ message. Such programs are surprisingly difficult to
+ diagnose in the completely general case, and the effort
+ to do so really is not worth it.
+
+AUTHORS
+ The original version of UNIX awk was designed and imple-
+ mented by Alfred Aho, Peter Weinberger, and Brian
+ Kernighan of Bell Laboratories. Brian Kernighan contin-
+ ues to maintain and enhance it.
+
+ Paul Rubin and Jay Fenlason, of the Free Software Foun-
+ dation, wrote gawk, to be compatible with the original
+ version of awk distributed in Seventh Edition UNIX.
+ John Woods contributed a number of bug fixes. David
+ Trueman, with contributions from Arnold Robbins, made
+ gawk compatible with the new version of UNIX awk.
+ Arnold Robbins is the current maintainer.
+
+ The initial DOS port was done by Conrad Kwok and Scott
+ Garfinkle. Scott Deifik is the current DOS maintainer.
+ Pat Rankin did the port to VMS, and Michal Jaegermann
+ did the port to the Atari ST. The port to OS/2 was done
+ by Kai Uwe Rommel, with contributions and help from Dar-
+ rel Hankerson. Juan M. Guerrero now maintains the OS/2
+ port. Fred Fish supplied support for the Amiga, and
+ Martin Brown provided the BeOS port. Stephen Davies
+ provided the original Tandem port, and Matthew Woehlke
+ provided changes for Tandem's POSIX-compliant systems.
+
+VERSION INFORMATION
+ This man page documents gawk, version 3.1.6.
+
+BUG REPORTS
+ If you find a bug in gawk, please send electronic mail
+ to bug-gawk@gnu.org. Please include your operating sys-
+ tem and its revision, the version of gawk (from gawk
+ --version), what C compiler you used to compile it, and
+ a test program and data that are as small as possible
+ for reproducing the problem.
+
+ Before sending a bug report, please do the following
+ things. First, verify that you have the latest version
+ of gawk. Many bugs (usually subtle ones) are fixed at
+ each release, and if yours is out of date, the problem
+ may already have been solved. Second, please see if
+ setting the environment variable LC_ALL to C
+ causes things to behave as you expect. If so, it's a
+ locale issue, and may or may not really be a bug.
+ Finally, please read this man page and the reference
+ manual carefully to be sure that what you think is a bug
+ really is, instead of just a quirk in the language.
+
+ Whatever you do, do NOT post a bug report in
+ comp.lang.awk. While the gawk developers occasionally
+ read this newsgroup, posting bug reports there is an
+ unreliable way to report bugs. Instead, please use the
+ electronic mail address given above.
+
+ If you're using a GNU/Linux system or BSD-based system,
+ you may wish to submit a bug report to the vendor of
+ your distribution. That's fine, but please send a copy
+ to the official email address as well, since there's no
+ guarantee that the bug will be forwarded to the gawk
+ maintainer.
+
+ACKNOWLEDGEMENTS
+ Brian Kernighan of Bell Laboratories provided valuable
+ assistance during testing and debugging. We thank him.
+
+COPYING PERMISSIONS
+ Copyright © 1989, 1991, 1992, 1993, 1994, 1995, 1996,
+ 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005, 2007
+ Free Software Foundation, Inc.
+
+ Permission is granted to make and distribute verbatim
+ copies of this manual page provided the copyright notice
+ and this permission notice are preserved on all copies.
+
+ Permission is granted to copy and distribute modified
+ versions of this manual page under the conditions for
+ verbatim copying, provided that the entire resulting
+ derived work is distributed under the terms of a permis-
+ sion notice identical to this one.
+
+ Permission is granted to copy and distribute transla-
+ tions of this manual page into another language, under
+ the above conditions for modified versions, except that
+ this permission notice may be stated in a translation
+ approved by the Foundation.
+
+
+
+Free Software Foundation Oct 19 2007 GAWK(1)