From f5c4671bfbad96bf346bd7e9a21fc4317b4959df Mon Sep 17 00:00:00 2001
From: Indrajith K L
Date: Sat, 3 Dec 2022 17:00:20 +0530
Subject: Adds most of the tools
---
.../gawk/3.1.6/gawk-3.1.6-src/doc/README.card | 19 +
.../gawk/3.1.6/gawk-3.1.6-src/doc/gawk.info | 24684 +++++++++++++++++++
.../gawk/3.1.6/gawk-3.1.6-src/doc/gawkinet.info | 4404 ++++
3 files changed, 29107 insertions(+)
create mode 100644 coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc/README.card
create mode 100644 coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc/gawk.info
create mode 100644 coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc/gawkinet.info
(limited to 'coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc')
diff --git a/coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc/README.card b/coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc/README.card
new file mode 100644
index 0000000..ef77cda
--- /dev/null
+++ b/coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc/README.card
@@ -0,0 +1,19 @@
+Mon Dec 9 12:45:48 EST 1996
+
+The AWK reference card included here requires a modern version of troff
+(ditroff). GNU Troff (groff) is known to work.
+
+If your troff can produce PostScript but does not know how to use the
+macros from the `colors' file, try uncommenting the definition in the
+Makefile that sets AWKCARD to awkcard.nc (no colors).
+This will definitely require changes to the TROFF macro, and you have to
+ensure that the tbl preprocessor is called. For example, the following
+modifications on NeXT:
+
+TROFF = tbl
+SEDME = ptroff -t | sed -e \
+ "s/^level0 restore/level0 restore flashme 100 72 moveto\
+ (Copyright `date`, FSF, Inc. (all)) show/" \
+ -e "s/^\/level0 save def/\/level0 save def 30 -48 translate/"
+
+will produce a correctly formatted, albeit monochromatic, reference card.
diff --git a/coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc/gawk.info b/coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc/gawk.info
new file mode 100644
index 0000000..98e33fb
--- /dev/null
+++ b/coreutils-5.3.0-bin/contrib/gawk/3.1.6/gawk-3.1.6-src/doc/gawk.info
@@ -0,0 +1,24684 @@
+This is gawk.info, produced by makeinfo version 4.11 from gawk.texi.
+
+INFO-DIR-SECTION Text creation and manipulation
+START-INFO-DIR-ENTRY
+* Gawk: (gawk). A text scanning and processing language.
+END-INFO-DIR-ENTRY
+INFO-DIR-SECTION Individual utilities
+START-INFO-DIR-ENTRY
+* awk: (gawk)Invoking gawk. Text scanning and processing.
+END-INFO-DIR-ENTRY
+
+ Copyright (C) 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000,
+2001, 2002, 2003, 2004, 2005, 2007 Free Software Foundation, Inc.
+
+
+ This is Edition 3 of `GAWK: Effective AWK Programming: A User's
+Guide for GNU Awk', for the 3.1.6 (or later) version of the GNU
+implementation of AWK.
+
+ Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with the
+Invariant Sections being "GNU General Public License", the Front-Cover
+texts being (a) (see below), and with the Back-Cover Texts being (b)
+(see below). A copy of the license is included in the section entitled
+"GNU Free Documentation License".
+
+ a. "A GNU Manual"
+
+ b. "You have freedom to copy and modify this GNU Manual, like GNU
+ software. Copies published by the Free Software Foundation raise
+ funds for GNU development."
+
+
+File: gawk.info, Node: Top, Next: Foreword, Up: (dir)
+
+General Introduction
+********************
+
+This file documents `awk', a program that you can use to select
+particular records in a file and perform operations upon them.
+
+ Copyright (C) 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000,
+2001, 2002, 2003, 2004, 2005, 2007 Free Software Foundation, Inc.
+
+
+ This is Edition 3 of `GAWK: Effective AWK Programming: A User's
+Guide for GNU Awk', for the 3.1.6 (or later) version of the GNU
+implementation of AWK.
+
+ Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.2 or
+any later version published by the Free Software Foundation; with the
+Invariant Sections being "GNU General Public License", the Front-Cover
+texts being (a) (see below), and with the Back-Cover Texts being (b)
+(see below). A copy of the license is included in the section entitled
+"GNU Free Documentation License".
+
+ a. "A GNU Manual"
+
+ b. "You have freedom to copy and modify this GNU Manual, like GNU
+ software. Copies published by the Free Software Foundation raise
+ funds for GNU development."
+
+* Menu:
+
+* Foreword:: Some nice words about this
+ Info file.
+* Preface:: What this Info file is about; brief
+ history and acknowledgments.
+* Getting Started:: A basic introduction to using
+ `awk'. How to run an `awk'
+ program. Command-line syntax.
+* Regexp:: All about matching things using regular
+ expressions.
+* Reading Files:: How to read files and manipulate fields.
+* Printing:: How to print using `awk'. Describes
+ the `print' and `printf'
+ statements. Also describes redirection of
+ output.
+* Expressions:: Expressions are the basic building blocks
+ of statements.
+* Patterns and Actions:: Overviews of patterns and actions.
+* Arrays:: The description and use of arrays. Also
+ includes array-oriented control statements.
+* Functions:: Built-in and user-defined functions.
+* Internationalization:: Getting `gawk' to speak your
+ language.
+* Advanced Features:: Stuff for advanced users, specific to
+ `gawk'.
+* Invoking Gawk:: How to run `gawk'.
+* Library Functions:: A Library of `awk' Functions.
+* Sample Programs:: Many `awk' programs with complete
+ explanations.
+* Language History:: The evolution of the `awk'
+ language.
+* Installation:: Installing `gawk' under various
+ operating systems.
+* Notes:: Notes about `gawk' extensions and
+ possible future work.
+* Basic Concepts:: A very quick introduction to programming
+ concepts.
+* Glossary:: An explanation of some unfamiliar terms.
+* Copying:: Your right to copy and distribute
+ `gawk'.
+* GNU Free Documentation License:: The license for this Info file.
+* Index:: Concept and Variable Index.
+
+* History:: The history of `gawk' and
+ `awk'.
+* Names:: What name to use to find `awk'.
+* This Manual:: Using this Info file. Includes
+ sample input files that you can use.
+* Conventions:: Typographical Conventions.
+* Manual History:: Brief history of the GNU project and this
+ Info file.
+* How To Contribute:: Helping to save the world.
+* Acknowledgments:: Acknowledgments.
+* Running gawk:: How to run `gawk' programs;
+ includes command-line syntax.
+* One-shot:: Running a short throwaway `awk'
+ program.
+* Read Terminal:: Using no input files (input from terminal
+ instead).
+* Long:: Putting permanent `awk' programs in
+ files.
+* Executable Scripts:: Making self-contained `awk'
+ programs.
+* Comments:: Adding documentation to `gawk'
+ programs.
+* Quoting:: More discussion of shell quoting issues.
+* Sample Data Files:: Sample data files for use in the
+ `awk' programs illustrated in this
+ Info file.
+* Very Simple:: A very simple example.
+* Two Rules:: A less simple one-line example using two
+ rules.
+* More Complex:: A more complex example.
+* Statements/Lines:: Subdividing or combining statements into
+ lines.
+* Other Features:: Other Features of `awk'.
+* When:: When to use `gawk' and when to use
+ other things.
+* Regexp Usage:: How to Use Regular Expressions.
+* Escape Sequences:: How to write nonprinting characters.
+* Regexp Operators:: Regular Expression Operators.
+* Character Lists:: What can go between `[...]'.
+* GNU Regexp Operators:: Operators specific to GNU software.
+* Case-sensitivity:: How to do case-insensitive matching.
+* Leftmost Longest:: How much text matches.
+* Computed Regexps:: Using Dynamic Regexps.
+* Locales:: How the locale affects things.
+* Records:: Controlling how data is split into records.
+* Fields:: An introduction to fields.
+* Nonconstant Fields:: Nonconstant Field Numbers.
+* Changing Fields:: Changing the Contents of a Field.
+* Field Separators:: The field separator and how to change it.
+* Regexp Field Splitting:: Using regexps as the field separator.
+* Single Character Fields:: Making each character a separate field.
+* Command Line Field Separator:: Setting `FS' from the command-line.
+* Field Splitting Summary:: Some final points and a summary table.
+* Constant Size:: Reading constant width data.
+* Multiple Line:: Reading multi-line records.
+* Getline:: Reading files under explicit program
+ control using the `getline' function.
+* Plain Getline:: Using `getline' with no arguments.
+* Getline/Variable:: Using `getline' into a variable.
+* Getline/File:: Using `getline' from a file.
+* Getline/Variable/File:: Using `getline' into a variable from a
+ file.
+* Getline/Pipe:: Using `getline' from a pipe.
+* Getline/Variable/Pipe:: Using `getline' into a variable from a
+ pipe.
+* Getline/Coprocess:: Using `getline' from a coprocess.
+* Getline/Variable/Coprocess:: Using `getline' into a variable from a
+ coprocess.
+* Getline Notes:: Important things to know about
+ `getline'.
+* Getline Summary:: Summary of `getline' Variants.
+* Print:: The `print' statement.
+* Print Examples:: Simple examples of `print' statements.
+* Output Separators:: The output separators and how to change
+ them.
+* OFMT:: Controlling Numeric Output With
+ `print'.
+* Printf:: The `printf' statement.
+* Basic Printf:: Syntax of the `printf' statement.
+* Control Letters:: Format-control letters.
+* Format Modifiers:: Format-specification modifiers.
+* Printf Examples:: Several examples.
+* Redirection:: How to redirect output to multiple files
+ and pipes.
+* Special Files:: File name interpretation in `gawk'.
+ `gawk' allows access to inherited
+ file descriptors.
+* Special FD:: Special files for I/O.
+* Special Process:: Special files for process information.
+* Special Network:: Special files for network communications.
+* Special Caveats:: Things to watch out for.
+* Close Files And Pipes:: Closing Input and Output Files and Pipes.
+* Constants:: String, numeric and regexp constants.
+* Scalar Constants:: Numeric and string constants.
+* Nondecimal-numbers:: What are octal and hex numbers.
+* Regexp Constants:: Regular Expression constants.
+* Using Constant Regexps:: When and how to use a regexp constant.
+* Variables:: Variables give names to values for later
+ use.
+* Using Variables:: Using variables in your programs.
+* Assignment Options:: Setting variables on the command-line and a
+ summary of command-line syntax. This is an
+ advanced method of input.
+* Conversion:: The conversion of strings to numbers and
+ vice versa.
+* Arithmetic Ops:: Arithmetic operations (`+', `-',
+ etc.)
+* Concatenation:: Concatenating strings.
+* Assignment Ops:: Changing the value of a variable or a
+ field.
+* Increment Ops:: Incrementing the numeric value of a
+ variable.
+* Truth Values:: What is ``true'' and what is ``false''.
+* Typing and Comparison:: How variables acquire types and how this
+ affects comparison of numbers and strings
+ with `<', etc.
+* Variable Typing:: String type versus numeric type.
+* Comparison Operators:: The comparison operators.
+* Boolean Ops:: Combining comparison expressions using
+ boolean operators `||' (``or''),
+ `&&' (``and'') and `!' (``not'').
+* Conditional Exp:: Conditional expressions select between two
+ subexpressions under control of a third
+ subexpression.
+* Function Calls:: A function call is an expression.
+* Precedence:: How various operators nest.
+* Pattern Overview:: What goes into a pattern.
+* Regexp Patterns:: Using regexps as patterns.
+* Expression Patterns:: Any expression can be used as a pattern.
+* Ranges:: Pairs of patterns specify record ranges.
+* BEGIN/END:: Specifying initialization and cleanup
+ rules.
+* Using BEGIN/END:: How and why to use BEGIN/END rules.
+* I/O And BEGIN/END:: I/O issues in BEGIN/END rules.
+* Empty:: The empty pattern, which matches every
+ record.
+* Using Shell Variables:: How to use shell variables with
+ `awk'.
+* Action Overview:: What goes into an action.
+* Statements:: Describes the various control statements in
+ detail.
+* If Statement:: Conditionally execute some `awk'
+ statements.
+* While Statement:: Loop until some condition is satisfied.
+* Do Statement:: Do specified action while looping until
+ some condition is satisfied.
+* For Statement:: Another looping statement, that provides
+ initialization and increment clauses.
+* Switch Statement:: Switch/case evaluation for conditional
+ execution of statements based on a value.
+* Break Statement:: Immediately exit the innermost enclosing
+ loop.
+* Continue Statement:: Skip to the end of the innermost enclosing
+ loop.
+* Next Statement:: Stop processing the current input record.
+* Nextfile Statement:: Stop processing the current file.
+* Exit Statement:: Stop execution of `awk'.
+* Built-in Variables:: Summarizes the built-in variables.
+* User-modified:: Built-in variables that you change to
+ control `awk'.
+* Auto-set:: Built-in variables where `awk'
+ gives you information.
+* ARGC and ARGV:: Ways to use `ARGC' and `ARGV'.
+* Array Intro:: Introduction to Arrays
+* Reference to Elements:: How to examine one element of an array.
+* Assigning Elements:: How to change an element of an array.
+* Array Example:: Basic Example of an Array
+* Scanning an Array:: A variation of the `for' statement. It
+ loops through the indices of an array's
+ existing elements.
+* Delete:: The `delete' statement removes an
+ element from an array.
+* Numeric Array Subscripts:: How to use numbers as subscripts in
+ `awk'.
+* Uninitialized Subscripts:: Using Uninitialized variables as
+ subscripts.
+* Multi-dimensional:: Emulating multidimensional arrays in
+ `awk'.
+* Multi-scanning:: Scanning multidimensional arrays.
+* Array Sorting:: Sorting array values and indices.
+* Built-in:: Summarizes the built-in functions.
+* Calling Built-in:: How to call built-in functions.
+* Numeric Functions:: Functions that work with numbers, including
+ `int', `sin' and `rand'.
+* String Functions:: Functions for string manipulation, such as
+ `split', `match' and
+ `sprintf'.
+* Gory Details:: More than you want to know about `\'
+ and `&' with `sub', `gsub',
+ and `gensub'.
+* I/O Functions:: Functions for files and shell commands.
+* Time Functions:: Functions for dealing with timestamps.
+* Bitwise Functions:: Functions for bitwise operations.
+* I18N Functions:: Functions for string translation.
+* User-defined:: Describes User-defined functions in detail.
+* Definition Syntax:: How to write definitions and what they
+ mean.
+* Function Example:: An example function definition and what it
+ does.
+* Function Caveats:: Things to watch out for.
+* Return Statement:: Specifying the value a function returns.
+* Dynamic Typing:: How variable types can change at runtime.
+* I18N and L10N:: Internationalization and Localization.
+* Explaining gettext:: How GNU `gettext' works.
+* Programmer i18n:: Features for the programmer.
+* Translator i18n:: Features for the translator.
+* String Extraction:: Extracting marked strings.
+* Printf Ordering:: Rearranging `printf' arguments.
+* I18N Portability:: `awk'-level portability issues.
+* I18N Example:: A simple i18n example.
+* Gawk I18N:: `gawk' is also internationalized.
+* Nondecimal Data:: Allowing nondecimal input data.
+* Two-way I/O:: Two-way communications with another
+ process.
+* TCP/IP Networking:: Using `gawk' for network
+ programming.
+* Portal Files:: Using `gawk' with BSD portals.
+* Profiling:: Profiling your `awk' programs.
+* Command Line:: How to run `awk'.
+* Options:: Command-line options and their meanings.
+* Other Arguments:: Input file names and variable assignments.
+* AWKPATH Variable:: Searching directories for `awk'
+ programs.
+* Obsolete:: Obsolete Options and/or features.
+* Undocumented:: Undocumented Options and Features.
+* Known Bugs:: Known Bugs in `gawk'.
+* Library Names:: How to best name private global variables
+ in library functions.
+* General Functions:: Functions that are of general use.
+* Nextfile Function:: Two implementations of a `nextfile'
+ function.
+* Assert Function:: A function for assertions in `awk'
+ programs.
+* Round Function:: A function for rounding if `sprintf'
+ does not do it correctly.
+* Cliff Random Function:: The Cliff Random Number Generator.
+* Ordinal Functions:: Functions for using characters as numbers
+ and vice versa.
+* Join Function:: A function to join an array into a string.
+* Gettimeofday Function:: A function to get formatted times.
+* Data File Management:: Functions for managing command-line data
+ files.
+* Filetrans Function:: A function for handling data file
+ transitions.
+* Rewind Function:: A function for rereading the current file.
+* File Checking:: Checking that data files are readable.
+* Empty Files:: Checking for zero-length files.
+* Ignoring Assigns:: Treating assignments as file names.
+* Getopt Function:: A function for processing command-line
+ arguments.
+* Passwd Functions:: Functions for getting user information.
+* Group Functions:: Functions for getting group information.
+* Running Examples:: How to run these examples.
+* Clones:: Clones of common utilities.
+* Cut Program:: The `cut' utility.
+* Egrep Program:: The `egrep' utility.
+* Id Program:: The `id' utility.
+* Split Program:: The `split' utility.
+* Tee Program:: The `tee' utility.
+* Uniq Program:: The `uniq' utility.
+* Wc Program:: The `wc' utility.
+* Miscellaneous Programs:: Some interesting `awk' programs.
+* Dupword Program:: Finding duplicated words in a document.
+* Alarm Program:: An alarm clock.
+* Translate Program:: A program similar to the `tr'
+ utility.
+* Labels Program:: Printing mailing labels.
+* Word Sorting:: A program to produce a word usage count.
+* History Sorting:: Eliminating duplicate entries from a
+ history file.
+* Extract Program:: Pulling out programs from Texinfo source
+ files.
+* Simple Sed:: A Simple Stream Editor.
+* Igawk Program:: A wrapper for `awk' that includes
+ files.
+* V7/SVR3.1:: The major changes between V7 and System V
+ Release 3.1.
+* SVR4:: Minor changes between System V Releases 3.1
+ and 4.
+* POSIX:: New features from the POSIX standard.
+* BTL:: New features from the Bell Laboratories
+ version of `awk'.
+* POSIX/GNU:: The extensions in `gawk' not in
+ POSIX `awk'.
+* Contributors:: The major contributors to `gawk'.
+* Gawk Distribution:: What is in the `gawk' distribution.
+* Getting:: How to get the distribution.
+* Extracting:: How to extract the distribution.
+* Distribution contents:: What is in the distribution.
+* Unix Installation:: Installing `gawk' under various
+ versions of Unix.
+* Quick Installation:: Compiling `gawk' under Unix.
+* Additional Configuration Options:: Other compile-time options.
+* Configuration Philosophy:: How it's all supposed to work.
+* Non-Unix Installation:: Installation on Other Operating Systems.
+* Amiga Installation:: Installing `gawk' on an Amiga.
+* BeOS Installation:: Installing `gawk' on BeOS.
+* PC Installation:: Installing and Compiling `gawk' on
+ MS-DOS and OS/2.
+* PC Binary Installation:: Installing a prepared distribution.
+* PC Compiling:: Compiling `gawk' for MS-DOS, Windows32,
+ and OS/2.
+* PC Using:: Running `gawk' on MS-DOS, Windows32 and
+ OS/2.
+* PC Dynamic:: Compiling `gawk' for dynamic
+ libraries.
+* Cygwin:: Building and running `gawk' for
+ Cygwin.
+* VMS Installation:: Installing `gawk' on VMS.
+* VMS Compilation:: How to compile `gawk' under VMS.
+* VMS Installation Details:: How to install `gawk' under VMS.
+* VMS Running:: How to run `gawk' under VMS.
+* VMS POSIX:: Alternate instructions for VMS POSIX.
+* VMS Old Gawk:: An old version comes with some VMS systems.
+* Unsupported:: Systems whose ports are no longer
+ supported.
+* Atari Installation:: Installing `gawk' on the Atari ST.
+* Atari Compiling:: Compiling `gawk' on Atari.
+* Atari Using:: Running `gawk' on Atari.
+* Tandem Installation:: Installing `gawk' on a Tandem.
+* Bugs:: Reporting Problems and Bugs.
+* Other Versions:: Other freely available `awk'
+ implementations.
+* Compatibility Mode:: How to disable certain `gawk'
+ extensions.
+* Additions:: Making Additions To `gawk'.
+* Adding Code:: Adding code to the main body of
+ `gawk'.
+* New Ports:: Porting `gawk' to a new operating
+ system.
+* Dynamic Extensions:: Adding new built-in functions to
+ `gawk'.
+* Internals:: A brief look at some `gawk'
+ internals.
+* Sample Library:: An example of new functions.
+* Internal File Description:: What the new functions will do.
+* Internal File Ops:: The code for internal file operations.
+* Using Internal File Ops:: How to use an external extension.
+* Future Extensions:: New features that may be implemented one
+ day.
+* Basic High Level:: The high level view.
+* Basic Data Typing:: A very quick intro to data types.
+* Floating Point Issues:: Stuff to know about floating-point numbers.
+* String Conversion Precision:: The String Value Can Lie.
+* Unexpected Results:: Floating Point Numbers Are Not
+ Abstract Numbers.
+* POSIX Floating Point Problems:: Standards Versus Existing Practice.
+
+ To Miriam, for making me complete.
+
+ To Chana, for the joy you bring us.
+
+ To Rivka, for the exponential increase.
+
+ To Nachum, for the added dimension.
+
+ To Malka, for the new beginning.
+
+File: gawk.info, Node: Foreword, Next: Preface, Prev: Top, Up: Top
+
+Foreword
+********
+
+Arnold Robbins and I are good friends. We were introduced 11 years ago
+by circumstances--and our favorite programming language, AWK. The
+circumstances started a couple of years earlier. I was working at a new
+job and noticed an unplugged Unix computer sitting in the corner. No
+one knew how to use it, and neither did I. However, a couple of days
+later it was running, and I was `root' and the one-and-only user. That
+day, I began the transition from statistician to Unix programmer.
+
+ On one of many trips to the library or bookstore in search of books
+on Unix, I found the gray AWK book, a.k.a. Aho, Kernighan and
+Weinberger, `The AWK Programming Language', Addison-Wesley, 1988.
+AWK's simple programming paradigm--find a pattern in the input and then
+perform an action--often reduced complex or tedious data manipulations
+to a few lines of code. I was excited to try my hand at programming in
+AWK.
+
+ Alas, the `awk' on my computer was a limited version of the
+language described in the AWK book. I discovered that my computer had
+"old `awk'" and the AWK book described "new `awk'." I learned that
+this was typical; the old version refused to step aside or relinquish
+its name. If a system had a new `awk', it was invariably called
+`nawk', and few systems had it. The best way to get a new `awk' was to
+`ftp' the source code for `gawk' from `prep.ai.mit.edu'. `gawk' was a
+version of new `awk' written by David Trueman and Arnold, and available
+under the GNU General Public License.
+
+ (Incidentally, it's no longer difficult to find a new `awk'. `gawk'
+ships with Linux, and you can download binaries or source code for
+almost any system; my wife uses `gawk' on her VMS box.)
+
+ My Unix system started out unplugged from the wall; it certainly was
+not plugged into a network. So, oblivious to the existence of `gawk'
+and the Unix community in general, and desiring a new `awk', I wrote my
+own, called `mawk'. Before I was finished I knew about `gawk', but it
+was too late to stop, so I eventually posted to a `comp.sources'
+newsgroup.
+
+ A few days after my posting, I got a friendly email from Arnold
+introducing himself. He suggested we share design and algorithms and
+attached a draft of the POSIX standard so that I could update `mawk' to
+support language extensions added after publication of the AWK book.
+
+ Frankly, if our roles had been reversed, I would not have been so
+open and we probably would have never met. I'm glad we did meet. He
+is an AWK expert's AWK expert and a genuinely nice person. Arnold
+contributes significant amounts of his expertise and time to the Free
+Software Foundation.
+
+ This book is the `gawk' reference manual, but at its core it is a
+book about AWK programming that will appeal to a wide audience. It is
+a definitive reference to the AWK language as defined by the 1987 Bell
+Labs release and codified in the 1992 POSIX Utilities standard.
+
+ On the other hand, the novice AWK programmer can study a wealth of
+practical programs that emphasize the power of AWK's basic idioms:
+data-driven control flow, pattern matching with regular expressions, and
+associative arrays. Those looking for something new can try out
+`gawk''s interface to network protocols via special `/inet' files.
+
+ The programs in this book make clear that an AWK program is
+typically much smaller and faster to develop than a counterpart written
+in C. Consequently, there is often a payoff to prototyping an algorithm
+or design in AWK to get it running quickly and expose problems early.
+Often, the interpreted performance is adequate and the AWK prototype
+becomes the product.
+
+ The new `pgawk' (profiling `gawk') produces program execution
+counts. I recently experimented with an algorithm that, for n lines of
+input, exhibited ~ C n^2 performance, while theory predicted ~ C n log n
+behavior. A few minutes poring over the `awkprof.out' profile
+pinpointed the problem to a single line of code. `pgawk' is a welcome
+addition to my programmer's toolbox.
+
+ Arnold has distilled over a decade of experience writing and using
+AWK programs, and developing `gawk', into this book. If you use AWK or
+want to learn how, then read this book.
+
+ Michael Brennan
+ Author of `mawk'
+
+
+File: gawk.info, Node: Preface, Next: Getting Started, Prev: Foreword, Up: Top
+
+Preface
+*******
+
+Several kinds of tasks occur repeatedly when working with text files.
+You might want to extract certain lines and discard the rest. Or you
+may need to make changes wherever certain patterns appear, but leave
+the rest of the file alone. Writing single-use programs for these
+tasks in languages such as C, C++, or Pascal is time-consuming and
+inconvenient. Such jobs are often easier with `awk'. The `awk'
+utility interprets a special-purpose programming language that makes it
+easy to handle simple data-reformatting jobs.
+
+ The GNU implementation of `awk' is called `gawk'; it is fully
+compatible with the System V Release 4 version of `awk'. `gawk' is
+also compatible with the POSIX specification of the `awk' language.
+This means that all properly written `awk' programs should work with
+`gawk'. Thus, we usually don't distinguish between `gawk' and other
+`awk' implementations.
+
+ Using `awk' allows you to:
+
+ * Manage small, personal databases
+
+ * Generate reports
+
+ * Validate data
+
+ * Produce indexes and perform other document preparation tasks
+
+ * Experiment with algorithms that you can adapt later to other
+ computer languages
+
+ In addition, `gawk' provides facilities that make it easy to:
+
+ * Extract bits and pieces of data for processing
+
+ * Sort data
+
+ * Perform simple network communications
+
+ This Info file teaches you about the `awk' language and how you can
+use it effectively. You should already be familiar with basic system
+commands, such as `cat' and `ls',(1) as well as basic shell facilities,
+such as input/output (I/O) redirection and pipes.
+
+ Implementations of the `awk' language are available for many
+different computing environments. This Info file, while describing the
+`awk' language in general, also describes the particular implementation
+of `awk' called `gawk' (which stands for "GNU awk"). `gawk' runs on a
+broad range of Unix systems, from 80386 PC-based computers up
+through large-scale systems, such as Crays. `gawk' has also been ported
+to Mac OS X, MS-DOS, Microsoft Windows (all versions) and OS/2 PCs,
+Atari and Amiga microcomputers, BeOS, Tandem D20, and VMS.
+
+* Menu:
+
+* History:: The history of `gawk' and
+ `awk'.
+* Names:: What name to use to find `awk'.
+* This Manual:: Using this Info file. Includes sample
+ input files that you can use.
+* Conventions:: Typographical Conventions.
+* Manual History:: Brief history of the GNU project and this
+ Info file.
+* How To Contribute:: Helping to save the world.
+* Acknowledgments:: Acknowledgments.
+
+ ---------- Footnotes ----------
+
+ (1) These commands are available on POSIX-compliant systems, as well
+as on traditional Unix-based systems. If you are using some other
+operating system, you still need to be familiar with the ideas of I/O
+redirection and pipes.
+
+
+File: gawk.info, Node: History, Next: Names, Up: Preface
+
+History of `awk' and `gawk'
+===========================
+
+ Recipe For A Programming Language
+
+ 1 part `egrep' 1 part `snobol'
+ 2 parts `ed' 3 parts C
+
+ Blend all parts well using `lex' and `yacc'. Document minimally
+ and release.
+
+ After eight years, add another part `egrep' and two more parts C.
+ Document very well and release.
+
+ The name `awk' comes from the initials of its designers: Alfred V.
+Aho, Peter J. Weinberger and Brian W. Kernighan. The original version
+of `awk' was written in 1977 at AT&T Bell Laboratories. In 1985, a new
+version made the programming language more powerful, introducing
+user-defined functions, multiple input streams, and computed regular
+expressions. This new version became widely available with Unix System
+V Release 3.1 (SVR3.1). The version in SVR4 added some new features
+and cleaned up the behavior in some of the "dark corners" of the
+language. The specification for `awk' in the POSIX Command Language
+and Utilities standard further clarified the language. Both the `gawk'
+designers and the original Bell Laboratories `awk' designers provided
+feedback for the POSIX specification.
+
+ Paul Rubin wrote the GNU implementation, `gawk', in 1986. Jay
+Fenlason completed it, with advice from Richard Stallman. John Woods
+contributed parts of the code as well. In 1988 and 1989, David
+Trueman, with help from me, thoroughly reworked `gawk' for compatibility
+with the newer `awk'. Circa 1995, I became the primary maintainer.
+Current development focuses on bug fixes, performance improvements,
+standards compliance, and occasionally, new features.
+
+ In May of 1997, Jürgen Kahrs felt the need for network access from
+`awk', and with a little help from me, set about adding features to do
+this for `gawk'. At that time, he also wrote the bulk of `TCP/IP
+Internetworking with `gawk'' (a separate document, available as part of
+the `gawk' distribution). His code finally became part of the main
+`gawk' distribution with `gawk' version 3.1.
+
+ *Note Contributors::, for a complete list of those who made
+important contributions to `gawk'.
+
+
+File: gawk.info, Node: Names, Next: This Manual, Prev: History, Up: Preface
+
+A Rose by Any Other Name
+========================
+
+The `awk' language has evolved over the years. Full details are
+provided in *note Language History::. The language described in this
+Info file is often referred to as "new `awk'" (`nawk').
+
+ Because of this, many systems have multiple versions of `awk'. Some
+systems have an `awk' utility that implements the original version of
+the `awk' language and a `nawk' utility for the new version. Others
+have an `oawk' version for the "old `awk'" language and plain `awk' for
+the new one. Still others only have one version, which is usually the
+new one.(1)
+
+ All in all, this makes it difficult for you to know which version of
+`awk' you should run when writing your programs. The best advice I can
+give here is to check your local documentation. Look for `awk', `oawk',
+and `nawk', as well as for `gawk'. It is likely that you already have
+some version of new `awk' on your system, which is what you should use
+when running your programs. (Of course, if you're reading this Info
+file, chances are good that you have `gawk'!)
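
One quick way to follow this advice is to ask the shell which of these names exist on your system. The following is only a sketch, assuming a POSIX shell; the function name `list_awks' is made up for this illustration and is not part of any `awk' distribution:

```shell
# list_awks: hypothetical helper; prints each awk variant found on PATH.
list_awks() {
    for name in awk oawk nawk gawk; do
        # command -v prints the resolved path when the name exists
        if path=$(command -v "$name" 2>/dev/null); then
            printf '%s -> %s\n' "$name" "$path"
        fi
    done
}

list_awks
```

Names that are not installed are simply skipped, so the output lists only the versions you can actually run.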
+
+ Throughout this Info file, whenever we refer to a language feature
+that should be available in any complete implementation of POSIX `awk',
+we simply use the term `awk'. When referring to a feature that is
+specific to the GNU implementation, we use the term `gawk'.
+
+ ---------- Footnotes ----------
+
+ (1) Often, these systems use `gawk' for their `awk' implementation!
+
+
+File: gawk.info, Node: This Manual, Next: Conventions, Prev: Names, Up: Preface
+
+Using This Book
+===============
+
+The term `awk' refers to a particular program as well as to the
+language you use to tell this program what to do. When we need to be
+careful, we call the language "the `awk' language," and the program
+"the `awk' utility." This Info file explains both the `awk' language
+and how to run the `awk' utility. The term "`awk' program" refers to a
+program written by you in the `awk' programming language.
+
+ Primarily, this Info file explains the features of `awk', as defined
+in the POSIX standard. It does so in the context of the `gawk'
+implementation. While doing so, it also attempts to describe important
+differences between `gawk' and other `awk' implementations.(1) Finally,
+any `gawk' features that are not in the POSIX standard for `awk' are
+noted.
+
+ There are subsections labelled as *Advanced Notes* scattered
+throughout the Info file. They add a more complete explanation of
+points that are relevant, but not likely to be of interest on first
+reading. All appear in the index, under the heading "advanced
+features."
+
+ Most of the time, the examples use complete `awk' programs. In some
+of the more advanced sections, only the part of the `awk' program that
+illustrates the concept currently being described is shown.
+
+ While this Info file is aimed principally at people who have not been
+exposed to `awk', there is a lot of information here that even the `awk'
+expert should find useful. In particular, the description of POSIX
+`awk' and the example programs in *note Library Functions::, and in
+*note Sample Programs::, should be of interest.
+
+ *note Getting Started::, provides the essentials you need to know to
+begin using `awk'.
+
+ *note Regexp::, introduces regular expressions in general, and in
+particular the flavors supported by POSIX `awk' and `gawk'.
+
+ *note Reading Files::, describes how `awk' reads your data. It
+introduces the concepts of records and fields, as well as the `getline'
+command. I/O redirection is first described here.
+
+ *note Printing::, describes how `awk' programs can produce output
+with `print' and `printf'.
+
+ *note Expressions::, describes expressions, which are the basic
+building blocks for getting most things done in a program.
+
+ *note Patterns and Actions::, describes how to write patterns for
+matching records, actions for doing something when a record is matched,
+and the built-in variables `awk' and `gawk' use.
+
+ *note Arrays::, covers `awk''s one-and-only data structure:
+associative arrays. Deleting array elements and whole arrays is also
+described, as well as sorting arrays in `gawk'.
+
+ *note Functions::, describes the built-in functions `awk' and `gawk'
+provide, as well as how to define your own functions.
+
+ *note Internationalization::, describes special features in `gawk'
+for translating program messages into different languages at runtime.
+
+ *note Advanced Features::, describes a number of `gawk'-specific
+advanced features. Of particular note are the abilities to have
+two-way communications with another process, perform TCP/IP networking,
+and profile your `awk' programs.
+
+ *note Invoking Gawk::, describes how to run `gawk', the meaning of
+its command-line options, and how it finds `awk' program source files.
+
+ *note Library Functions::, and *note Sample Programs::, provide many
+sample `awk' programs. Reading them allows you to see `awk' solving
+real problems.
+
+ *note Language History::, describes how the `awk' language has
+evolved from its first release to the present. It also describes how `gawk'
+has acquired features over time.
+
+ *note Installation::, describes how to get `gawk', how to compile it
+under Unix, and how to compile and use it on different non-Unix
+systems. It also describes how to report bugs in `gawk' and where to
+get three other freely available implementations of `awk'.
+
+ *note Notes::, describes how to disable `gawk''s extensions, as well
+as how to contribute new code to `gawk', how to write extension
+libraries, and some possible future directions for `gawk' development.
+
+ *note Basic Concepts::, provides some very cursory background
+material for those who are completely unfamiliar with computer
+programming. Also centralized there is a discussion of some of the
+issues surrounding floating-point numbers.
+
+ The *note Glossary::, defines most, if not all, of the significant
+terms used throughout the book. If you find terms that you aren't
+familiar with, try looking them up here.
+
+ *note Copying::, and *note GNU Free Documentation License::, present
+the licenses that cover the `gawk' source code and this Info file,
+respectively.
+
+ ---------- Footnotes ----------
+
+ (1) All such differences appear in the index under the entry
+"differences in `awk' and `gawk'."
+
+
+File: gawk.info, Node: Conventions, Next: Manual History, Prev: This Manual, Up: Preface
+
+Typographical Conventions
+=========================
+
+This Info file is written using Texinfo, the GNU documentation
+formatting language. A single Texinfo source file is used to produce
+both the printed and online versions of the documentation. This minor
+node briefly documents the typographical conventions used in Texinfo.
+
+ Examples you would type at the command-line are preceded by the
+common shell primary and secondary prompts, `$' and `>'. Output from
+the command is preceded by the glyph "-|". This typically represents
+the command's standard output. Error messages, and other output on the
+command's standard error, are preceded by the glyph "error-->". For
+example:
+
+ $ echo hi on stdout
+ -| hi on stdout
+ $ echo hello on stderr 1>&2
+ error--> hello on stderr
+
+ Characters that you type at the keyboard look `like this'. In
+particular, there are special characters called "control characters."
+These are characters that you type by holding down both the `CONTROL'
+key and another key, at the same time. For example, a `Ctrl-d' is typed
+by first pressing and holding the `CONTROL' key, next pressing the `d'
+key and finally releasing both keys.
+
+Dark Corners
+............
+
+ Dark corners are basically fractal -- no matter how much you
+ illuminate, there's always a smaller but darker one.
+ Brian Kernighan
+
+ Until the POSIX standard (and `The Gawk Manual'), many features of
+`awk' were either poorly documented or not documented at all.
+Descriptions of such features (often called "dark corners") are noted
+in this Info file with "(d.c.)". They also appear in the index under
+the heading "dark corner."
+
+ As the opening quote notes, though, any coverage of dark corners
+is, by definition, incomplete.
+
+
+File: gawk.info, Node: Manual History, Next: How To Contribute, Prev: Conventions, Up: Preface
+
+The GNU Project and This Book
+=============================
+
+The Free Software Foundation (FSF) is a nonprofit organization dedicated
+to the production and distribution of freely distributable software.
+It was founded by Richard M. Stallman, the author of the original Emacs
+editor. GNU Emacs is the most widely used version of Emacs today.
+
+ The GNU(1) Project is an ongoing effort on the part of the Free
+Software Foundation to create a complete, freely distributable,
+POSIX-compliant computing environment. The FSF uses the "GNU General
+Public License" (GPL) to ensure that their software's source code is
+always available to the end user. A copy of the GPL is included for
+your reference (*note Copying::). The GPL applies to the C language
+source code for `gawk'. To find out more about the FSF and the GNU
+Project online, see the GNU Project's home page (http://www.gnu.org).
+This Info file may also be read from their web site
+(http://www.gnu.org/software/gawk/manual/).
+
+ A shell, an editor (Emacs), highly portable optimizing C, C++, and
+Objective-C compilers, a symbolic debugger and dozens of large and
+small utilities (such as `gawk'), have all been completed and are
+freely available. The GNU operating system kernel (the HURD), has been
+released but is still in an early stage of development.
+
+ Until the GNU operating system is more fully developed, you should
+consider using GNU/Linux, a freely distributable, Unix-like operating
+system for Intel 80386, DEC Alpha, Sun SPARC, IBM S/390, and other
+systems.(2) There are many books on GNU/Linux. One that is freely
+available is `Linux Installation and Getting Started', by Matt Welsh.
+GNU/Linux distributions are often available in computer stores or
+bundled on CD-ROMs with books about Linux. (There are three other
+freely available, Unix-like operating systems for 80386 and other
+systems: NetBSD, FreeBSD, and OpenBSD. All are based on the 4.4-Lite
+Berkeley Software Distribution, and they use recent versions of `gawk'
+for their versions of `awk'.)
+
+ The Info file itself has gone through a number of previous editions.
+Paul Rubin wrote the very first draft of `The GAWK Manual'; it was
+around 40 pages in size. Diane Close and Richard Stallman improved it,
+yielding a version that was around 90 pages long and barely described
+the original, "old" version of `awk'.
+
+ I started working with that version in the fall of 1988. As work on
+it progressed, the FSF published several preliminary versions (numbered
+0.X). In 1996, Edition 1.0 was released with `gawk' 3.0.0. The FSF
+published the first two editions under the title `The GNU Awk User's
+Guide'.
+
+ This edition maintains the basic structure of Edition 1.0, but with
+significant additional material, reflecting the host of new features in
+`gawk' version 3.1. Of particular note is *note Array Sorting::, as
+well as *note Bitwise Functions::, *note Internationalization::, and
+also *note Advanced Features::, and *note Dynamic Extensions::.
+
+ `GAWK: Effective AWK Programming' will undoubtedly continue to
+evolve. An electronic version comes with the `gawk' distribution from
+the FSF. If you find an error in this Info file, please report it!
+*Note Bugs::, for information on submitting problem reports
+electronically, or write to me in care of the publisher.
+
+ ---------- Footnotes ----------
+
+ (1) GNU stands for "GNU's not Unix."
+
+ (2) The terminology "GNU/Linux" is explained in the *note Glossary::.
+
+
+File: gawk.info, Node: How To Contribute, Next: Acknowledgments, Prev: Manual History, Up: Preface
+
+How to Contribute
+=================
+
+As the maintainer of GNU `awk', I am starting a collection of publicly
+available `awk' programs. For more information, see
+`ftp://ftp.freefriends.org/arnold/Awkstuff'. If you have written an
+interesting `awk' program, or have written a `gawk' extension that you
+would like to share with the rest of the world, please contact me.

[...]

+     BEGIN {
+       RS = ORS = "\r\n"
+       HttpService = "/inet/tcp/8080/0/0"
+       Hello = "<HTML><HEAD>" \
+               "<TITLE>A Famous Greeting</TITLE></HEAD>" \
+               "<BODY><H1>Hello, world</H1></BODY></HTML>"
+       Len = length(Hello) + length(ORS)
+       print "HTTP/1.0 200 OK"          |& HttpService
+       print "Content-Length: " Len ORS |& HttpService
+       print Hello                      |& HttpService
+       while ((HttpService |& getline) > 0)
+          continue;
+       close(HttpService)
+     }
+
+ Now, on the same machine, start your favorite browser and let it
+point to `http://localhost:8080' (the browser needs to know on which
+port our server is listening for requests). If this does not work, the
+browser probably tries to connect to a proxy server that does not know
+your machine. If so, change the browser's configuration so that the
+browser does not try to use a proxy to connect to your machine.
+
+
+File: gawkinet.info, Node: Interacting Service, Next: Simple Server, Prev: Primitive Service, Up: Using Networking
+
+2.9 A Web Service with Interaction
+==================================
+
+This node shows how to set up a simple web server. The subnode is a
+library file that we will use with all the examples in *note Some
+Applications and Techniques::.
+
+* Menu:
+
+* CGI Lib:: A simple CGI library.
+
+ Setting up a web service that allows user interaction is more
+difficult and shows us the limits of network access in `gawk'. In this
+node, we develop a main program (a `BEGIN' pattern and its action)
+that will become the core of event-driven execution controlled by a
+graphical user interface (GUI). Each HTTP event that the user triggers
+by some action within the browser is received in this central
+procedure. Parameters and menu choices are extracted from this request,
+and an appropriate measure is taken according to the user's choice.
+For example:
+
+ BEGIN {
+ if (MyHost == "") {
+ "uname -n" | getline MyHost
+ close("uname -n")
+ }
+ if (MyPort == 0) MyPort = 8080
+ HttpService = "/inet/tcp/" MyPort "/0/0"
+ MyPrefix = "http://" MyHost ":" MyPort
+ SetUpServer()
+ while ("awk" != "complex") {
+ # header lines are terminated this way
+ RS = ORS = "\r\n"
+ Status = 200 # this means OK
+ Reason = "OK"
+ Header = TopHeader
+ Document = TopDoc
+ Footer = TopFooter
+ if (GETARG["Method"] == "GET") {
+ HandleGET()
+ } else if (GETARG["Method"] == "HEAD") {
+ # not yet implemented
+ } else if (GETARG["Method"] != "") {
+ print "bad method", GETARG["Method"]
+ }
+ Prompt = Header Document Footer
+ print "HTTP/1.0", Status, Reason |& HttpService
+ print "Connection: Close" |& HttpService
+ print "Pragma: no-cache" |& HttpService
+ len = length(Prompt) + length(ORS)
+ print "Content-length:", len |& HttpService
+ print ORS Prompt |& HttpService
+ # ignore all the header lines
+ while ((HttpService |& getline) > 0)
+ ;
+ # stop talking to this client
+ close(HttpService)
+ # wait for new client request
+ HttpService |& getline
+ # do some logging
+ print systime(), strftime(), $0
+ # read request parameters
+ CGI_setup($1, $2, $3)
+ }
+ }
+
+ This web server presents menu choices in the form of HTML links.
+Therefore, it has to tell the browser the name of the host it is
+residing on. When starting the server, the user may supply the name of
+the host from the command line with `gawk -v MyHost="Rumpelstilzchen"'.
+If the user does not do this, the server looks up the name of the host
+it is running on for later use as a web address in HTML documents. The
+same applies to the port number. These values are inserted later into
+the HTML content of the web pages to refer to the home system.
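
The defaulting logic for the host name and port can be tried on its own, without any networking. The sketch below assumes only a POSIX `awk'; the shell function name `show_prefix' and the host name `somehost' are invented for this illustration:

```shell
# show_prefix: hypothetical wrapper around the defaulting logic of the server.
# Any arguments (e.g. -v MyPort=9000) are passed straight through to awk.
show_prefix() {
    awk "$@" 'BEGIN {
        if (MyHost == "") {             # not given with -v: ask the system
            "uname -n" | getline MyHost
            close("uname -n")
        }
        if (MyPort == 0) MyPort = 8080  # unset variables compare equal to 0
        print "http://" MyHost ":" MyPort
    }'
}

show_prefix -v MyHost=somehost                  # http://somehost:8080
show_prefix -v MyHost=somehost -v MyPort=9000   # http://somehost:9000
```

Called with no arguments at all, the function prints the prefix built from whatever `uname -n' reports on your machine.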
+
+ Each server that is built around this core has to initialize some
+application-dependent variables (such as the default home page) in a
+procedure `SetUpServer', which is called immediately before entering the
+infinite loop of the server. For now, we will write an instance that
+initiates a trivial interaction. With this home page, the client user
+can click on two possible choices, and receive the current date either
+in human-readable format or in seconds since 1970:
+
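+     function SetUpServer() {
+       TopHeader = "<HTML><HEAD>"
+       TopHeader = TopHeader \
+          "<title>My name is GAWK, GNU AWK</title></HEAD>"
+       TopDoc    = "<BODY><h2>\
+         Do you prefer your date <A HREF=" MyPrefix \
+         "/human>human</A> or \
+         <A HREF=" MyPrefix "/POSIX>POSIXed</A>?</h2>" ORS ORS
+       TopFooter = "</BODY></HTML>"
+     }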
+
+ On the first run through the main loop, the default line terminators
+are set and the default home page is copied to the actual home page.
+Since this is the first run, `GETARG["Method"]' is not initialized yet,
+hence the case selection over the method does nothing. Now that the
+home page is initialized, the server can start communicating with a
+client browser.
+
+ It does so by printing the HTTP header into the network connection
+(`print ... |& HttpService'). This command blocks execution of the
+server script until a client connects. If this server script is
+compared with the primitive one we wrote before, you will notice two
+additional lines in the header. The first instructs the browser to
+close the connection after each request. The second tells the browser
+that it should never try to _remember_ earlier requests that had
+identical web addresses (no caching). Otherwise, it could happen that
+the browser retrieves the time of day in the previous example just once,
+and later it takes the web page from the cache, always displaying the
+same time of day although time advances each second.
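
To see exactly what the server puts on the wire, the same `print' statements can be sent to standard output instead of to `HttpService'. This is a minimal sketch, assuming a POSIX `awk'; the function name `show_response' and the sample page text are made up for the illustration:

```shell
# show_response: hypothetical helper; prints the reply for a given page body
# to standard output instead of to the network connection.
show_response() {
    awk -v Prompt="$1" 'BEGIN {
        ORS = "\r\n"                    # HTTP header lines end in CR LF
        Status = 200; Reason = "OK"
        print "HTTP/1.0", Status, Reason
        print "Connection: Close"
        print "Pragma: no-cache"
        # the body is Prompt plus the ORS that print appends after it
        len = length(Prompt) + length(ORS)
        print "Content-length:", len
        print ORS Prompt                # empty line ends the header
    }'
}

show_response "<HTML>hi</HTML>" | tr -d '\r'
```

For the 15-character sample page the header announces a length of 17, since the trailing CR LF is counted as part of the body. Note that `length' counts characters; for pages that contain multibyte characters, the byte count that HTTP expects may differ.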
+
+ Having supplied the initial home page to the browser with a valid
+document stored in the parameter `Prompt', it closes the connection and
+waits for the next request. When the request comes, a log line is
+printed that allows us to see which request the server receives. The
+final step in the loop is to call the function `CGI_setup', which reads
+all the lines of the request (coming from the browser), processes them,
+and stores the transmitted parameters in the array `PARAM'. The complete
+text of these application-independent functions can be found in *note A
+Simple CGI Library: CGI Lib. For now, we use a simplified version of
+`CGI_setup':
+
+ function CGI_setup( method, uri, version, i) {
+ delete GETARG; delete MENU; delete PARAM
+ GETARG["Method"] = $1
+ GETARG["URI"] = $2
+ GETARG["Version"] = $3
+ i = index($2, "?")
+ # is there a "?" indicating a CGI request?
+ if (i > 0) {
+ split(substr($2, 1, i-1), MENU, "[/:]")
+ split(substr($2, i+1), PARAM, "&")
+ for (i in PARAM) {
+ j = index(PARAM[i], "=")
+ GETARG[substr(PARAM[i], 1, j-1)] = \
+ substr(PARAM[i], j+1)
+ }
+ } else { # there is no "?", no need for splitting PARAMs
+ split($2, MENU, "[/:]")
+ }
+ }
+
+ At first, the function clears all variables used for global storage
+of request parameters. The rest of the function serves the purpose of
+filling the global parameters with the extracted new values. To
+accomplish this, the name of the requested resource is split into parts
+and stored for later evaluation. If the request contains a `?', then
+the request has CGI variables seamlessly appended to the web address.
+Everything in front of the `?' is split up into menu items, and
+everything behind the `?' is a list of `VARIABLE=VALUE' pairs
+(separated by `&') that also need splitting. This way, CGI variables are
+isolated and stored. This procedure lacks recognition of special
+characters that are transmitted in coded form(1). Here, any optional
+request header and body parts are ignored. We do not need header
+parameters and the request body. However, when refining our approach or
+working with the `POST' and `PUT' methods, reading the header and body
+becomes inevitable. Header parameters should then be stored in a global
+array as well as the body.
+
+ On each subsequent run through the main loop, one request from a
+browser is received, evaluated, and answered according to the user's
+choice. This can be done by letting the value of the HTTP method guide
+the main loop into execution of the procedure `HandleGET', which
+evaluates the user's choice. In this case, we have only one
+hierarchical level of menus, but in the general case, menus are nested.
+The menu choices at each level are separated by `/', just as in file
+names. Notice how simple it is to construct menus of arbitrary depth:
+
+ function HandleGET() {
+ if ( MENU[2] == "human") {
+ Footer = strftime() TopFooter
+ } else if (MENU[2] == "POSIX") {
+ Footer = systime() TopFooter
+ }
+ }
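
`HandleGET' tests `MENU[2]' rather than `MENU[1]' because the path in a browser's request starts with `/', so the first piece that `split' produces is empty. The sketch below, assuming a POSIX `awk', prints the pieces for a request path; the function name `show_menu' and the sample paths are invented here:

```shell
# show_menu: hypothetical helper; splits a request URI the way CGI_setup does
# and prints each resulting MENU element in index order.
show_menu() {
    awk -v uri="$1" 'BEGIN {
        i = index(uri, "?")
        if (i > 0)                      # drop CGI parameters, if any
            uri = substr(uri, 1, i-1)
        n = split(uri, MENU, "[/:]")
        for (i = 1; i <= n; i++)
            printf "MENU[%d] = \"%s\"\n", i, MENU[i]
    }'
}

show_menu "/human"
# MENU[1] = ""
# MENU[2] = "human"
show_menu "/report/1997/april?fmt=short"
# MENU[1] = ""
# MENU[2] = "report"
# MENU[3] = "1997"
# MENU[4] = "april"
```

Each additional `/' in the path yields one more element, which is what makes nested menus cheap to implement.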
+
+ The disadvantage of this approach is that our server is slow and can
+handle only one request at a time. Its main advantage, however, is that
+the server consists of just one `gawk' program. No need for installing
+an `httpd', and no need for static separate HTML files, CGI scripts, or
+`root' privileges. This is rapid prototyping. This program can be
+started on the same host that runs your browser. Then let your browser
+point to `http://localhost:8080'.
+
+ It is also possible to include images in the HTML pages. Most
+browsers support the little-known `.xbm' format, which may
+contain only monochrome pictures but is an ASCII format. Binary images
+are possible but not so easy to handle. Another way of including images
+is to generate them with a tool such as GNUPlot, by calling the tool
+with the `system' function or through a pipe.
+
+ ---------- Footnotes ----------
+
+ (1) As defined in RFC 2068.
+
+
+File: gawkinet.info, Node: CGI Lib, Prev: Interacting Service, Up: Interacting Service
+
+2.9.1 A Simple CGI Library
+--------------------------
+
+ HTTP is like being married: you have to be able to handle whatever
+ you're given, while being very careful what you send back.
+ Phil Smith III,
+ `http://www.netfunny.com/rhf/jokes/99/Mar/http.html'
+
+ In *note A Web Service with Interaction: Interacting Service, we saw
+the function `CGI_setup' as part of the web server "core logic"
+framework. The code presented there handles almost everything necessary
+for CGI requests. One thing it doesn't do is handle encoded characters
+in the requests. For example, an `&' is encoded as a percent sign
+followed by the hexadecimal value: `%26'. These encoded values should
+be decoded. Following is a simple library to perform these tasks.
+This code is used for all web server examples used throughout the rest
+of this Info file. If you want to use it for your own web server,
+store the source code in a file named `inetlib.awk'. Then you can
+include these functions in your code by placing the following
+statement in your program (on the first line of your script):
+
+ @include inetlib.awk
+
+But beware: this mechanism works only if you invoke your web
+server script with `igawk' instead of the usual `awk' or `gawk'. Here
+is the code:
+
+ # CGI Library and core of a web server
+ # Global arrays
+ # GETARG --- arguments to CGI GET command
+ # MENU --- menu items (path names)
+ # PARAM --- parameters of form x=y
+
+ # Optional variable MyHost contains host address
+ # Optional variable MyPort contains port number
+ # Needs TopHeader, TopDoc, TopFooter
+ # Sets MyPrefix, HttpService, Status, Reason
+
+ BEGIN {
+ if (MyHost == "") {
+ "uname -n" | getline MyHost
+ close("uname -n")
+ }
+ if (MyPort == 0) MyPort = 8080
+ HttpService = "/inet/tcp/" MyPort "/0/0"
+ MyPrefix = "http://" MyHost ":" MyPort
+ SetUpServer()
+ while ("awk" != "complex") {
+ # header lines are terminated this way
+ RS = ORS = "\r\n"
+ Status = 200 # this means OK
+ Reason = "OK"
+ Header = TopHeader
+ Document = TopDoc
+ Footer = TopFooter
+ if (GETARG["Method"] == "GET") {
+ HandleGET()
+ } else if (GETARG["Method"] == "HEAD") {
+ # not yet implemented
+ } else if (GETARG["Method"] != "") {
+ print "bad method", GETARG["Method"]
+ }
+ Prompt = Header Document Footer
+ print "HTTP/1.0", Status, Reason |& HttpService
+ print "Connection: Close" |& HttpService
+ print "Pragma: no-cache" |& HttpService
+ len = length(Prompt) + length(ORS)
+ print "Content-length:", len |& HttpService
+ print ORS Prompt |& HttpService
+ # ignore all the header lines
+ while ((HttpService |& getline) > 0)
+ continue
+ # stop talking to this client
+ close(HttpService)
+ # wait for new client request
+ HttpService |& getline
+ # do some logging
+ print systime(), strftime(), $0
+ CGI_setup($1, $2, $3)
+ }
+ }
+
+ function CGI_setup( method, uri, version, i)
+ {
+ delete GETARG
+ delete MENU
+ delete PARAM
+ GETARG["Method"] = method
+ GETARG["URI"] = uri
+ GETARG["Version"] = version
+
+ i = index(uri, "?")
+ if (i > 0) { # is there a "?" indicating a CGI request?
+ split(substr(uri, 1, i-1), MENU, "[/:]")
+ split(substr(uri, i+1), PARAM, "&")
+ for (i in PARAM) {
+ PARAM[i] = _CGI_decode(PARAM[i])
+ j = index(PARAM[i], "=")
+ GETARG[substr(PARAM[i], 1, j-1)] = \
+ substr(PARAM[i], j+1)
+ }
+ } else { # there is no "?", no need for splitting PARAMs
+ split(uri, MENU, "[/:]")
+ }
+ for (i in MENU) # decode characters in path
+ if (i + 0 > 4) # numeric compare; skip those in host name
+ MENU[i] = _CGI_decode(MENU[i])
+ }
+
+ This isolates details in a single function, `CGI_setup'. Decoding
+of encoded characters is pushed off to a helper function,
+`_CGI_decode'. The use of the leading underscore (`_') in the function
+name is intended to indicate that it is an "internal" function,
+although there is nothing to enforce this:
+
+ function _CGI_decode(str, hexdigs, i, pre, code1, code2,
+ val, result)
+ {
+ hexdigs = "123456789abcdef"
+
+ i = index(str, "%")
+ if (i == 0) # no work to do
+ return str
+
+ do {
+ pre = substr(str, 1, i-1) # part before %xx
+ code1 = substr(str, i+1, 1) # first hex digit
+ code2 = substr(str, i+2, 1) # second hex digit
+ str = substr(str, i+3) # rest of string
+
+ code1 = tolower(code1)
+ code2 = tolower(code2)
+ val = index(hexdigs, code1) * 16 \
+ + index(hexdigs, code2)
+
+ result = result pre sprintf("%c", val)
+ i = index(str, "%")
+ } while (i != 0)
+ if (length(str) > 0)
+ result = result str
+ return result
+ }
+
+ This works by splitting the string apart around an encoded character.
+The two digits are converted to lowercase characters and looked up in a
+string of hex digits. Note that `0' is not in the string on purpose;
+`index' returns zero when it's not found, automatically giving the
+correct value! Once the hexadecimal value is converted from characters
+in a string into a numerical value, `sprintf' converts the value back
+into a real character. The following is a simple test harness for the
+above functions:
+
+ BEGIN {
+ CGI_setup("GET",
+ "http://www.gnu.org/cgi-bin/foo?p1=stuff&p2=stuff%26junk" \
+ "&percent=a %25 sign",
+ "1.0")
+ for (i in MENU)
+ printf "MENU[\"%s\"] = %s\n", i, MENU[i]
+ for (i in PARAM)
+ printf "PARAM[\"%s\"] = %s\n", i, PARAM[i]
+ for (i in GETARG)
+ printf "GETARG[\"%s\"] = %s\n", i, GETARG[i]
+ }
+
+ And this is the result when we run it:
+
+ $ gawk -f testserv.awk
+ -| MENU["4"] = www.gnu.org
+ -| MENU["5"] = cgi-bin
+ -| MENU["6"] = foo
+ -| MENU["1"] = http
+ -| MENU["2"] =
+ -| MENU["3"] =
+ -| PARAM["1"] = p1=stuff
+ -| PARAM["2"] = p2=stuff&junk
+ -| PARAM["3"] = percent=a % sign
+ -| GETARG["p1"] = stuff
+ -| GETARG["percent"] = a % sign
+ -| GETARG["p2"] = stuff&junk
+ -| GETARG["Method"] = GET
+ -| GETARG["Version"] = 1.0
+ -| GETARG["URI"] = http://www.gnu.org/cgi-bin/foo?p1=stuff&
+ p2=stuff%26junk&percent=a %25 sign
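
The digit lookup at the heart of `_CGI_decode' can also be checked in isolation. The following sketch assumes a POSIX `awk'; the shell function `hexval' is a name made up for this illustration. It applies the same `index' trick to a single two-digit code and prints both the number and the character:

```shell
# hexval: hypothetical helper; converts a two-digit hex code to its number
# and character using the index() lookup from _CGI_decode.
hexval() {
    awk -v code="$1" 'BEGIN {
        hexdigs = "123456789abcdef"     # no "0": index() yields 0 for it
        code = tolower(code)
        val = index(hexdigs, substr(code, 1, 1)) * 16 \
            + index(hexdigs, substr(code, 2, 1))
        printf "%d %c\n", val, val
    }'
}

hexval 26    # 38 &
hexval 2B    # 43 +
hexval 30    # 48 0
```

The same property that makes `0' work also means that a character that is not a hex digit silently counts as zero, so malformed codes are not detected.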
+
+
+File: gawkinet.info, Node: Simple Server, Next: Caveats, Prev: Interacting Service, Up: Using Networking
+
+2.10 A Simple Web Server
+========================
+
+In the preceding node, we built the core logic for event-driven GUIs.
+In this node, we finally extend the core to a real application. No one
+would actually write a commercial web server in `gawk', but it is
+instructive to see that it is feasible in principle.
+
+ The application is ELIZA, the famous program by Joseph Weizenbaum
+that mimics the behavior of a professional psychotherapist when talking
+to you. Weizenbaum would certainly object to this description, but
+this is part of the legend around ELIZA. Take the site-independent
+core logic and append the following code:
+
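+     function SetUpServer() {
+       SetUpEliza()
+       TopHeader = \
+         "<HTML><HEAD><title>An HTTP-based System with GAWK</title>\
+         </HEAD><BODY>"
+       TopDoc    = "\
+        <H2>Please choose one of the following actions:</H2>\
+        <UL>\
+        <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
+        <LI><A HREF=" MyPrefix "/AboutELIZA>About Eliza</A></LI>\
+        <LI><A HREF=" MyPrefix "/StartELIZA>Start talking to Eliza</A></LI>\
+        </UL>"
+       TopFooter = "</BODY></HTML>"
+     }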
+
+ `SetUpServer' is similar to the previous example, except for calling
+another function, `SetUpEliza'. This approach can be used to implement
+other kinds of servers. The only changes needed to do so are hidden in
+the functions `SetUpServer' and `HandleGET'. Perhaps it might be
+necessary to implement other HTTP methods. The `igawk' program that
+comes with `gawk' may be useful for this process.
+
+ When extending this example to a complete application, the first
+thing to do is to implement the function `SetUpServer' to initialize
+the HTML pages and some variables. These initializations determine the
+way your HTML pages look (colors, titles, menu items, etc.).
+
+ The function `HandleGET' is a nested case selection that decides
+which page the user wants to see next. Each nesting level refers to a
+menu level of the GUI. Each case implements a certain action of the
+menu. On the deepest level of case selection, the handler essentially
+knows what the user wants and stores the answer into the variable that
+holds the HTML page contents:
+
+ function HandleGET() {
+ # A real HTTP server would treat some parts of the URI as a file name.
+ # We take parts of the URI as menu choices and go on accordingly.
+ if(MENU[2] == "AboutServer") {
+ Document = "This is not a CGI script.\
+ This is an httpd, an HTML file, and a CGI script all \
+ in one GAWK script. It needs no separate www-server, \
+ no installation, and no root privileges.\
+ \
+
\\
+
JK 14.9.1997
" + } else if (MENU[2] == "AboutELIZA") { + Document = "This is an implementation of the famous ELIZA\ + program by Joseph Weizenbaum. It is written in GAWK and\ + /bin/sh: expad: command not found + } else if (MENU[2] == "StartELIZA") { + gsub(/\+/, " ", GETARG["YouSay"]) + # Here we also have to substitute coded special characters + Document = "" + } + } + + Now we are down to the heart of ELIZA, so you can see how it works. +Initially the user does not say anything; then ELIZA resets its money +counter and asks the user to tell what comes to mind open heartedly. +The subsequent answers are converted to uppercase characters and stored +for later comparison. ELIZA presents the bill when being confronted with +a sentence that contains the phrase "shut up." Otherwise, it looks for +keywords in the sentence, conjugates the rest of the sentence, remembers +the keyword for later use, and finally selects an answer from the set of +possible answers: + + function ElizaSays(YouSay) { + if (YouSay == "") { + cost = 0 + answer = "HI, IM ELIZA, TELL ME YOUR PROBLEM" + } else { + q = toupper(YouSay) + gsub("'", "", q) + if(q == qold) { + answer = "PLEASE DONT REPEAT YOURSELF !" + } else { + if (index(q, "SHUT UP") > 0) { + answer = "WELL, PLEASE PAY YOUR BILL. ITS EXACTLY ... 
$"\ + int(100*rand()+30+cost/100) + } else { + qold = q + w = "-" # no keyword recognized yet + for (i in k) { # search for keywords + if (index(q, i) > 0) { + w = i + break + } + } + if (w == "-") { # no keyword, take old subject + w = wold + subj = subjold + } else { # find subject + subj = substr(q, index(q, w) + length(w)+1) + wold = w + subjold = subj # remember keyword and subject + } + for (i in conj) + gsub(i, conj[i], q) # conjugation + # from all answers to this keyword, select one randomly + answer = r[indices[int(split(k[w], indices) * rand()) + 1]] + # insert subject into answer + gsub("_", subj, answer) + } + } + } + cost += length(answer) # for later payment : 1 cent per character + return answer + } + + In the long but simple function `SetUpEliza', you can see tables for +conjugation, keywords, and answers.(1) The associative array `k' +contains indices into the array of answers `r'. To choose an answer, +ELIZA just picks an index randomly: + + function SetUpEliza() { + srand() + wold = "-" + subjold = " " + + # table for conjugation + conj[" ARE " ] = " AM " + conj["WERE " ] = "WAS " + conj[" YOU " ] = " I " + conj["YOUR " ] = "MY " + conj[" IVE " ] =\ + conj[" I HAVE " ] = " YOU HAVE " + conj[" YOUVE " ] =\ + conj[" YOU HAVE "] = " I HAVE " + conj[" IM " ] =\ + conj[" I AM " ] = " YOU ARE " + conj[" YOURE " ] =\ + conj[" YOU ARE " ] = " I AM " + + # table of all answers + r[1] = "DONT YOU BELIEVE THAT I CAN _" + r[2] = "PERHAPS YOU WOULD LIKE TO BE ABLE TO _ ?" + ... + + # table for looking up answers that + # fit to a certain keyword + k["CAN YOU"] = "1 2 3" + k["CAN I"] = "4 5" + k["YOU ARE"] =\ + k["YOURE"] = "6 7 8 9" + ... + + } + + Some interesting remarks and details (including the original source +code of ELIZA) are found on Mark Humphrys' home page. Yahoo! also has +a page with a collection of ELIZA-like programs. 
Many of them are +written in Java, some of them disclosing the Java source code, and a +few even explain how to modify the Java source code. + + ---------- Footnotes ---------- + + (1) The version shown here is abbreviated. The full version comes +with the `gawk' distribution. + + +File: gawkinet.info, Node: Caveats, Next: Challenges, Prev: Simple Server, Up: Using Networking + +2.11 Network Programming Caveats +================================ + +By now it should be clear that debugging a networked application is more +complicated than debugging a single-process single-hosted application. +The behavior of a networked application sometimes looks noncausal +because it is not reproducible in a strong sense. Whether a network +application works or not sometimes depends on the following: + + * How crowded the underlying network is + + * Whether the party at the other end is running + + * The state of the party at the other end + + The most difficult problems for a beginner arise from the hidden +states of the underlying network. After closing a TCP connection, it's +often necessary to wait a short while before reopening the connection. +Even more difficult is the establishment of a connection that +previously ended with a "broken pipe." Those connections have to "time +out" for a minute or so before they can reopen. Check this with the +command `netstat -a', which provides a list of still "active" +connections. + + +File: gawkinet.info, Node: Challenges, Prev: Caveats, Up: Using Networking + +2.12 Where To Go From Here +========================== + +Now, you have learned enough to build your own application. You could, +for example, take part in the Loebner Contest to win $100,000. The +Loebner Prize is the first formal instantiation of a Turing Test. Hugh +Loebner agreed with The Cambridge Center for Behavioral Studies to +underwrite a contest designed to implement the Turing Test. Dr. 
+Loebner pledged a Grand Prize of $100,000 for the first computer whose +responses were indistinguishable from a human's. Each year an annual +prize of $2000 and a bronze medal is awarded to the _most_ human +computer. The winner of the annual contest is the best entry relative +to other entries that year, irrespective of how good it is in an +absolute sense. Here is an example of a conversation with the winning +program of 1997: + + (c)1997 Cambridge Center For Behavioral Studies all rights reserved [CONVERSE] + [Intelligent Research Ltd.] Start at: [1997/04/29 08:59:40] + PROGRAM[10:39:20]Hi, my name is Catherine, what's yours? + JUDGE02[10:39:36]My name is Nathan. + PROGRAM[10:39:37]Well, it's nice to meet you, Nathan. + PROGRAM[10:39:45]Did you see that story on CNN last night about the lesbian + PROGRAM[10:40:03]couple who came out at a White House party on Sunday? + JUDGE02[10:40:45]No. I'm sorry I missed that. How did the Prez take it? + PROGRAM[10:40:47]Ellen Degeneres was one of them - she was kissing her lover + PROGRAM[10:41:08]in front of Bill Clinton. Do you think he was very + PROGRAM[10:41:21]embarrassed? + JUDGE02[10:41:37]You tell me. I didn't see it. + PROGRAM[10:41:38]Are you still a Friend of Bill Clinton? + JUDGE02[10:42:22]Yes. I would like to talk about something else. Do you know anything about sculpture. + PROGRAM[10:42:24]Well, I suppose he's doing his best like most of us, but he + PROGRAM[10:42:35]does seem to attract an UNUSUAL amount of bad publicity, + PROGRAM[10:42:51]wouldnt you say--can it all be bad luck? + JUDGE02[10:43:20]I asked you if you knew anything about sculpture. + PROGRAM[10:43:21]OK, but what happens now--does it all drift on, with + PROGRAM[10:43:36]everyone waiting to see how much his friends have said about + PROGRAM[10:43:47]him under threat of lots and lots of jail-or will something + PROGRAM[10:44:01]dramatic actually happen? + JUDGE02[10:44:19]Do you know what sculpture is? 
+ PROGRAM[10:44:20]Well, youre more a fan of his than I am I guess---though I + PROGRAM[10:44:34]cant SWEAR what Id have done in that locked hotel room, I + PROGRAM[10:44:51]must say. + JUDGE02[10:45:26]Have you ever been to the Salmagundi Club? + PROGRAM[10:45:29]Never. to go back to the original discussion Time to get + PROGRAM[10:45:43]off this one I think and onto something more down to earth! + JUDGE02[10:46:20]What is more down to earth than art? + PROGRAM[10:46:21]I don't know what -- to be honest. - + PROGRAM[10:46:30]I suppose it depends on where you're coming from, but as the + PROGRAM[10:46:46]song goes in My Fair Lady, "Why can't a woman be more like a + PROGRAM[10:47:03]man? " + JUDGE02[10:47:52]I liked that song but I prefer "Why can't children be more like we are" + JUDGE02[10:48:10]from another play the name of which I cannot remember. + PROGRAM[10:48:12]What, if anything, do you enjoy about being a man? + JUDGE02[10:49:01]I would enjoy all aspects of life no matter what my gender. + PROGRAM[10:49:02]Do you think of yourself as being attractive? + + This program insists on always speaking about the same story around +Bill Clinton. You see, even a program with a rather narrow mind can +behave so much like a human being that it can win this prize. It is +quite common to let these programs talk to each other via network +connections. But during the competition itself, the program and its +computer have to be present at the place the competition is held. We +all would love to see a `gawk' program win in such an event. Maybe it +is up to you to accomplish this? + + Some other ideas for useful networked applications: + * Read the file `doc/awkforai.txt' in the `gawk' distribution. It + was written by Ronald P. Loui (Associate Professor of Computer + Science, at Washington University in St. Louis, +" i " | " \ + "" config[i] " |