Obtaining Awk (and Perl)


This document gives a list of the versions of awk that are available for various platforms (and also a few general Perl links). I have not personally verified all FTP links; fortunately, several alternative locations are provided in most cases (you can also try an archie search or FTP site search).


General Awk Links

Availability of Awk

GNU Awk (gawk) is the premier freely redistributable Awk variant. The current released version (as of late June 2000) is 3.0.6. Networking features are planned for a future release of Gawk (3.1.0?).

Newsflash: Gawk 3.1.0 sources are released; see ftp://ftp.gnu.org/gnu/gawk/.

Gawk source distribution

From: arnold@mathcs.emory.edu (Arnold D. Robbins)
Message-Id: <4e06mq$2ta@cssun.mathcs.emory.edu>
Date: 22 Jan 1996 09:24:26 -0500
Subject: Gawk 3.0 announcement

This is to announce the long awaited, often delayed release of gawk version 3.0. Gawk (GNU awk) provides a superset of POSIX awk, a pattern scanning and data manipulation language.

The relevant part of the NEWS file is appended below. This version is available as gawk-3.0.0.tar.gz in /pub/gnu on ftp.gnu.ai.mit.edu (or one of the mirror sites in the appended list). Updates from 2.15.6 are not available, as they would be too large. It's easier to just get the whole distribution; too much has changed. Of note is the heavily revised manual, which is almost twice the size of the previous edition. Also of note is that this release finally uses Autoconf for configuration.

Also available in the same directory is gawk-3.0.0-doc.tar.gz, which unpacks on top of the gawk distribution. This contains the TeX ``dribble'' and dvi files for the manual, as well as PostScript versions of the manual and the man pages.

Bug reports should be mailed to bug-gnu-utils@prep.ai.mit.edu, with copies to arnold@gnu.ai.mit.edu. See the included documentation for other addresses to use if you are addressing problems specific to a non-Unix version of gawk.

I would like to thank the following people for their help in producing this patch. All of them worked very hard, and I could not have done this without them.

David Trueman     - Initial versions of important parts of the mainline code
Pat Rankin        - VMS
Michal Jaegermann - Atari and NeXT and Ultrix
Darrel Hankerson  - OS/2 and DOS (and Linux)
Scott Deifik      - DOS
Kai Uwe Rommel    - OS/2
Mark Moraes       - for testing with tools unavailable to me
Kaveh Ghazi       - for compiling on umpteen different Unix systems
Nelson Beebe      - ditto
Jeffrey Friedl    - for help in tracking down regexp problems
Miriam Robbins    - for her patience, and for sharing me with the computer

Changes from 2.15.6 to 3.0

Fixed spelling of `Programming' in the copyright notice in all the files.

New --re-interval option to turn on interval expressions. They're off by default, except for --posix, to avoid breaking old programs.

Passing regexp constants as parameters to user defined functions now generates a lint warning.

Several obscure regexp bugs fixed; alas, a small number remain.

The manual has been thoroughly revised. It's now almost 50% bigger than it used to be.

The `+' modifier in printf is now reset correctly for each item.

The do_unix variable is now named do_traditional.

Handling of \ in sub and gsub rationalized (somewhat, see the manual for the gory [and I do mean gory] details).

IGNORECASE now uses ISO 8859-1 Latin-1 instead of straight ASCII. See the source for how to revert to pure ASCII.

--lint will now warn if an assignment occurs in a conditional context. This may become obnoxious enough to need turning off in the future, but "it seemed like a good idea at the time."

%hf and %Lf are now diagnosed as invalid in printf, just like %lf.

Gawk no longer incorrectly closes stdin in child processes used in input pipelines.

For integer formats, gawk now correctly treats the precision as the number of digits to print, not the number of characters.

gawk is now much better at catching the use of scalar values when arrays are needed, both in function calls and the `x in y' constructs.

New gensub function added. See the manual.

If do_traditional is true, octal and hex escapes in regexp constants are treated literally. This matches historical behavior.

yylex/nextc fixed so that even null characters can be included in the source code.

do_format now handles cases where a format specifier doesn't end in a control letter. --lint reports an error.

strftime() now uses a default time format equivalent to that of the Unix date command, thus it can be called with no arguments.

Gawk now catches functions that are used but not defined at parse time instead of at run time. (This is a lint error; making it fatal could break old code.)

Arrays that max out are now handled correctly.

Integer formats outside the range of an unsigned long are now detected correctly using the SunOS 4.x cc compiler.

--traditional option added as new preferred name for --compat, in keeping with GCC.

--lint-old option added, so that warnings about things not in old awk are only given if explicitly asked for.

`next file' has changed to one word, `nextfile'. `next file' is still accepted but generates a lint warning. `next file' will go away eventually.

Gawk with --lint will now notice empty source files and empty data files.

Amiga support using the Unix emulation added. Thanks to fnf@amigalib.com.

test/Makefile is now "parallel-make safe".

Gawk now uses POSIX regexps + GNU regex ops by default. --posix goes to pure posix regexps, and --compat goes to traditional Unix regexps. However, interval expressions, even though specified by POSIX, are turned off by default, to avoid breaking old code.

IGNORECASE now applies to *everything*, string comparison as well as regexp operations.

The AT&T Bell Labs Research awk fflush builtin function is now supported. fflush is extended to flush stdout if no arg and everything if given the null string as an argument.

If RS is more than one character, it is treated as a regular expression and records are delimited accordingly. The variable RT is set to the record terminator string. This is disabled in compatibility mode.

If FS is set to the null string (or the third arg. of split() is the null string), splitting is done at every single character. This is disabled in compatibility mode.

Gawk now uses the Autoconf generated configure script, doing away with all the config/* files and the machinery that went with them. The Makefile.in has also changed accordingly, complete with all the standard GNU Makefile targets. (Non-unix systems may still have their own config.h and Makefile; see the appropriate README_d/README.* and/or subdirectory.)

The source code has been cleaned up somewhat and the formatting improved.

Most GNU software is packed using the new `gzip' compression program.

For information on how to order GNU software on tape, floppy, or cd-rom, check the file etc/ORDERS in the GNU Emacs distribution or in GNUinfo/ORDERS on prep.ai.mit.edu [ftp.gnu.ai.mit.edu], or e-mail a request to: gnu@prep.ai.mit.edu
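Three of the behavior changes listed in the announcement above (integer precision in printf, a multi-character RS treated as a regular expression with RT holding the terminator, and a null FS splitting at every character) may be easier to picture with a small model. The sketch below is a Python emulation of the described semantics, not gawk code: the helper names split_records and split_fields are hypothetical, and Python's regular-expression dialect differs from gawk's in places, so treat it only as an illustration.

```python
import re

# Integer precision counts digits, not characters. Python's % operator
# follows the same C printf rule, so the sign does not use up a digit.
padded = "%.5d" % -42          # "-00042"

def split_records(text, rs):
    """Model of a multi-character RS: rs is treated as a regular expression.

    Returns (record, terminator) pairs; the terminator plays the role that
    the announcement assigns to the RT variable.
    """
    records = []
    pos = 0
    for match in re.finditer(rs, text):
        if match.start() == match.end():
            continue                      # skip empty matches
        records.append((text[pos:match.start()], match.group(0)))
        pos = match.end()
    if pos < len(text):
        records.append((text[pos:], ""))  # last record has no terminator
    return records

def split_fields(record):
    """Model of FS set to the null string: every character is a field."""
    return list(record)

if __name__ == "__main__":
    print(padded)
    print(split_records("a;;b\n\nc", r";+|\n+"))
    print(split_fields("abc"))
```

In this model the record "a" comes back with the terminator ";;" and the final record "c" with an empty terminator, mirroring how RT is described as being set per record.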

[The following links point towards the main gawk 3.0.6 archive file, which contains gawk source code files assembled in a Unix tar archive and then compressed with GNU gzip; the documentation files in .ps and .dvi formats will be available in supplemental archive files named gawk-3.0.6-doc.tar.gz and gawk-3.0.6-ps.tar.gz in the same directory. For compiled executables (binaries) for non-Unix systems, see the next section below.]

The central GNU archive for source code distributions is at ftp://ftp.gnu.org/gnu/gawk/; the GNU archive and mirror sites currently contain source code and upgrade patches for older versions of Gawk.

Precompiled Gawk binaries, other Awk versions, etc.

Three main types of Gawk ports for MS-DOS are available as precompiled executables. First, 16-bit ports (GNUish, etc.) run under plain, unenhanced MS-DOS on any 80x86 processor, but are limited to 640k of memory. Second, 32-bit ports (DJGPP, etc.) can handle processing jobs requiring more than 640k of memory, but need DPMI memory-management services. (DPMI services are automatically available in an MS-Windows DOS box, but require running a separate program in plain MS-DOS.) The DJGPP-compiled 32-bit binaries can also see long filenames under Windows 95/98. Third, true Windows ports depend on Win32 DLLs. Finally, the fourth section below contains links to miscellaneous non-MS-DOS ports and non-Gawk sources.

Also, an AWK-to-C-source-code translator is available at http://awka.sourceforge.net/.

Availability of Perl

I am only including some general Perl links here. Perl 4.036 ports were available for MS-DOS, Mac, Windows-NT, VMS, Amiga, OS/2, Atari, LynxOS, MVS, MPE, Xenix, and Netware; I'm not sure what the status of Perl 5 ports is.