TranslateProject/sources/tech/20191104 Fields, records, and variables in awk.md

Fields, records, and variables in awk

In the second article in this intro to awk series, learn about fields, records, and some powerful awk variables.

Awk comes in several varieties: There is the original awk, written in 1977 at AT&T Bell Laboratories, and several reimplementations, such as mawk, nawk, and the one that ships with most Linux distributions, GNU awk, or gawk. On most Linux distributions, awk and gawk are synonyms referring to GNU awk, and typing either invokes the same awk command. See the GNU awk user's guide for the full history of awk and gawk.

The first article in this series showed that awk is invoked on the command line with this syntax:

`$ awk [options] 'pattern {action}' inputfile`

Awk is the command, and it can take options (such as -F to define the field separator). The action you want awk to perform is contained in single quotes, at least when it's issued in a terminal. To further emphasize which part of the awk command is the action you want it to take, GNU awk lets you precede your program with the -e option (it's not required, and other awk implementations may not support it):

$ awk -F, -e '{print $2;}' colours.txt
yellow
blue
green
[...]

Records and fields

Awk views its input data as a series of records, which are usually newline-delimited lines. In other words, awk generally sees each line in a text file as a new record. Each record contains a series of fields. A field is a component of a record delimited by a field separator.

By default, awk sees whitespace, such as spaces, tabs, and newlines, as indicators of a new field. Specifically, awk treats multiple space separators as one, so this line contains two fields:

`raspberry red`

As does this one:

`tuxedo                  black`

Other separators are not treated this way. Assuming that the field separator is a comma, the following example record contains three fields, with one probably being zero characters long (assuming a non-printable character isn't hiding in that field):

`a,,b`
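To check these field counts yourself, here is a minimal sketch that feeds the three example records above to awk and prints NF, the number-of-fields variable covered later in this article. The shell variable names are purely illustrative, and the built-in length() function is used only to confirm that the middle field is empty.

```shell
# Default separator: a run of spaces counts as a single separator.
default_count=$(printf 'tuxedo                  black\n' | awk '{ print NF }')

# Comma separator: every comma counts, so the empty middle field is kept.
comma_count=$(printf 'a,,b\n' | awk -F, '{ print NF }')

# Length of the second (middle) field in the comma-separated record.
middle_length=$(printf 'a,,b\n' | awk -F, '{ print length($2) }')

echo "default: $default_count, comma: $comma_count, middle length: $middle_length"
# prints: default: 2, comma: 3, middle length: 0
```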

The awk program

The program part of an awk command consists of a series of rules. Normally, each rule begins on a new line in the program (although this is not mandatory). Each rule consists of a pattern and one or more actions:

`pattern { action }`

In a rule, you can define a pattern as a condition to control whether the action will run on a record. Patterns can be simple comparisons, regular expressions, combinations of the two, and more.

For instance, this will print a record only if it contains the word "raspberry":

$ awk '/raspberry/ { print $0 }' colours.txt
raspberry  red    3

If there is no qualifying pattern, the action is applied to every record.

Also, a rule can consist of only a pattern, in which case the entire record is written as if the action was { print }.
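The three rule shapes described above (pattern with action, pattern only, and action only) can be compared side by side. This is a small sketch that assumes a five-line excerpt modelled on the colours.txt lines shown in this article; the excerpt and the shell variable names are assumptions made for illustration.

```shell
# Hypothetical excerpt of colours.txt, held in a shell variable.
sample='name       color  amount
apple      red    4
banana     yellow 6
raspberry  red    3
grape      purple 10'

# Pattern only: matching records are printed whole, as if the action were { print }.
pattern_only=$(printf '%s\n' "$sample" | awk '/raspberry/')

# Pattern plus explicit action: same result as above.
pattern_action=$(printf '%s\n' "$sample" | awk '/raspberry/ { print $0 }')

# Action only: runs for every record, here printing just the first field.
action_only=$(printf '%s\n' "$sample" | awk '{ print $1 }')

printf '%s\n' "$pattern_only" "$action_only"
```

The first line of output is the raspberry record, followed by the first field of all five records.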

Awk programs are essentially data-driven in that actions depend on the data, so they are quite a bit different from programs in many other programming languages.

The NF variable

Each field is available through its own variable ($1 for the first field, $2 for the second, and so on, with $0 holding the entire record), but there are special variables for fields and records, too. The variable NF stores the number of fields awk finds in the current record. This can be printed or used in tests. Here is an example using the text file from the previous article:

$ awk '{ print $0 " (" NF ")" }' colours.txt
name       color  amount (3)
apple      red    4 (3)
banana     yellow 6 (3)
[...]

In this example, awk's print function receives a series of strings and variables written one after another, and awk concatenates them into a single string. This is why, at the end of each line in this example, awk prints the number of fields as an integer enclosed by parentheses.
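NF can also be used in a test, as mentioned above. As a rough sketch, the following command uses NF as a pattern to report only the records that do not have exactly three fields. The three input lines are invented for this illustration, with the second one deliberately missing its amount.

```shell
# Hypothetical input: the second line has only two fields.
short_lines=$(printf 'apple red 4\nbanana yellow\ngrape purple 10\n' |
    awk 'NF != 3 { print $0 " (" NF ")" }')

echo "$short_lines"
# prints: banana yellow (2)
```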

The NR variable

In addition to counting the fields in each record, awk also counts input records. The record number is held in the variable NR, and it can be used in the same way as any other variable. For example, to print the record number before each line:

$ awk '{ print NR ": " $0 }' colours.txt
1: name       color  amount
2: apple      red    4
3: banana     yellow 6
4: raspberry  red    3
5: grape      purple 10
[...]

Note that it's acceptable to write this command with no spaces other than the one after print, although it's more difficult for a human to parse:

`$ awk '{print NR": "$0}' colours.txt`
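Because NR behaves like any other variable, it can appear in a pattern as well as in an action. The sketch below selects records 2 and 3 by number, combining two comparisons with the && operator. The sample data is an assumed excerpt of colours.txt, and the shell variable names are illustrative.

```shell
# Hypothetical excerpt of colours.txt, held in a shell variable.
sample='name       color  amount
apple      red    4
banana     yellow 6
raspberry  red    3
grape      purple 10'

# Print the record number and first field for records 2 and 3 only.
middle=$(printf '%s\n' "$sample" | awk 'NR >= 2 && NR <= 3 { print NR ": " $1 }')

echo "$middle"
# prints:
# 2: apple
# 3: banana
```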

The printf() function

For greater flexibility in how the output is formatted, you can use the awk printf() function. This is similar to printf in C, Lua, Bash, and other languages. It takes a format argument followed by a comma-separated list of items. The argument list may be enclosed in parentheses.

`printf format, item1, item2, ...`

The format argument (or format string) defines how each of the other arguments will be output. It uses format specifiers to do this, including %s to output a string and %d to output a decimal number. The following printf statement outputs the record followed by the number of fields in parentheses:

$ awk '{ printf "%s (%d)\n",$0,NF }' colours.txt
name       color  amount (3)
apple      red    4 (3)
banana     yellow 6 (3)
[...]

In this example, %s (%d) provides the structure for each line, while $0,NF defines the data to be inserted into the %s and %d positions. Note that, unlike with the print function, no newline is generated without explicit instructions. The escape sequence \n does this.
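To see the effect of the newline for yourself, here is a small sketch that runs the same printf twice on two invented records, once without \n and once with it. The input lines and shell variable names are assumptions for illustration only.

```shell
# Without \n, printf output from every record runs together on one line.
one_line=$(printf 'apple red 4\ngrape purple 10\n' |
    awk '{ printf "%s=%d ", $1, $3 }')

# With \n, each record produces its own line of output.
per_line=$(printf 'apple red 4\ngrape purple 10\n' |
    awk '{ printf "%s=%d\n", $1, $3 }')

echo "[$one_line]"
# prints: [apple=4 grape=10 ]
echo "$per_line"
# prints:
# apple=4
# grape=10
```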

Awk scripting

All of the awk code in this article has been written and executed in an interactive Bash prompt. For more complex programs, it's often easier to place your commands into a file or script. The option -f FILE (not to be confused with -F, which denotes the field separator) may be used to invoke a file containing a program.

For example, here is a simple awk script. Create a file called example1.awk with this content:

/^a/ {print "A: " $0}
/^b/ {print "B: " $0}

It's conventional to give such files the extension .awk to make it clear that they hold an awk program. This naming is not mandatory, but it gives file managers and editors (and you) a useful clue about what the file is.

Run the script:

$ awk -f example1.awk colours.txt
A: apple      red    4
B: banana     yellow 6
A: apple      green  8
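If you don't have colours.txt from the previous article at hand, the following sketch recreates the whole run in a throwaway directory. The six-line data file is an assumed excerpt based on the lines shown in this article, and the workdir and result variable names are illustrative.

```shell
# Work in a temporary directory so no existing file is overwritten.
workdir=$(mktemp -d)

# Hypothetical excerpt of colours.txt.
cat > "$workdir/colours.txt" <<'EOF'
name       color  amount
apple      red    4
banana     yellow 6
raspberry  red    3
grape      purple 10
apple      green  8
EOF

# The two-rule program from this article.
cat > "$workdir/example1.awk" <<'EOF'
/^a/ {print "A: " $0}
/^b/ {print "B: " $0}
EOF

# -f reads the program from a file; the data file follows as usual.
result=$(awk -f "$workdir/example1.awk" "$workdir/colours.txt")
echo "$result"

# Clean up the temporary directory.
rm -r "$workdir"
```

Only the records beginning with "a" or "b" are printed, each prefixed with the label from the rule that matched it.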

A file containing awk instructions can be made into a script by adding a #! line at the top and making it executable. Create a file called example2.awk with these contents:

#!/usr/bin/awk -f
#
# Print all but line 1 with the line number on the front
#

NR > 1 {
    printf "%d: %s\n",NR,$0
}

Arguably, there's no advantage to having just one line in a script, but sometimes it's easier to execute a script than to remember and type even a single line. A script file also provides a good opportunity to document what a command does. Lines starting with the # symbol are comments, which awk ignores.

Grant the file executable permission:

`$ chmod u+x example2.awk`

Run the script:

$ ./example2.awk colours.txt
2: apple      red    4
3: banana     yellow 6
4: raspberry  red    3
5: grape      purple 10
[...]

An advantage of placing your awk instructions in a script file is that it's easier to format and edit. While you can write awk on a single line in your terminal, it can get overwhelming when it spans several lines.

Try it

You now know enough about how awk processes your instructions to be able to write a complex awk program. Try writing an awk script with more than one rule and at least one conditional pattern. If you want to try more functions than just print and printf, refer to the gawk manual online.

Here's an idea to get you started:

#!/usr/bin/awk -f
#
# Print each record, but
# IF the first field is "raspberry",
# THEN replace "red" with "pi" first

$1 == "raspberry" {
        gsub(/red/,"pi")
}

{ print }

Try this script to see what it does, and then try to write your own.
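As one more possible starting point, here is a sketch of a program with two rules, both using conditional patterns: the first labels the header record, and the second reports only those later records whose amount is greater than 5. The sample data is an assumed excerpt of colours.txt, the threshold is arbitrary, and the program is passed inline here rather than stored in a script file.

```shell
# Hypothetical excerpt of colours.txt, held in a shell variable.
sample='name       color  amount
apple      red    4
banana     yellow 6
raspberry  red    3
grape      purple 10'

report=$(printf '%s\n' "$sample" | awk '
# Rule 1: label the header record.
NR == 1 { print "HEADER: " $0 }

# Rule 2: after the header, report records whose third field exceeds 5.
NR > 1 && $3 > 5 { printf "%d: %s has %d\n", NR, $1, $3 }
')

echo "$report"
# prints:
# HEADER: name       color  amount
# 3: banana has 6
# 5: grape has 10
```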

The next article in this series will introduce more functions for even more complex (and useful!) scripts.


This article is adapted from an episode of Hacker Public Radio, a community technology podcast.


via: https://opensource.com/article/19/11/fields-records-variables-awk

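Author: Seth Kenlon  Topic selection: lujun9972  Translator: (translator ID)  Proofreader: (proofreader ID)

This article was originally compiled by LCTT and is proudly presented by Linux China.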