选题: 20191204 4 ways to control the flow of your awk script

sources/tech/20191204 4 ways to control the flow of your awk script.md
This commit is contained in:
DarkSun 2019-12-05 00:59:08 +08:00
parent cf8c2782d7
commit 6499d001b1

View File

@ -0,0 +1,273 @@
[#]: collector: (lujun9972)
[#]: translator: ( )
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (4 ways to control the flow of your awk script)
[#]: via: (https://opensource.com/article/19/12/control-awk-script)
[#]: author: (Seth Kenlon https://opensource.com/users/seth)
4 ways to control the flow of your awk script
======
Learn to use switch statements and the break, continue, and next
commands to control awk scripts.
![JavaScript in Vim][1]
There are many ways to control the flow of an awk script, including [loops][2], **switch** statements and the **break**, **continue**, and **next** commands.
### Sample data
Create a sample data set called **colours.txt** and copy this content into it:
```
name       color  amount
apple      red    4
banana     yellow 6
strawberry red    3
raspberry  red    99
grape      purple 10
apple      green  8
plum       purple 2
kiwi       brown  4
potato     brown  9
pineapple  yellow 5
```
### Switch statements
The **switch** statement is a feature specific to GNU awk, so you can only use it with **gawk**. If your system or your target system doesn't have **gawk**, then you should not use a switch statement.
The **switch** statement in **gawk** is similar to the one in C and many other languages. The syntax is:
```
switch (expression) {
        case VALUE:
                <do something here>
        [...]
        default:
                <do something here>
}
```
The **expression** part can be any awk expression that returns a numeric or string result. The **VALUE** part (after the word **case**) is a numeric or string constant or a regular expression.
When a **switch** statement runs, the _expression_ is evaluated, and the result is matched against each case value. If there's a match, then the code contained within a case definition is executed. If there's no match in any case definition, then the default statement is executed.
The keyword **break** is at the end of the code in each case definition to break the loop. Without **break**, awk would continue to search for matching case values.
Here's an example **switch** statement:
```
#!/usr/bin/awk -f
#
# Example of the use of 'switch' in GNU Awk.
NR > 1 {
    printf "The %s is classified as: ",$1
    switch ($1) {
        case "apple":
            print "a fruit, pome"
            break
        case "banana":
        case "grape":
        case "kiwi":
            print "a fruit, berry"
            break
                case "raspberry":
                        print "a computer, pi"
                        break
        case "plum":
            print "a fruit, drupe"
            break
        case "pineapple":
            print "a fruit, fused berries (syncarp)"
            break
        case "potato":
            print "a vegetable, tuber"
            break
        default:
            print "[unclassified]"
    }
}
```
This script notably ignores the first line of the file, which in the case of the sample data is just a header. It does this by operating only on records with an index number greater than 1. On all other records, this script compares the contents of the first field (**$1**, as you know from previous articles) to the value of each **case** definition. If there's a match, the **print** function is used to print the botanical classification of the entry. If there are no matches, then the **default** instance prints **"[unclassified]"**.
The banana, grape, and kiwi are all botanically classified as a berry, so there are three **case** definitions associated with one **print** result.
Run the script on the **colours.txt** sample file, and you should get this:
```
The apple is classified as: a fruit, pome
The banana is classified as: a fruit, berry
The strawberry is classified as: [unclassified]
The raspberry is classified as: a computer, pi
The grape is classified as: a fruit, berry
The apple is classified as: a fruit, pome
The plum is classified as: a fruit, drupe
The kiwi is classified as: a fruit, berry
The potato is classified as: a vegetable, tuber
The pineapple is classified as: a fruit, fused berries (syncarp)
```
### Break
The **break** statement is mainly used for the early termination of a **for**, **while**, or **do-while** loop or a **switch** statement. In a loop, **break** is often used where it's not possible to determine the number of iterations of the loop beforehand. Invoking **break** terminates the enclosing loop (which is relevant when there are nested loops or loops within loops).
This example, straight out of the [GNU awk manual][3], shows a method of finding the smallest divisor. Read the additional comments for a clear understanding of how the code works:
```
#!/usr/bin/awk -f
{
    num = $1
    # Make an infinite FOR loop
    for (divisor = 2; ; divisor++) {
        # If num is divisible by divisor, then break
        if (num % divisor == 0) {
            printf "Smallest divisor of %d is %d\n", num, divisor
            break
        }
        # If divisor has gotten too large, the number has no
        # divisor, so is a prime
        if (divisor * divisor > num) {
            printf "%d is prime\n", num
            break
        }
    }
}
```
Try running the script to see its results:
```
    $ echo 67 | ./divisor.awk
    67 is prime
    $ echo 69 | ./divisor.awk
    Smallest divisor of 69 is 3
```
As you can see, even though the script starts out with an explicit _infinite_ loop with no end condition, the **break** function ensures that the script eventually terminates.
### Continue
The **continue** function is similar to **break**. It can be used in a **for**, **while**, or **do-while** loop (it's not relevant to a **switch** statements, though). Invoking **continue** skips the rest of the enclosing loop and begins the next cycle.
Here's another good example from the GNU awk manual to demonstrate a possible use of **continue**:
```
#!/usr/bin/awk -f
# Loop, printing numbers 0-20, except 5
BEGIN {
    for (x = 0; x <= 20; x++) {
        if (x == 5)
            continue
        printf "%d ", x
    }
    print ""
}
```
This script analyzes the value of **x** before printing anything. If the value is exactly 5, then **continue** is invoked, causing the **printf** line to be skipped, but leaves the loop unbroken. Try the same code but with **break** instead to see the difference.
### Next
This statement is not related to loops like **break** and **continue** are. Instead, **next** applies to the main record processing cycle of awk: the functions you place between the BEGIN and END functions. The **next** statement causes awk to stop processing the _current input record_ and to move to the next one.
As you know from the earlier articles in this series, awk reads records from its input stream and applies rules to them. The **next** statement stops the execution of rules for the current record and moves to the next one.
Here's an example of **next** being used to "hold" information upon a specific condition:
```
#!/usr/bin/awk -f
# Ignore the header
NR == 1 { next }
# If field 2 (colour) is less than 6
# characters, then save it with its
#  line number and skip it
length($2) < 6 {
    skip[NR] = $0
    next
}
# It's not the header and
# the colour name is > 6 characters,
# so print the line
{
    print
}
# At the end, show what was skipped
END {
    printf "\nSkipped:\n"
    for (n in skip)
        print n": "skip[n]
}
```
This sample uses **next** in the first rule to avoid the first line of the file, which is a header row. The second rule skips lines when the color name is less than six characters long, but it also saves that line in an array called **skip**, using the line number as the key (also known as the _index_).
The third rule prints anything it sees, but it is not invoked if either rule 1 or rule 2 causes it to be skipped.
Finally, at the end of all the processing, the **END** rule prints the contents of the array.
Run the sample script on the **colours.txt** file from above (and previous articles):
```
$ ./next.awk colours.txt
banana     yellow 6
grape      purple 10
plum       purple 2
pineapple  yellow 5
Skipped:
2: apple      red    4
4: strawberry red    3
6: apple      green  8
8: kiwi       brown  4
9: potato     brown  9
```
### Control freak
In summary, **switch**, **continue**, **next**, and **break** are important preemptive exceptions to awk rules that provide greater control of your script. You don't have to use them directly; often, you can gain the same logic through other means, but they're great convenience functions that make the coder's life a lot easier. The next article in this series covers the **printf** statement.
* * *
Would you rather listen to this article? It was adapted from an episode of [Hacker Public Radio][4], a community technology podcast by hackers, for hackers.
--------------------------------------------------------------------------------
via: https://opensource.com/article/19/12/control-awk-script
作者:[Seth Kenlon][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/seth
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/javascript_vim.jpg?itok=mqkAeakO (JavaScript in Vim)
[2]: https://opensource.com/article/19/11/loops-awk
[3]: https://www.gnu.org/software/gawk/manual/
[4]: http://hackerpublicradio.org/eps.php?id=2438