mirror of
https://github.com/LCTT/TranslateProject.git
synced 2024-12-29 21:41:00 +08:00
6499d001b1
sources/tech/20191204 4 ways to control the flow of your awk script.md
274 lines
9.8 KiB
Markdown
274 lines
9.8 KiB
Markdown
[#]: collector: (lujun9972)
|
||
[#]: translator: ( )
|
||
[#]: reviewer: ( )
|
||
[#]: publisher: ( )
|
||
[#]: url: ( )
|
||
[#]: subject: (4 ways to control the flow of your awk script)
|
||
[#]: via: (https://opensource.com/article/19/12/control-awk-script)
|
||
[#]: author: (Seth Kenlon https://opensource.com/users/seth)
|
||
|
||
4 ways to control the flow of your awk script
|
||
======
|
||
Learn to use switch statements and the break, continue, and next
|
||
commands to control awk scripts.
|
||
![JavaScript in Vim][1]
|
||
|
||
There are many ways to control the flow of an awk script, including [loops][2], **switch** statements and the **break**, **continue**, and **next** commands.
|
||
|
||
### Sample data
|
||
|
||
Create a sample data set called **colours.txt** and copy this content into it:
|
||
|
||
|
||
```
|
||
name color amount
|
||
apple red 4
|
||
banana yellow 6
|
||
strawberry red 3
|
||
raspberry red 99
|
||
grape purple 10
|
||
apple green 8
|
||
plum purple 2
|
||
kiwi brown 4
|
||
potato brown 9
|
||
pineapple yellow 5
|
||
```
|
||
|
||
### Switch statements
|
||
|
||
The **switch** statement is a feature specific to GNU awk, so you can only use it with **gawk**. If your system or your target system doesn't have **gawk**, then you should not use a switch statement.
|
||
|
||
The **switch** statement in **gawk** is similar to the one in C and many other languages. The syntax is:
|
||
|
||
|
||
```
|
||
switch (expression) {
|
||
case VALUE:
|
||
<do something here>
|
||
[...]
|
||
default:
|
||
<do something here>
|
||
}
|
||
```
|
||
|
||
The **expression** part can be any awk expression that returns a numeric or string result. The **VALUE** part (after the word **case**) is a numeric or string constant or a regular expression.
|
||
|
||
When a **switch** statement runs, the _expression_ is evaluated, and the result is matched against each case value. If there's a match, then the code contained within a case definition is executed. If there's no match in any case definition, then the default statement is executed.
|
||
|
||
The keyword **break** is at the end of the code in each case definition to break the loop. Without **break**, awk would continue to search for matching case values.
|
||
|
||
Here's an example **switch** statement:
|
||
|
||
|
||
```
|
||
#!/usr/bin/awk -f
|
||
#
|
||
# Example of the use of 'switch' in GNU Awk.
|
||
|
||
NR > 1 {
|
||
printf "The %s is classified as: ",$1
|
||
|
||
switch ($1) {
|
||
case "apple":
|
||
print "a fruit, pome"
|
||
break
|
||
case "banana":
|
||
case "grape":
|
||
case "kiwi":
|
||
print "a fruit, berry"
|
||
break
|
||
case "raspberry":
|
||
print "a computer, pi"
|
||
break
|
||
case "plum":
|
||
print "a fruit, drupe"
|
||
break
|
||
case "pineapple":
|
||
print "a fruit, fused berries (syncarp)"
|
||
break
|
||
case "potato":
|
||
print "a vegetable, tuber"
|
||
break
|
||
default:
|
||
print "[unclassified]"
|
||
}
|
||
}
|
||
```
|
||
|
||
This script notably ignores the first line of the file, which in the case of the sample data is just a header. It does this by operating only on records with an index number greater than 1. On all other records, this script compares the contents of the first field (**$1**, as you know from previous articles) to the value of each **case** definition. If there's a match, the **print** function is used to print the botanical classification of the entry. If there are no matches, then the **default** instance prints **"[unclassified]"**.
|
||
|
||
The banana, grape, and kiwi are all botanically classified as a berry, so there are three **case** definitions associated with one **print** result.
|
||
|
||
Run the script on the **colours.txt** sample file, and you should get this:
|
||
|
||
|
||
```
|
||
The apple is classified as: a fruit, pome
|
||
The banana is classified as: a fruit, berry
|
||
The strawberry is classified as: [unclassified]
|
||
The raspberry is classified as: a computer, pi
|
||
The grape is classified as: a fruit, berry
|
||
The apple is classified as: a fruit, pome
|
||
The plum is classified as: a fruit, drupe
|
||
The kiwi is classified as: a fruit, berry
|
||
The potato is classified as: a vegetable, tuber
|
||
The pineapple is classified as: a fruit, fused berries (syncarp)
|
||
```
|
||
|
||
### Break
|
||
|
||
The **break** statement is mainly used for the early termination of a **for**, **while**, or **do-while** loop or a **switch** statement. In a loop, **break** is often used where it's not possible to determine the number of iterations of the loop beforehand. Invoking **break** terminates the enclosing loop (which is relevant when there are nested loops or loops within loops).
|
||
|
||
This example, straight out of the [GNU awk manual][3], shows a method of finding the smallest divisor. Read the additional comments for a clear understanding of how the code works:
|
||
|
||
|
||
```
|
||
#!/usr/bin/awk -f
|
||
|
||
{
|
||
num = $1
|
||
|
||
# Make an infinite FOR loop
|
||
for (divisor = 2; ; divisor++) {
|
||
|
||
# If num is divisible by divisor, then break
|
||
if (num % divisor == 0) {
|
||
printf "Smallest divisor of %d is %d\n", num, divisor
|
||
break
|
||
}
|
||
|
||
# If divisor has gotten too large, the number has no
|
||
# divisor, so is a prime
|
||
if (divisor * divisor > num) {
|
||
printf "%d is prime\n", num
|
||
break
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
Try running the script to see its results:
|
||
|
||
|
||
```
|
||
$ echo 67 | ./divisor.awk
|
||
67 is prime
|
||
$ echo 69 | ./divisor.awk
|
||
Smallest divisor of 69 is 3
|
||
```
|
||
|
||
As you can see, even though the script starts out with an explicit _infinite_ loop with no end condition, the **break** function ensures that the script eventually terminates.
|
||
|
||
### Continue
|
||
|
||
The **continue** function is similar to **break**. It can be used in a **for**, **while**, or **do-while** loop (it's not relevant to a **switch** statements, though). Invoking **continue** skips the rest of the enclosing loop and begins the next cycle.
|
||
|
||
Here's another good example from the GNU awk manual to demonstrate a possible use of **continue**:
|
||
|
||
|
||
```
|
||
#!/usr/bin/awk -f
|
||
|
||
# Loop, printing numbers 0-20, except 5
|
||
|
||
BEGIN {
|
||
for (x = 0; x <= 20; x++) {
|
||
if (x == 5)
|
||
continue
|
||
printf "%d ", x
|
||
}
|
||
print ""
|
||
}
|
||
```
|
||
|
||
This script analyzes the value of **x** before printing anything. If the value is exactly 5, then **continue** is invoked, causing the **printf** line to be skipped, but leaves the loop unbroken. Try the same code but with **break** instead to see the difference.
|
||
|
||
### Next
|
||
|
||
This statement is not related to loops like **break** and **continue** are. Instead, **next** applies to the main record processing cycle of awk: the functions you place between the BEGIN and END functions. The **next** statement causes awk to stop processing the _current input record_ and to move to the next one.
|
||
|
||
As you know from the earlier articles in this series, awk reads records from its input stream and applies rules to them. The **next** statement stops the execution of rules for the current record and moves to the next one.
|
||
|
||
Here's an example of **next** being used to "hold" information upon a specific condition:
|
||
|
||
|
||
```
|
||
#!/usr/bin/awk -f
|
||
|
||
# Ignore the header
|
||
NR == 1 { next }
|
||
|
||
# If field 2 (colour) is less than 6
|
||
# characters, then save it with its
|
||
# line number and skip it
|
||
|
||
length($2) < 6 {
|
||
skip[NR] = $0
|
||
next
|
||
}
|
||
|
||
# It's not the header and
|
||
# the colour name is > 6 characters,
|
||
# so print the line
|
||
{
|
||
print
|
||
}
|
||
|
||
# At the end, show what was skipped
|
||
END {
|
||
printf "\nSkipped:\n"
|
||
for (n in skip)
|
||
print n": "skip[n]
|
||
}
|
||
```
|
||
|
||
This sample uses **next** in the first rule to avoid the first line of the file, which is a header row. The second rule skips lines when the color name is less than six characters long, but it also saves that line in an array called **skip**, using the line number as the key (also known as the _index_).
|
||
|
||
The third rule prints anything it sees, but it is not invoked if either rule 1 or rule 2 causes it to be skipped.
|
||
|
||
Finally, at the end of all the processing, the **END** rule prints the contents of the array.
|
||
|
||
Run the sample script on the **colours.txt** file from above (and previous articles):
|
||
|
||
|
||
```
|
||
$ ./next.awk colours.txt
|
||
banana yellow 6
|
||
grape purple 10
|
||
plum purple 2
|
||
pineapple yellow 5
|
||
|
||
Skipped:
|
||
2: apple red 4
|
||
4: strawberry red 3
|
||
6: apple green 8
|
||
8: kiwi brown 4
|
||
9: potato brown 9
|
||
```
|
||
|
||
### Control freak
|
||
|
||
In summary, **switch**, **continue**, **next**, and **break** are important preemptive exceptions to awk rules that provide greater control of your script. You don't have to use them directly; often, you can gain the same logic through other means, but they're great convenience functions that make the coder's life a lot easier. The next article in this series covers the **printf** statement.
|
||
|
||
* * *
|
||
|
||
Would you rather listen to this article? It was adapted from an episode of [Hacker Public Radio][4], a community technology podcast by hackers, for hackers.
|
||
|
||
--------------------------------------------------------------------------------
|
||
|
||
via: https://opensource.com/article/19/12/control-awk-script
|
||
|
||
作者:[Seth Kenlon][a]
|
||
选题:[lujun9972][b]
|
||
译者:[译者ID](https://github.com/译者ID)
|
||
校对:[校对者ID](https://github.com/校对者ID)
|
||
|
||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||
|
||
[a]: https://opensource.com/users/seth
|
||
[b]: https://github.com/lujun9972
|
||
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/javascript_vim.jpg?itok=mqkAeakO (JavaScript in Vim)
|
||
[2]: https://opensource.com/article/19/11/loops-awk
|
||
[3]: https://www.gnu.org/software/gawk/manual/
|
||
[4]: http://hackerpublicradio.org/eps.php?id=2438
|