Update 20191104 Fields, records, and variables in awk.md

wenwensnow 2019-11-20 15:40:08 +01:00 committed by GitHub
parent 7801f24387
commit 8d4a1c7db1


@ -7,22 +7,20 @@
[#]: via: (https://opensource.com/article/19/11/fields-records-variables-awk)
[#]: author: (Seth Kenlon https://opensource.com/users/seth)
Fields, records, and variables in awk
======
In the second article in this intro to awk series, learn about fields,
records, and some powerful awk variables.
![Man at laptop on a mountain][1]
Awk comes in several varieties: there is the original **awk**, written in 1977 at AT&T Bell Laboratories, and several reimplementations, such as **mawk**, **nawk**, and the one that ships with most Linux distributions, GNU awk, or **gawk**. On most Linux distributions, awk and gawk are synonyms referring to GNU awk, and typing either invokes the same awk command. See the [GNU awk user's guide][2] for the full history of awk and gawk.
The [first article][3] in this series showed that awk is invoked on the command line with this syntax:
```
$ awk [options] 'pattern {action}' inputfile
```
Awk is the command, and it can take options (such as **-F** to define the field separator). The action you want awk to perform is contained in single quotes, at least when it's issued in a terminal. To further emphasize which part of the awk command is the action you want it to take, you can precede your program with the **-e** option (but it's not required):
```
@ -33,43 +31,42 @@ green
[...]
```
### Records and fields
Awk views its input data as a series of _records_, which are usually newline-delimited lines. In other words, awk generally sees each line in a text file as a new record. Each record contains a series of _fields_. A field is a component of a record delimited by a _field separator_.
By default, awk sees whitespace, such as spaces, tabs, and newlines, as indicators of a new field. Specifically, awk treats a run of consecutive whitespace characters as a single separator, so this line contains two fields:
```
raspberry red
```
As does this one:
```
tuxedo                  black
```
Other separators are not treated this way. Assuming that the field separator is a comma, the following example record contains three fields, with one probably being zero characters long (assuming a non-printable character isn't hiding in that field):
```
a,,b
```
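The field counts described above can be checked directly. The following is a minimal sketch that relies on the built-in **NF** variable (the number of fields in the current record, covered later in this article) and on awk's **length()** function. The input lines are fed in with the shell's **printf** and mirror the examples above; the shell variable names are only illustrative.

```shell
# Default separator: a run of spaces counts as a single separator
fields_spaces=$(printf 'tuxedo                  black\n' | awk '{ print NF }')

# Comma separator: every comma counts, so the empty middle field is kept
fields_commas=$(printf 'a,,b\n' | awk -F, '{ print NF }')

# Length of the second (empty) field when splitting on commas
middle_length=$(printf 'a,,b\n' | awk -F, '{ print length($2) }')

echo "whitespace-separated fields: $fields_spaces"
echo "comma-separated fields: $fields_commas"
echo "length of middle field: $middle_length"
```

With the default separator the first record should report two fields, while the comma-separated record should report three, the middle one being zero characters long.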
### The awk program
The _program_ part of an awk command consists of a series of rules. Normally, each rule begins on a new line in the program (although this is not mandatory). Each rule consists of a pattern and one or more actions:
```
pattern { action }
```
In a rule, you can define a pattern as a condition to control whether the action will run on a record. Patterns can be simple comparisons, regular expressions, combinations of the two, and more.
For instance, this will print a record _only_ if it contains the word "raspberry":
```
@ -77,15 +74,15 @@ $ awk '/raspberry/ { print $0 }' colours.txt
raspberry red 99
```
If a rule has no pattern, its action is applied to every record.
Also, a rule can consist of only a pattern, in which case the entire record is written as if the action was **{ print }**.
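The three rule shapes described above (pattern only, action only, and a comparison combined with a regular expression) can be sketched as follows. The sample data is a hypothetical stand-in in the spirit of **colours.txt**, not the file's actual contents, and the shell variable names are illustrative.

```shell
# Hypothetical sample data: name, colour, amount
data='apple green 8
banana yellow 6
raspberry red 99'

# Pattern only: matching records are printed, as if the action were { print }
pattern_only=$(printf '%s\n' "$data" | awk '/an/')

# Action only: with no pattern, the action runs on every record
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# A numeric comparison combined with a regular expression
combined=$(printf '%s\n' "$data" | awk '$3 > 7 && /e/ { print $1 }')

printf '%s\n' "$pattern_only" "$action_only" "$combined"
```

Only the banana record contains "an", so the first command prints that record alone; the second prints the first field of every record; the third prints the names whose third field exceeds 7 and whose record contains an "e".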
Awk programs are essentially _data-driven_ in that actions depend on the data, so they are quite a bit different from programs in many other programming languages.
### The NF variable
Each field has a variable as a designation, but there are special variables for fields and records, too. The variable **NF** stores the number of fields awk finds in the current record. This can be printed or used in tests. Here is an example using the [text file][3] from the previous article:
```
@ -96,12 +93,11 @@ banana     yellow 6 (3)
[...]
```
Awk's **print** statement takes a series of arguments (which may be variables or strings). Arguments written next to each other, with no comma between them, are concatenated. This is why, at the end of each line in this example, awk prints the number of fields as an integer enclosed by parentheses.
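Besides being printed, **NF** can be used in a test. The sketch below uses hypothetical data in which one record is deliberately missing a field. It also uses **$NF**, which awk evaluates as the last field of the record; that notation is not introduced in this article and is shown here only as an illustration.

```shell
# Hypothetical records; the second line is deliberately missing a field
data='apple green 8
banana yellow
raspberry red 99'

# NF in a pattern: report records that do not have exactly three fields
short=$(printf '%s\n' "$data" | awk 'NF != 3 { print $0 " (" NF ")" }')

# $NF is the value of the last field, whatever NF happens to be
last=$(printf '%s\n' "$data" | awk '{ print $NF }')

echo "$short"
echo "$last"
```

A check like `NF != 3` is a cheap way to spot malformed lines before doing any real processing.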
### The NR variable
In addition to counting the fields in each record, awk also counts input records. The record number is held in the variable **NR**, and it can be used in the same way as any other variable. For example, to print the record number before each line:
```
$ awk '{ print NR ": " $0 }' colours.txt
@ -113,24 +109,23 @@ $ awk '{ print NR ": " $0 }' colours.txt
[...]
```
Note that it's acceptable to write this command with no spaces other than the one after **print**, although it's more difficult for a human to parse:
```
$ awk '{print NR": "$0}' colours.txt
```
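Because **NR** behaves like any other variable, it can also appear in a pattern. The sketch below assumes hypothetical input with a header line and skips it with `NR > 1`. It also reads **NR** in an **END** rule, which awk runs after the last record and which this article does not otherwise cover, to get the total record count.

```shell
# Hypothetical input whose first record is a header line
data='name colour amount
apple green 8
banana yellow 6'

# NR in a pattern: skip the first record
body=$(printf '%s\n' "$data" | awk 'NR > 1 { print NR ": " $1 }')

# In an END rule, NR holds the number of records that were read
total=$(printf '%s\n' "$data" | awk 'END { print NR }')

echo "$body"
echo "records read: $total"
```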
### The printf() function
For greater flexibility in how the output is formatted, you can use the awk **printf()** function. This is similar to **printf** in C, Lua, Bash, and other languages. It takes a _format_ argument followed by a comma-separated list of items. The argument list may be enclosed in parentheses.
```
printf format, item1, item2, ...
```
The format argument (or _format string_) defines how each of the other arguments will be output. It uses _format specifiers_ to do this, including **%s** to output a string and **%d** to output a decimal integer. The following **printf** statement outputs the record followed by the number of fields in parentheses:
```
$ awk '{ printf "%s (%d)\n",$0,NF }' colours.txt
@ -140,13 +135,14 @@ banana     yellow 6 (3)
[...]
```
In this example, **%s (%d)** provides the structure for each line, while **$0,NF** defines the data to be inserted into the **%s** and **%d** positions. Note that, unlike with the **print** function, no newline is generated without explicit instructions. The escape sequence **\n** does this.
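Format specifiers can also carry a width, as in C. The sketch below is an illustration on hypothetical data rather than part of the original example: **%-10s** left-aligns a string in ten columns and **%3d** right-aligns an integer in three. The second command leaves out **\n** to show that **printf** then keeps writing on the same line.

```shell
# Hypothetical sample data: name, colour, amount
data='apple green 8
banana yellow 6
raspberry red 99'

# Fixed-width columns: name padded to 10 characters, amount to 3
table=$(printf '%s\n' "$data" | awk '{ printf "%-10s|%3d|\n", $1, $3 }')

# No \n in the format string, so every record lands on one line
joined=$(printf '%s\n' "$data" | awk '{ printf "%s;", $1 }')

echo "$table"
echo "$joined"
```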
### Awk scripting
All of the awk code in this article has been written and executed in an interactive Bash prompt. For more complex programs, it's often easier to place your commands into a file or _script_. The option **-f FILE** (not to be confused with **-F**, which denotes the field separator) may be used to invoke a file containing a program.
For example, here is a simple awk script. Create a file called **example1.awk** with this content:
```
@ -155,6 +151,7 @@ For example, here is a simple awk script. Create a file called **example1.awk**
```
It's conventional to give such files the extension **.awk** to make it clear that they hold an awk program. This naming is not mandatory, but it gives file managers and editors (and you) a useful clue about what the file is.
Run the script:
@ -166,8 +163,7 @@ B: banana     yellow 6
A: apple      green  8
```
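To make the difference between **-f** and **-F** concrete, here is a self-contained sketch that uses both in one command. The file names **demo.awk** and **demo.csv**, their contents, and the temporary directory are all hypothetical; they are not the article's **example1.awk** or **colours.txt**.

```shell
# Work in a temporary directory so nothing existing is overwritten
workdir=$(mktemp -d)

# Hypothetical program file: one rule per line
cat > "$workdir/demo.awk" <<'EOF'
$3 > 7 { print "big: " $1 }
$3 <= 7 { print "small: " $1 }
EOF

# Hypothetical comma-separated input
printf 'apple,green,8\nbanana,yellow,6\n' > "$workdir/demo.csv"

# -F sets the field separator; -f names the file holding the program
result=$(awk -F, -f "$workdir/demo.awk" "$workdir/demo.csv")
echo "$result"

rm -r "$workdir"
```

Keeping the rules in a file avoids shell quoting problems and makes longer programs easier to edit and reuse.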
A file containing awk instructions can be made into a script by adding a **#!** line at the top and making it executable. Create a file called **example2.awk** with these contents:
```
#!/usr/bin/awk -f