TSL

2025-03-30 02:40:11 +08:00 · 2020-09-28 00:03:12 +08:00 · 2020-09-28 00:03:12 +08:00 · 7a326c87e7
commit 7a326c87e7
parent ddb160cf0a
2 changed files with 272 additions and 287 deletions
--- a/sources/tech/20191031
+++ b/sources/tech/20191031
@ -1,287 +0,0 @@
-[#]: collector: (lujun9972)
-[#]: translator: (wxy)
-[#]: reviewer: ( )
-[#]: publisher: ( )
-[#]: url: ( )
-[#]: subject: (Advance your awk skills with two easy tutorials)
-[#]: via: (https://opensource.com/article/19/10/advanced-awk)
-[#]: author: (Dave Neary https://opensource.com/users/dneary)
-
-Advance your awk skills with two easy tutorials
-======
-Go beyond one-line awk scripts with mail merge and word counting.
-![a checklist for a team][1]
-
-Awk is one of the oldest tools in the Unix and Linux user's toolbox. Created in the 1970s by Alfred Aho, Peter Weinberger, and Brian Kernighan (the A, W, and K of the tool's name), awk was created for complex processing of text streams. It is a companion tool to sed, the stream editor, which is designed for line-by-line processing of text files. Awk allows more complex structured programs and is a complete programming language.
-
-This article will explain how to use awk for more structured and complex tasks, including a simple mail merge application.
-
-### Awk program structure
-
-An awk script is made up of functional blocks surrounded by **{}** (curly brackets). There are two special function blocks, **BEGIN** and **END**, that execute before processing the first line of the input stream and after the last line is processed. In between, blocks have the format:
-
-
-```
-`pattern { action statements }`
-```
-
-Each block executes when the line in the input buffer matches the pattern. If no pattern is included, the function block executes on every line of the input stream.
-
-Also, the following syntax can be used to define functions in awk that can be called from any block:
-
-
-```
-`function name(parameter list) { statements }`
-```
-
-This combination of pattern-matching blocks and functions allows the developer to structure awk programs for reuse and readability.
-
-### How awk processes text streams
-
-Awk reads text from its input file or stream one line at a time and uses a field separator to parse it into a number of fields. In awk terminology, the current buffer is a _record_. There are a number of special variables that affect how awk reads and processes a file:
-
-  * **FS** (field separator): By default, this is any whitespace (spaces or tabs)
-  * **RS** (record separator): By default, a newline (**\n**)
-  * **NF** (number of fields): When awk parses a line, this variable is set to the number of fields that have been parsed
-  * **$0:** The current record
-  * **$1, $2, $3, etc.:** The first, second, third, etc. field from the current record
-  * **NR** (number of records): The number of records that have been parsed so far by the awk script
-
-
-
-There are many other variables that affect awk's behavior, but this is enough to start with.
-
-### Awk one-liners
-
-For a tool so powerful, it's interesting that most of awk's usage is basic one-liners. Perhaps the most common awk program prints selected fields from an input line from a CSV file, a log file, etc. For example, the following one-liner prints a list of usernames from **/etc/passwd**:
-
-
-```
-`awk -F":" '{print $1 }' /etc/passwd`
-```
-
-As mentioned above, **$1** is the first field in the current record. The **-F** option sets the FS variable to the character **:**.
-
-The field separator can also be set in a BEGIN function block:
-
-
-```
-`awk 'BEGIN { FS=":" } {print $1 }' /etc/passwd`
-```
-
-In the following example, every user whose shell is not **/sbin/nologin** can be printed by preceding the block with a pattern match:
-
-
-```
-`awk 'BEGIN { FS=":" } ! /\/sbin\/nologin/ {print $1 }' /etc/passwd`
-```
-
-### Advanced awk: Mail merge
-
-Now that you have some of the basics, try delving deeper into awk with a more structured example: creating a mail merge.
-
-A mail merge uses two files, one (called in this example **email_template.txt**) containing a template for an email you want to send:
-
-
-```
-From: Program committee &lt;[pc@event.org][2]&gt;
-To: {firstname} {lastname} &lt;{email}&gt;
-Subject: Your presentation proposal
-
-Dear {firstname},
-
-Thank you for your presentation proposal:
-  {title}
-
-We are pleased to inform you that your proposal has been successful! We
-will contact you shortly with further information about the event
-schedule.
-
-Thank you,
-The Program Committee
-```
-
-And the other is a CSV file (called **proposals.csv**) with the people you want to send the email to:
-
-
-```
-firstname,lastname,email,title
-Harry,Potter,[hpotter@hogwarts.edu][3],"Defeating your nemesis in 3 easy steps"
-Jack,Reacher,[reacher@covert.mil][4],"Hand-to-hand combat for beginners"
-Mickey,Mouse,[mmouse@disney.com][5],"Surviving public speaking with a squeaky voice"
-Santa,Claus,[sclaus@northpole.org][6],"Efficient list-making"
-```
-
-You want to read the CSV file, replace the relevant fields in the first file (skipping the first line), then write the result to a file called **acceptanceN.txt**, incrementing **N** for each line you parse.
-
-Write the awk program in a file called **mail_merge.awk**. Statements are separated by **;** in awk scripts. The first task is to set the field separator variable and a couple of other variables the script needs. You also need to read and discard the first line in the CSV, or a file will be created starting with _Dear firstname_. To do this, use the special function **getline** and reset the record counter to 0 after reading it.
-
-
-```
-BEGIN {
-  FS=",";
-  template="email_template.txt";
-  output="acceptance";
-  getline;
-  NR=0;
-}
-```
-
-The main function is very straightforward: for each line processed, a variable is set for the various fields—**firstname**, **lastname**, **email**, and **title**. The template file is read line by line, and the function **sub** is used to substitute any occurrence of the special character sequences with the value of the relevant variable. Then the line, with any substitutions made, is output to the output file.
-
-Since you are dealing with the template file and a different output file for each line, you need to clean up and close the file handles for these files before processing the next record.
-
-
-```
-{
-        # Read relevant fields from input file
-        firstname=$1;
-        lastname=$2;
-        email=$3;
-        title=$4;
-
-        # Set output filename
-        outfile=(output NR ".txt");
-
-        # Read a line from template, replace special fields, and
-        # print result to output file
-        while ( (getline ln &lt; template) &gt; 0 )
-        {
-                sub(/{firstname}/,firstname,ln);
-                sub(/{lastname}/,lastname,ln);
-                sub(/{email}/,email,ln);
-                sub(/{title}/,title,ln);
-                print(ln) &gt; outfile;
-        }
-
-        # Close template and output file in advance of next record
-        close(outfile);
-        close(template);
-}
-```
-
-You're done! Run the script on the command line with:
-
-
-```
-`awk -f mail_merge.awk proposals.csv`
-```
-
-or
-
-
-```
-`awk -f mail_merge.awk < proposals.csv`
-```
-
-and you will find text files generated in the current directory.
-
-### Advanced awk: Word frequency count
-
-One of the most powerful features in awk is the associative array. In most programming languages, array entries are typically indexed by a number, but in awk, arrays are referenced by a key string. You could store an entry from the file _proposals.txt_ from the previous section. For example, in a single associative array, like this:
-
-
-```
-        proposer["firstname"]=$1;
-        proposer["lastname"]=$2;
-        proposer["email"]=$3;
-        proposer["title"]=$4;
-```
-
-This makes text processing very easy. A simple program that uses this concept is the idea of a word frequency counter. You can parse a file, break out words (ignoring punctuation) in each line, increment the counter for each word in the line, then output the top 20 words that occur in the text.
-
-First, in a file called **wordcount.awk**, set the field separator to a regular expression that includes whitespace and punctuation:
-
-
-```
-BEGIN {
-        # ignore 1 or more consecutive occurrences of the characters
-        # in the character group below
-        FS="[ .,:;()&lt;&gt;{}@!\"'\t]+";
-}
-```
-
-Next, the main loop function will iterate over each field, ignoring any empty fields (which happens if there is punctuation at the end of a line), and increment the word count for the words in the line.
-
-
-```
-{
-        for (i = 1; i &lt;= NF; i++) {
-                if ($i != "") {
-                        words[$i]++;
-                }
-        }
-}
-```
-
-Finally, after the text is processed, use the END function to print the contents of the array, then use awk's capability of piping output into a shell command to do a numerical sort and print the 20 most frequently occurring words:
-
-
-```
-END {
-        sort_head = "sort -k2 -nr | head -n 20";
-        for (word in words) {
-                printf "%s\t%d\n", word, words[word] | sort_head;
-        }
-        close (sort_head);
-}
-```
-
-Running this script on an earlier draft of this article produced this output:
-
-
-```
-[[dneary@dhcp-49-32.bos.redhat.com][7]]$ awk -f wordcount.awk &lt; awk_article.txt
-the     79
-awk     41
-a       39
-and     33
-of      32
-in      27
-to      26
-is      25
-line    23
-for     23
-will    22
-file    21
-we      16
-We      15
-with    12
-which   12
-by      12
-this    11
-output  11
-function        11
-```
-
-### What's next?
-
-If you want to learn more about awk programming, I strongly recommend the book [_Sed and awk_][8] by Dale Dougherty and Arnold Robbins.
-
-One of the keys to progressing in awk programming is mastering "extended regular expressions." Awk offers several powerful additions to the sed [regular expression][9] syntax you may already be familiar with.
-
-Another great resource for learning awk is the [GNU awk user guide][10]. It has a full reference for awk's built-in function library, as well as lots of examples of simple and complex awk scripts.
-
--------------------------------------------------------------------------------
-
-via: https://opensource.com/article/19/10/advanced-awk
-
-作者：[Dave Neary][a]
-选题：[lujun9972][b]
-译者：[译者ID](https://github.com/译者ID)
-校对：[校对者ID](https://github.com/校对者ID)
-
-本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译，[Linux中国](https://linux.cn/) 荣誉推出
-
-[a]: https://opensource.com/users/dneary
-[b]: https://github.com/lujun9972
-[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/checklist_hands_team_collaboration.png?itok=u82QepPk (a checklist for a team)
-[2]: mailto:pc@event.org
-[3]: mailto:hpotter@hogwarts.edu
-[4]: mailto:reacher@covert.mil
-[5]: mailto:mmouse@disney.com
-[6]: mailto:sclaus@northpole.org
-[7]: mailto:dneary@dhcp-49-32.bos.redhat.com
-[8]: https://www.amazon.com/sed-awk-Dale-Dougherty/dp/1565922255/book
-[9]: https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
-[10]: https://www.gnu.org/software/gawk/manual/gawk.html
--- a/translated/tech/20191031
+++ b/translated/tech/20191031
@ -0,0 +1,272 @@
+[#]: collector: (lujun9972)
+[#]: translator: (wxy)
+[#]: reviewer: ( )
+[#]: publisher: ( )
+[#]: url: ( )
+[#]: subject: (Advance your awk skills with two easy tutorials)
+[#]: via: (https://opensource.com/article/19/10/advanced-awk)
+[#]: author: (Dave Neary https://opensource.com/users/dneary)
+
+
+通过两个简单的教程来提高你的 awk 技能
+======
+
+> 超越单行的awk脚本，邮件合并和字数统计。
+!["一个团队的检查表"[1]
+
+`awk` 是 Unix 和 Linux 用户工具箱中最古老的工具之一。`awk` 由 Alfred Aho、Peter Weinberger 和 Brian Kernighan（工具名称中的 A、W 和 K）在 20 世纪 70 年代创建，用于复杂的文本流处理。它是流编辑器 `sed` 的配套工具，后者是为逐行处理文本文件而设计的。`awk` 支持更复杂的结构化程序，是一种完整的编程语言。
+
+本文将介绍如何使用 `awk` 完成更多结构化的复杂任务，包括一个简单的邮件合并程序。
+
+### awk 的程序结构
+
+`awk` 脚本是由 `{}`（大括号）包围的功能块组成，其中有两个特殊的功能块，`BEGIN` 和 `END`，它们在处理第一行输入流之前和最后一行处理之后执行。在这两者之间，块的格式为：
+
+```
+模式 { 动作语句 }
+```
+
+当输入缓冲区中的行与模式匹配时，每个块都会执行。如果没有包含模式，则函数块在输入流的每一行都会执行。
+
+另外，以下语法可以用于在 `awk` 中定义可以从任何块中调用的函数。
+
+```
+function 函数名(参数列表) { 语句 }
+```
+
+这种模式匹配块和函数的组合允许开发者结构化 `awk` 程序，以便重用和可读。
+
+### awk 如何处理文本流
+
+`awk` 每次从输入文件或流中一行一行地读取文本，并使用字段分隔符将其解析成若干字段。在 `awk` 的术语中，当前的缓冲区是一个*记录*。有一些特殊的变量会影响 `awk` 读取和处理文件的方式：
+
+  * `FS`（<ruby>字段分隔符<rt>field separator</rt></ruby>）。默认情况下，这是任何空格字符（空格或制表符）。
+  * `RS`（<ruby>记录分隔符<rt>record separator</rt></ruby>）。默认情况下是一个新行（`n`）。
+  * `NF`（<ruby>字段数<rt>number of fields</rt></ruby>）。当 `awk` 解析一行时，这个变量被设置为已解析的字段数。
+  * `$0:` 当前记录。
+  * `$1`、`$2`、`$3` 等：当前记录的第一、第二、第三等字段。
+  * `NR`（<ruby>记录数<rt>number of records</rt></ruby>）。迄今已被 `awk` 脚本解析的记录数。
+
+影响 `awk` 行为的变量还有很多，但这已经足够开始了。
+
+### 单行 awk 脚本
+
+对于一个如此强大的工具来说，有趣的是，`awk` 的大部分用法都是基本的单行脚本。也许最常见的 `awk` 程序是打印 CSV 文件、日志文件等输入行中的选定字段。例如，下面的单行脚本从 `/etc/passwd` 中打印出一个用户名列表：
+
+```
+awk -F":" '{print $1 }' /etc/passwd
+```
+
+如上所述，`$1` 是当前记录中的第一个字段。`-F` 选项将 `FS` 变量设置为字符 `:`。
+
+字段分隔符也可以在 `BEGIN` 函数块中设置：
+
+```
+awk 'BEGIN { FS=":" } {print $1 }' /etc/passwd
+```
+
+在下面的例子中，每一个 shell 不是 `/sbin/nologin` 的用户都可以通过在该块前面加上匹配模式来打印出来：
+
+```
+awk 'BEGIN { FS=":" } ! /\/sbin\/nologin/ {print $1 }' /etc/passwd
+```
+
+### awk 进阶：邮件合并
+
+现在你已经掌握了一些基础知识，尝试用一个更具有结构化的例子来深入了解 `awk`：创建邮件合并。
+
+邮件合并使用两个文件，其中一个文件（在本例中称为 `email_template.txt`）包含了你要发送的电子邮件的模板：
+
+```
+From: Program committee <pc@event.org>
+To: {firstname} {lastname} <{email}>
+Subject: Your presentation proposal
+
+Dear {firstname},
+
+Thank you for your presentation proposal:
+  {title}
+
+We are pleased to inform you that your proposal has been successful! We
+will contact you shortly with further information about the event
+schedule.
+
+Thank you,
+The Program Committee
+```
+
+而另一个则是一个 CSV 文件（名为 `proposals.csv`），里面有你要发送邮件的人：
+
+```
+firstname,lastname,email,title
+Harry,Potter,hpotter@hogwarts.edu,"Defeating your nemesis in 3 easy steps"
+Jack,Reacher,reacher@covert.mil,"Hand-to-hand combat for beginners"
+Mickey,Mouse,mmouse@disney.com,"Surviving public speaking with a squeaky voice"
+Santa,Claus,sclaus@northpole.org,"Efficient list-making"
+```
+
+你要读取 CSV 文件，替换第一个文件中的相关字段（跳过第一行），然后把结果写到一个叫 `acceptanceN.txt` 的文件中，每解析一行就递增 `N`。
+
+把 `awk` 程序写在一个叫 `mail_merge.awk` 的文件中。在 `awk` 脚本中的语句用 `;` 分隔。第一个任务是设置字段分隔符变量和其他几个脚本需要的变量。你还需要读取并丢弃 CSV 中的第一行，否则会创建一个以 `Dear firstname` 开头的文件。要做到这一点，请使用特殊函数 `getline`，并在读取后将记录计数器重置为 0。
+
+```
+BEGIN {
+  FS=",";
+  template="email_template.txt";
+  output="acceptance";
+  getline;
+  NR=0;
+}
+```
+
+主要功能非常简单：每处理一行，就为各种字段设置一个变量 —— `firstname`、`lastname`、`email` 和 `title`。模板文件被逐行读取，并使用函数 `sub` 将任何出现的特殊字符序列替换为相关变量的值。然后将该行以及所做的任何替换输出到输出文件中。
+
+由于每行都要处理模板文件和不同的输出文件，所以在处理下一条记录之前，需要清理和关闭这些文件的文件句柄。
+
+```
+{
+        # Read relevant fields from input file
+        firstname=$1;
+        lastname=$2;
+        email=$3;
+        title=$4;
+
+        # Set output filename
+        outfile=(output NR ".txt");
+
+        # Read a line from template, replace special fields, and
+        # print result to output file
+        while ( (getline ln &lt; template) &gt; 0 )
+        {
+                sub(/{firstname}/,firstname,ln);
+                sub(/{lastname}/,lastname,ln);
+                sub(/{email}/,email,ln);
+                sub(/{title}/,title,ln);
+                print(ln) &gt; outfile;
+        }
+
+        # Close template and output file in advance of next record
+        close(outfile);
+        close(template);
+}
+```
+
+你已经完成了! 在命令行上运行该脚本：
+
+
+```
+awk -f mail_merge.awk proposals.csv
+```
+
+或
+
+```
+awk -f mail_merge.awk < proposals.csv
+```
+
+你会发现在当前目录下生成的文本文件。
+
+### awk 进阶：字频计数
+
+`awk` 中最强大的功能之一是关联数组，在大多数编程语言中，数组条目通常由数字索引，但在 `awk` 中，数组由一个键字符串进行引用。你可以从上一节的文件 `proposals.txt` 中存储一个条目。例如，在一个单一的关联数组中，像这样：
+
+```
+        proposer["firstname"]=$1;
+        proposer["lastname"]=$2;
+        proposer["email"]=$3;
+        proposer["title"]=$4;
+```
+
+这使得文本处理变得非常容易。一个使用了这个概念的简单的程序就是词频计数器。你可以解析一个文件，在每一行中分解出单词（忽略标点符号），对行中的每个单词进行递增计数器，然后输出文本中出现的前 20 个单词。
+
+首先，在一个名为 `wordcount.awk` 的文件中，将字段分隔符设置为包含空格和标点符号的正则表达式：
+
+```
+BEGIN {
+        # ignore 1 or more consecutive occurrences of the characters
+        # in the character group below
+        FS="[ .,:;()<>{}@!\"'\t]+";
+}
+```
+
+接下来，主循环函数将遍历每个字段，忽略任何空字段（如果行末有标点符号，则会出现这种情况），并递增行中单词数：
+
+```
+{
+        for (i = 1; i &lt;= NF; i++) {
+                if ($i != "") {
+                        words[$i]++;
+                }
+        }
+}
+```
+
+最后，处理完文本后，使用 `END` 函数打印数组的内容，然后利用 `awk` 的能力，将输出的内容用管道输入 shell 命令，进行数字排序，并打印出 20 个最常出现的单词。
+
+```
+END {
+        sort_head = "sort -k2 -nr | head -n 20";
+        for (word in words) {
+                printf "%s\t%d\n", word, words[word] | sort_head;
+        }
+        close (sort_head);
+}
+```
+
+在这篇文章的早期草稿上运行这个脚本，会产生这样的输出：
+
+```
+[dneary@dhcp-49-32.bos.redhat.com]$ awk -f wordcount.awk < awk_article.txt
+the     79
+awk     41
+a       39
+and     33
+of      32
+in      27
+to      26
+is      25
+line    23
+for     23
+will    22
+file    21
+we      16
+We      15
+with    12
+which   12
+by      12
+this    11
+output  11
+function        11
+```
+
+### 下一步是什么？
+
+如果你想了解更多关于 `awk` 编程的知识，我强烈推荐 Dale Dougherty 和 Arnold Robbins 所著的《[Sed 和 awk][8]》这本书。
+
+`awk` 编程进阶的关键之一是掌握“扩展正则表达式”。`awk` 为你可能已经熟悉的 sed [正则表达式][9]语法提供了几个强大的补充。
+
+另一个学习 `awk` 的好资源是 [GNU awk 用户指南][10]。它有一个完整的 `awk` 内置函数库的参考资料，以及很多简单和复杂的 `awk` 脚本的例子。
+
+--------------------------------------------------------------------------------
+
+via: https://opensource.com/article/19/10/advanced-awk
+
+作者：[Dave Neary][a]
+选题：[lujun9972][b]
+译者：[wxy](https://github.com/wxy)
+校对：[校对者ID](https://github.com/校对者ID)
+
+本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译，[Linux中国](https://linux.cn/) 荣誉推出
+
+[a]: https://opensource.com/users/dneary
+[b]: https://github.com/lujun9972
+[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/checklist_hands_team_collaboration.png?itok=u82QepPk (a checklist for a team)
+[2]: mailto:pc@event.org
+[3]: mailto:hpotter@hogwarts.edu
+[4]: mailto:reacher@covert.mil
+[5]: mailto:mmouse@disney.com
+[6]: mailto:sclaus@northpole.org
+[7]: mailto:dneary@dhcp-49-32.bos.redhat.com
+[8]: https://www.amazon.com/sed-awk-Dale-Dougherty/dp/1565922255/book
+[9]: https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions
+[10]: https://www.gnu.org/software/gawk/manual/gawk.html