[Translated] How to use awk command in Linux

zhengsihua 2014-07-31 20:58:15 +08:00
parent 8152973849
commit 98d1cb1930
2 changed files with 121 additions and 133 deletions

@ -1,133 +0,0 @@
Translating------geekpi
How to use awk command in Linux
================================================================================
Text processing is at the heart of Unix. From pipes to the /proc subsystem, the "everything is a file" philosophy pervades the operating system and all of the tools built for it. Because of this, getting comfortable with text processing is one of the most important skills for an aspiring Linux system administrator, or indeed any power user. awk is one of the most powerful text-processing tools available short of a general-purpose programming language.
The simplest awk task is selecting fields from stdin; if you never learn any more about awk than this, you'll still have at your disposal an extremely useful tool.
By default, awk splits each input line into fields on whitespace. If you'd like to select the first field from input, you just need to tell awk to print out $1:
$ echo 'one two three four' | awk '{print $1}'
> one
(Yes, the curly-brace syntax is a little weird, but I promise that's about as weird as it gets in this lesson.)
Can you guess how you'd select the second, third, or fourth fields? That's right, with $2, $3, and $4, respectively.
$ echo 'one two three four' | awk '{print $3}'
> three
Often when munging text, you need to produce data in a specific format that covers more than a single word. The good news is that awk makes it easy to print multiple fields, or even include static strings:
$ echo 'one two three four' | awk '{print $3,$1}'
> three one
----------
$ echo 'one two three four' | awk '{print "foo:",$3,"| bar:",$1}'
> foo: three | bar: one
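The comma between the arguments to print is what produces the spaces in the output above: awk joins comma-separated arguments with its output field separator, OFS, which defaults to a single space. As a small sketch (the variable name `joined` is only for illustration), setting OFS changes the joiner:

```shell
# The comma in print is replaced by OFS (a single space unless changed).
# -v OFS=', ' sets the output field separator before any input is read.
joined=$(echo 'one two three four' | awk -v OFS=', ' '{print $3,$1}')
echo "$joined"
```

This prints `three, one`, which is handy when the result needs to be fed into a tool that expects a specific delimiter.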
OK, but what if your input isn't separated by whitespace? Just pass awk the '-F' flag followed by your separator:
$ echo 'one mississippi,two mississippi,three mississippi,four mississippi' | awk -F , '{print $4}'
> four mississippi
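A more everyday use of '-F' is colon-separated data. The sketch below assumes a made-up passwd-style record; the record contents and the variable names are hypothetical and chosen only to illustrate the flag:

```shell
# Hypothetical passwd-style record; the field layout is an assumption for illustration.
record='alice:x:1001:1001:Alice Example:/home/alice:/bin/bash'

# -F: splits on colons instead of whitespace; here $1 is the login name
# and $7 is the login shell.
login_and_shell=$(printf '%s\n' "$record" | awk -F: '{print $1, $7}')
echo "$login_and_shell"
```

Note that the fifth field contains a space, which would have been split in two under the default whitespace separator.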
Occasionally, you may find yourself working with data with a varied number of fields, and you just know you want the *last* one. awk sets the built-in variable NF to the *number of fields* on the current line, so $NF refers to the last field:
$ echo 'one two three four' | awk '{print $NF}'
> four
You can also do simple math on NF, in case you need the next-to-last field:
$ echo 'one two three four' | awk '{print $(NF-1)}'
> three
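Or even the middle field (a fractional field index is truncated to an integer, so the same expression works for an odd number of fields):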
$ echo 'one two three four' | awk '{print $((NF/2)+1)}'
> three
$ echo 'one two three four five' | awk '{print $((NF/2)+1)}'
> three
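Because NF and $NF are easy to confuse, here is a small sketch of the difference; the variable names are illustrative only. NF is a number (the field count), while $NF is the text of the field at that position. The int() call makes the truncation in the middle-field expression explicit rather than relying on awk to truncate a fractional index.

```shell
line='one two three four five'

count=$(echo "$line" | awk '{print NF}')                 # number of fields
last=$(echo "$line" | awk '{print $NF}')                 # contents of the last field
middle=$(echo "$line" | awk '{print $(int(NF/2)+1)}')    # int(5/2)+1 = 3, so field 3

echo "$count $last $middle"
```

This prints `5 five three`.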
While this is all very useful, you could coax sed, cut, and grep into producing the same results (albeit with a lot more work).
So I'll leave you with one last introductory feature of awk: maintaining state across lines.
$ echo -e 'one 1\ntwo 2' | awk '{print $2}'
> 1
> 2
$ echo -e 'one 1\ntwo 2' | awk '{sum+=$2} END {print sum}'
> 3
(The END pattern tells awk to run the following block only **after** it has finished processing every line.)
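The same pattern extends to other running totals. As a sketch under assumed input (three made-up lines, each with a label and a number), the built-in variable NR, which holds the number of records read so far, can be combined with the sum to compute an average:

```shell
# Hypothetical input: a label and a number per line.
average=$(printf 'one 1\ntwo 2\nthree 6\n' | awk '
    { sum += $2 }                        # runs once per input line
    END { if (NR > 0) print sum / NR }   # runs once, after the last line
')
echo "$average"
```

The `NR > 0` guard simply avoids a division by zero when the input is empty.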
I've used this to sum up the bytes in web server request logs. Imagine we have an access log that looks like this:
$ cat requests.log
> Jul 23 18:57:12 httpd[31950]: "GET /foo/bar HTTP/1.1" 200 344
> Jul 23 18:57:13 httpd[31950]: "GET / HTTP/1.1" 200 9300
> Jul 23 19:01:27 httpd[31950]: "GET / HTTP/1.1" 200 9300
> Jul 23 19:01:55 httpd[31950]: "GET /foo/baz HTTP/1.1" 200 6401
> Jul 23 19:02:31 httpd[31950]: "GET /foo/baz?page=2 HTTP/1.1" 200 6312
We know the last field is the number of bytes in the response. We've already learned how to extract it using print and $NF:
$ < requests.log awk '{print $NF}'
> 344
> 9300
> 9300
> 6401
> 6312
And so we can sum into a variable to get the total number of bytes our web server has served to clients during the timespan of our log:
$ < requests.log awk '{totalBytes+=$NF} END {print totalBytes}'
> 31657
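If a single total is too coarse, the same idea works per request path by keeping one running sum per key in an awk associative array. The sketch below recreates the sample log in a temporary file and assumes the layout shown above, where the request path is the sixth whitespace-separated field; the array name `bytes` and the shell variable names are illustrative only.

```shell
# Recreate the sample log from the article in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
Jul 23 18:57:12 httpd[31950]: "GET /foo/bar HTTP/1.1" 200 344
Jul 23 18:57:13 httpd[31950]: "GET / HTTP/1.1" 200 9300
Jul 23 19:01:27 httpd[31950]: "GET / HTTP/1.1" 200 9300
Jul 23 19:01:55 httpd[31950]: "GET /foo/baz HTTP/1.1" 200 6401
Jul 23 19:02:31 httpd[31950]: "GET /foo/baz?page=2 HTTP/1.1" 200 6312
EOF

# bytes[] is an associative array keyed by the request path ($6 in this
# log layout). The order of "for (p in bytes)" is unspecified, so the
# result is piped through sort to make it stable.
per_path=$(awk '{bytes[$6] += $NF} END {for (p in bytes) print p, bytes[p]}' "$log" | LC_ALL=C sort)
echo "$per_path"

rm -f "$log"
```

With the five sample lines, the two requests for `/` are combined into 18600 bytes, while each of the other paths keeps its own single value.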
If you're looking for more to do with awk, you can find used copies of [the original awk book][1] for under 15 USD on Amazon. You may also enjoy Eric Pement's [collection of awk one-liners][2].
--------------------------------------------------------------------------------
via: http://xmodulo.com/2014/07/use-awk-command-linux.html
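Author: [James Pearson][a]
Translator: [Translator ID](https://github.com/译者ID)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](http://linux.cn/)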
[a]:http://xmodulo.com/author/james
[1]:http://www.amazon.com/gp/product/020107981X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=020107981X&linkCode=as2&tag=xmodulo-20&linkId=6NW62B2WBRBXRFJB
[2]:http://www.pement.org/awk/awk1line.txt

@ -0,0 +1,121 @@
How to use awk command in Linux
================================================================================
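Text processing is at the heart of Unix. From pipes to the /proc subsystem, the "everything is a file" philosophy pervades the operating system and all of the tools built on it. Because of this, getting comfortable with text processing is one of the most important skills for an aspiring Linux system administrator, or indeed any power user. awk is one of the most powerful text-processing tools available short of a general-purpose programming language.
The simplest awk task is selecting fields from standard input; even if you never learn anything more about awk than this, it will still be an extremely useful tool to have at hand.
By default, awk splits each input line into fields on whitespace. If you'd like to select the first field from input, you just need to tell awk to print $1: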
$ echo 'one two three four' | awk '{print $1}'
> one
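(Yes, the curly-brace syntax is a little odd, but I promise that's about as odd as it gets in this lesson.)
Can you guess how you'd select the second, third, or fourth field? That's right, with $2, $3, and $4, respectively.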
$ echo 'one two three four' | awk '{print $3}'
> three
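Often when munging text, you need to produce data in a specific format that covers more than a single word. The good news is that awk makes it easy to print multiple fields, or even include static strings: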
$ echo 'one two three four' | awk '{print $3,$1}'
> three one
----------
$ echo 'one two three four' | awk '{print "foo:",$3,"| bar:",$1}'
> foo: three | bar: one
OK, but what if your input isn't separated by whitespace? Just pass awk the '-F' flag followed by your separator:
$ echo 'one mississippi,two mississippi,three mississippi,four mississippi' | awk -F , '{print $4}'
> four mississippi
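Occasionally, you may find yourself working with data that has a varying number of fields, and you just know you want the *last* one. awk sets the built-in variable NF to the *number of fields* on the current line, so $NF refers to the last field: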
$ echo 'one two three four' | awk '{print $NF}'
> four
You can also do simple math on NF, in case you need the next-to-last field:
$ echo 'one two three four' | awk '{print $(NF-1)}'
> three
Or even the middle field:
$ echo 'one two three four' | awk '{print $((NF/2)+1)}'
> three
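While this is all very useful, you could coax sed, cut, and grep into producing the same results (albeit with a lot more work).
So I'll leave you with one last introductory feature of awk: maintaining state across lines.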
$ echo -e 'one 1\ntwo 2' | awk '{print $2}'
> 1
> 2
$ echo -e 'one 1\ntwo 2' | awk '{sum+=$2} END {print sum}'
> 3
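(The END pattern tells awk to run the following block only **after** it has finished processing every line.)
I've used this to sum up the bytes in web server request logs. Imagine we have an access log that looks like this: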
$ cat requests.log
> Jul 23 18:57:12 httpd[31950]: "GET /foo/bar HTTP/1.1" 200 344
> Jul 23 18:57:13 httpd[31950]: "GET / HTTP/1.1" 200 9300
> Jul 23 19:01:27 httpd[31950]: "GET / HTTP/1.1" 200 9300
> Jul 23 19:01:55 httpd[31950]: "GET /foo/baz HTTP/1.1" 200 6401
> Jul 23 19:02:31 httpd[31950]: "GET /foo/baz?page=2 HTTP/1.1" 200 6312
We know the last field is the number of bytes in the response. We've already learned how to extract it using $NF:
$ < requests.log awk '{print $NF}'
> 344
> 9300
> 9300
> 6401
> 6312
And so we can sum into a variable to get the total number of bytes our web server has served to clients during the timespan of our log:
$ < requests.log awk '{totalBytes+=$NF} END {print totalBytes}'
> 31657
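If you're looking for more to do with awk, you can find used copies of [the original awk book][1] for under 15 USD on Amazon. You may also enjoy Eric Pement's [collection of awk one-liners][2].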
--------------------------------------------------------------------------------
via: http://xmodulo.com/2014/07/use-awk-command-linux.html
Author: [James Pearson][a]
Translator: [geekpi](https://github.com/geekpi)
Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](http://linux.cn/)
[a]:http://xmodulo.com/author/james
[1]:http://www.amazon.com/gp/product/020107981X/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=020107981X&linkCode=as2&tag=xmodulo-20&linkId=6NW62B2WBRBXRFJB
[2]:http://www.pement.org/awk/awk1line.txt