[Translated] tech/20150522 Analyzing Linux Logs.md

This commit is contained in:
ictlyh 2015-08-01 10:05:00 +08:00
parent f76fa728ff
commit e90526660c
2 changed files with 181 additions and 182 deletions

View File

@ -1,182 +0,0 @@
ictlyh Translating
Analyzing Linux Logs
================================================================================
Theres a great deal of information waiting for you within your logs, although its not always as easy as youd like to extract it. In this section we will cover some examples of basic analysis you can do with your logs right away (just search whats there). Well also cover more advanced analysis that may take some upfront effort to set up properly, but will save you time on the back end. Examples of advanced analysis you can do on parsed data include generating summary counts, filtering on field values, and more.
Well show you first how to do this yourself on the command line using several different tools and then show you how a log management tool can automate much of the grunt work and make this so much more streamlined.
### Searching with Grep ###
Searching for text is the most basic way to find what youre looking for. The most common tool for searching text is [grep][1]. This command line tool, available on most Linux distributions, allows you to search your logs using regular expressions. A regular expression is a pattern written in a special language that can identify matching text. The simplest pattern is to put the string youre searching for surrounded by quotes
#### Regular Expressions ####
Heres an example to find authentication logs for “user hoover” on an Ubuntu system:
$ grep "user hoover" /var/log/auth.log
Accepted password for hoover from 10.0.2.2 port 4792 ssh2
pam_unix(sshd:session): session opened for user hoover by (uid=0)
pam_unix(sshd:session): session closed for user hoover
It can be hard to construct regular expressions that are accurate. For example, if we searched for a number like the port “4792” it could also match timestamps, URLs, and other undesired data. In the below example for Ubuntu, it matched an Apache log that we didnt want.
$ grep "4792" /var/log/auth.log
Accepted password for hoover from 10.0.2.2 port 4792 ssh2
74.91.21.46 - - [31/Mar/2015:19:44:32 +0000] "GET /scripts/samples/search?q=4972HTTP/1.0" 404 545 "-" "-”
#### Surround Search ####
Another useful tip is that you can do surround search with grep. This will show you what happened a few lines before or after a match. It can help you debug what lead up to a particular error or problem. The B flag gives you lines before, and A gives you lines after. For example, we can see that when someone failed to login as an admin, they also failed the reverse mapping which means they might not have a valid domain name. This is very suspicious!
$ grep -B 3 -A 2 'Invalid user' /var/log/auth.log
Apr 28 17:06:20 ip-172-31-11-241 sshd[12545]: reverse mapping checking getaddrinfo for 216-19-2-8.commspeed.net [216.19.2.8] failed - POSSIBLE BREAK-IN ATTEMPT!
Apr 28 17:06:20 ip-172-31-11-241 sshd[12545]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: Invalid user admin from 216.19.2.8
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: input_userauth_request: invalid user admin [preauth]
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
#### Tail ####
You can also pair grep with [tail][2] to get the last few lines of a file, or to follow the logs and print them in real time. This is useful if you are making interactive changes like starting a server or testing a code change.
$ tail -f /var/log/auth.log | grep 'Invalid user'
Apr 30 19:49:48 ip-172-31-11-241 sshd[6512]: Invalid user ubnt from 219.140.64.136
Apr 30 19:49:49 ip-172-31-11-241 sshd[6514]: Invalid user admin from 219.140.64.136
A full introduction on grep and regular expressions is outside the scope of this guide, but [Ryans Tutorials][3] include more in-depth information.
Log management systems have higher performance and more powerful searching abilities. They often index their data and parallelize queries so you can quickly search gigabytes or terabytes of logs in seconds. In contrast, this would take minutes or in extreme cases hours with grep. Log management systems also use query languages like [Lucene][4] which offer an easier syntax for searching on numbers, fields, and more.
### Parsing with Cut, AWK, and Grok ###
#### Command Line Tools ####
Linux offers several command line tools for text parsing and analysis. They are great if you want to quickly parse a small amount of data but can take a long time to process large volumes of data
#### Cut ####
The [cut][5] command allows you to parse fields from delimited logs. Delimiters are characters like equal signs or commas that break up fields or key value pairs.
Lets say we want to parse the user from this log:
pam_unix(su:auth): authentication failure; logname=hoover uid=1000 euid=0 tty=/dev/pts/0 ruser=hoover rhost= user=root
We can use the cut command like this to get the text after the eighth equal sign. This example is on an Ubuntu system:
$ grep "authentication failure" /var/log/auth.log | cut -d '=' -f 8
root
hoover
root
nagios
nagios
#### AWK ####
Alternately, you can use [awk][6], which offers more powerful features to parse out fields. It offers a scripting language so you can filter out nearly everything thats not relevant.
For example, lets say we have the following log line on an Ubuntu system and we want to extract the username that failed to login:
Mar 24 08:28:18 ip-172-31-11-241 sshd[32701]: input_userauth_request: invalid user guest [preauth]
Heres how you can use the awk command. First, put a regular expression /sshd.*invalid user/ to match the sshd invalid user lines. Then print the ninth field using the default delimiter of space using { print $9 }. This outputs the usernames.
$ awk '/sshd.*invalid user/ { print $9 }' /var/log/auth.log
guest
admin
info
test
ubnt
You can read more about how to use regular expressions and print fields in the [Awk Users Guide][7].
#### Log Management Systems ####
Log management systems make parsing easier and enable users to quickly analyze large collections of log files. They can automatically parse standard log formats like common Linux logs or web server logs. This saves a lot of time because you dont have to think about writing your own parsing logic when troubleshooting a system problem.
Here you can see an example log message from sshd which has each of the fields remoteHost and user parsed out. This is a screenshot from Loggly, a cloud-based log management service.
![](http://www.loggly.com/ultimate-guide/wp-content/uploads/2015/05/Screen-Shot-2015-03-12-at-11.25.09-AM.png)
You can also do custom parsing for non-standard formats. A common tool to use is [Grok][8] which uses a library of common regular expressions to parse raw text into structured JSON. Here is an example configuration for Grok to parse kernel log files inside Logstash:
filter{
grok {
match => {"message" => "%{CISCOTIMESTAMP:timestamp} %{HOST:host} %{WORD:program}%{NOTSPACE} %{NOTSPACE}%{NUMBER:duration}%{NOTSPACE} %{GREEDYDATA:kernel_logs}"
}
}
And here is what the parsed output looks like from Grok:
![](http://www.loggly.com/ultimate-guide/wp-content/uploads/2015/05/Screen-Shot-2015-03-12-at-11.30.37-AM.png)
### Filtering with Rsyslog and AWK ###
Filtering allows you to search on a specific field value instead of doing a full text search. This makes your log analysis more accurate because it will ignore undesired matches from other parts of the log message. In order to search on a field value, you need to parse your logs first or at least have a way of searching based on the event structure.
#### How to Filter on One App ####
Often, you just want to see the logs from just one application. This is easy if your application always logs to a single file. Its more complicated if you need to filter one application among many in an aggregated or centralized log. Here are several ways to do this:
1. Use the rsyslog daemon to parse and filter logs. This example writes logs from the sshd application to a file named sshd-messages, then discards the event so its not repeated elsewhere. You can try this example by adding it to your rsyslog.conf file.
:programname, isequal, “sshd” /var/log/sshd-messages
&~
2. Use command line tools like awk to extract the values of a particular field like the sshd username. This example is from an Ubuntu system.
$ awk '/sshd.*invalid user/ { print $9 }' /var/log/auth.log
guest
admin
info
test
ubnt
3. Use a log management system that automatically parses your logs, then click to filter on the desired application name. Here is a screenshot showing the syslog fields in a log management service called Loggly. We are filtering on the appName “sshd” as indicated by the Venn diagram icon.
![](http://www.loggly.com/ultimate-guide/wp-content/uploads/2015/05/Screen-Shot-2015-03-12-at-11.05.02-AM.png)
#### How to Filter on Errors ####
One of the most common thing people want to see in their logs is errors. Unfortunately, the default syslog configuration doesnt output the severity of errors directly, making it difficult to filter on them.
There are two ways you can solve this problem. First, you can modify your rsyslog configuration to output the severity in the log file to make it easier to read and search. In your rsyslog configuration you can add a [template][9] with pri-text such as the following:
"<%pri-text%> : %timegenerated%,%HOSTNAME%,%syslogtag%,%msg%n"
This example gives you output in the following format. You can see that the severity in this message is err.
<authpriv.err> : Mar 11 18:18:00,hoover-VirtualBox,su[5026]:, pam_authenticate: Authentication failure
You can use awk or grep to search for just the error messages. In this example for Ubuntu, were including some surrounding syntax like the . and the > which match only this field.
$ grep '.err>' /var/log/auth.log
<authpriv.err> : Mar 11 18:18:00,hoover-VirtualBox,su[5026]:, pam_authenticate: Authentication failure
Your second option is to use a log management system. Good log management systems automatically parse syslog messages and extract the severity field. They also allow you to filter on log messages of a certain severity with a single click.
Here is a screenshot from Loggly showing the syslog fields with the error severity highlighted to show we are filtering for errors:
![](http://www.loggly.com/ultimate-guide/wp-content/uploads/2015/05/Screen-Shot-2015-03-12-at-11.00.36-AM.png)
--------------------------------------------------------------------------------
via: http://www.loggly.com/ultimate-guide/logging/analyzing-linux-logs/
作者:[Jason Skowronski][a] [Amy Echeverri][b] [ Sadequl Hussain][c]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.linkedin.com/in/jasonskowronski
[b]:https://www.linkedin.com/in/amyecheverri
[c]:https://www.linkedin.com/pub/sadequl-hussain/14/711/1a7
[1]:http://linux.die.net/man/1/grep
[2]:http://linux.die.net/man/1/tail
[3]:http://ryanstutorials.net/linuxtutorial/grep.php
[4]:https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
[5]:http://linux.die.net/man/1/cut
[6]:http://linux.die.net/man/1/awk
[7]:http://www.delorie.com/gnu/docs/gawk/gawk_26.html#IDX155
[8]:http://logstash.net/docs/1.4.2/filters/grok
[9]:http://www.rsyslog.com/doc/v8-stable/configuration/templates.html

View File

@ -0,0 +1,181 @@
Linux 日志分析
==============================================================================
日志中有大量的信息需要你处理,尽管有时候想要提取并非想象中的容易。在这篇文章中我们会介绍一些你现在就能做的基本日志分析例子。我们还将设计一些更高级的分析,但这些需要你前期努力做出切当的设置,后期就能节省很多时间。对数据进行高级分析的例子包括生成汇总计数、对有效值进行过滤,等等。
我们首先会想你展示如何在命令行中使用多个不同的工具,然后展示了一个日志管理工具如何能自动完成大部分繁重工作从而使得日志分析变得简单。
### 用 Grep 搜索 ###
搜索文本是查找信息最基本的方式。搜索文本最常用的工具是 [grep][1]。这个命令行工具在大部分 Linux 发行版中都有,它允许你用正则表达式搜索日志。一个正则表达式是用特殊的语言写的、能识别匹配文本的模式。最简单的模式就是用引号把你想要查找的字符串括起来。
#### 正则表达式 ####
这是一个在 Ubuntu 系统中查找 “user hoover” 认证日志的例子:
$ grep "user hoover" /var/log/auth.log
Accepted password for hoover from 10.0.2.2 port 4792 ssh2
pam_unix(sshd:session): session opened for user hoover by (uid=0)
pam_unix(sshd:session): session closed for user hoover
构建精确的正则表达式可能很难。例如,如果我们想要搜索一个类似端口 “4792” 的数字它可能也会匹配时间戳、URLs 以及其它不需要的数据。Ubuntu 中下面的例子,它匹配了一个我们不想要的 Apache 日志。
$ grep "4792" /var/log/auth.log
Accepted password for hoover from 10.0.2.2 port 4792 ssh2
74.91.21.46 - - [31/Mar/2015:19:44:32 +0000] "GET /scripts/samples/search?q=4972HTTP/1.0" 404 545 "-" "-”
#### 环绕搜索 ####
另一个有用的小技巧是你可以用 grep 做环绕搜索。这会向你展示一个匹配前面或后面几行是什么。它能帮助你调试导致错误或问题的东西。B 选项展示前面几行A 展示后面几行。举个例子,我们知道当一个人以管理员员身份登录失败时,他们也没有逆映射,也就意味着他们可能没有有效的域名。这非常可疑!
$ grep -B 3 -A 2 'Invalid user' /var/log/auth.log
Apr 28 17:06:20 ip-172-31-11-241 sshd[12545]: reverse mapping checking getaddrinfo for 216-19-2-8.commspeed.net [216.19.2.8] failed - POSSIBLE BREAK-IN ATTEMPT!
Apr 28 17:06:20 ip-172-31-11-241 sshd[12545]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: Invalid user admin from 216.19.2.8
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: input_userauth_request: invalid user admin [preauth]
Apr 28 17:06:20 ip-172-31-11-241 sshd[12547]: Received disconnect from 216.19.2.8: 11: Bye Bye [preauth]
#### Tail ####
你也可以把 grep 和 [tail][2] 结合使用来获取一个文件的最后几行,或者跟踪日志并实时打印。这在你做交互式更改的时候非常有用,例如启动服务器或者测试代码更改。
$ tail -f /var/log/auth.log | grep 'Invalid user'
Apr 30 19:49:48 ip-172-31-11-241 sshd[6512]: Invalid user ubnt from 219.140.64.136
Apr 30 19:49:49 ip-172-31-11-241 sshd[6514]: Invalid user admin from 219.140.64.136
关于 grep 和正则表达式的详细介绍并不在该指南的范围,但 [Ryans Tutorials][3] 有更深入的介绍。
日志管理系统有更高的性能和更强大的搜索能力。它们通常会索引数据并进行并行查询,因此你可以很快的在几秒内就能搜索 GB 或 TB 的日志。相比之下grep 就需要几分钟,在极端情况下可能甚至几小时。日志管理系统也使用类似 [Lucene][4] 的查询语言,它提供更简单的语法来检索数字、域以及其它。
### 用 Cut、 AWK、 和 Grok 解析 ###
#### 命令行工具 ####
Linux 提供了多个命令行工具用于文本解析和分析。当你想要快速解析少量数据时非常有用,但处理大量数据时可能需要很长时间。
#### Cut ####
[cut][5] 命令允许你从分隔的日志解析域。分隔符是指能分开域或键值对的等号或逗号。
假设我们想从下面的日志中解析出用户:
pam_unix(su:auth): authentication failure; logname=hoover uid=1000 euid=0 tty=/dev/pts/0 ruser=hoover rhost= user=root
我们可以像下面这样用 cut 命令获取第八个等号后面的文本。这是一个 Ubuntu 系统上的例子:
$ grep "authentication failure" /var/log/auth.log | cut -d '=' -f 8
root
hoover
root
nagios
nagios
#### AWK ####
另外,你也可以使用 [awk][6],它能提供解析域更强大的功能。它提供了一个脚本语言,你可以过滤出几乎任何不相干的东西。
例如,假设在 Ubuntu 系统中我们有下面的一行日志,我们想要提取登录失败的用户名称:
Mar 24 08:28:18 ip-172-31-11-241 sshd[32701]: input_userauth_request: invalid user guest [preauth]
你可以像下面这样使用 awk 命令。首先,用一个正则表达式 /sshd.*invalid user/ 来匹配 sshd 无效用户行。然后用 { print $9 } 根据默认的分隔符空格打印第九个域。然后就输出了用户名。
$ awk '/sshd.*invalid user/ { print $9 }' /var/log/auth.log
guest
admin
info
test
ubnt
你可以在 [Awk 用户指南][7] 中阅读更多关于如何使用正则表达式和输出域的信息。
#### 日志管理系统 ####
日志管理系统使得解析变得更加简单,使用户能快速的分析很多的日志文件。他们能自动解析标准的日志格式,比如常见的 Linux 日志和 Web 服务器日志。这能节省很多时间,因为当处理系统问题的时候你不需要考虑写自己的解析逻辑。
下面是一个 sshd 日志消息的例子,解析出了每个 remoteHost 和 user。这是 Loggly 中的一张截图,它是一个基于云的日志管理服务。
![](http://www.loggly.com/ultimate-guide/wp-content/uploads/2015/05/Screen-Shot-2015-03-12-at-11.25.09-AM.png)
你也可以对非标准格式自定义解析。一个常用的工具是 [Grok][8],它用一个常见正则表达式库解析原始文本为结构化 JSON。下面是一个 Grok 在 Logstash 中解析内核日志文件的事例配置:
filter{
grok {
match => {"message" => "%{CISCOTIMESTAMP:timestamp} %{HOST:host} %{WORD:program}%{NOTSPACE} %{NOTSPACE}%{NUMBER:duration}%{NOTSPACE} %{GREEDYDATA:kernel_logs}"
}
}
下图是 Grok 解析后输出的结果:
![](http://www.loggly.com/ultimate-guide/wp-content/uploads/2015/05/Screen-Shot-2015-03-12-at-11.30.37-AM.png)
### 用 Rsyslog 和 AWK 过滤 ###
过滤使得你能检索一个特定的域值而不是进行全文检索。这使你的日志分析更加准确,因为它会忽略来自其它部分日志信息不需要的匹配。为了对一个域值进行搜索,你首先需要解析日志或者至少有对事件结构进行检索的方式。
#### 如果对 App 进行过滤 ####
通常,你可能只想看一个应用的日志。如果你的应用把记录都保存到一个文件中就会很容易。如果你需要在一个聚集或集中式日志中过滤一个应用就会比较复杂。下面有几种方法来实现:
1. 用 rsyslog 守护进程解析和过滤日志。下面的例子将 sshd 应用的日志写入一个名为 sshd-message 的文件,然后丢弃事件也就不会在其它地方重复。你可以将它添加到你的 rsyslog.conf 文件中测试这个例子。
:programname, isequal, “sshd” /var/log/sshd-messages
&~
2. 用类似 awk 的命令行工具提取特定域的值,例如 sshd 用户名。下面是 Ubuntu 系统中的一个例子。
$ awk '/sshd.*invalid user/ { print $9 }' /var/log/auth.log
guest
admin
info
test
ubnt
3. 用日志管理系统自动解析日志,然后在需要的应用名称上点击过滤。下面是在 Loggly 日志管理服务中提取 syslog 域的截图。我们对应用名称 “sshd” 进行过滤,如维恩图图标所示。
![](http://www.loggly.com/ultimate-guide/wp-content/uploads/2015/05/Screen-Shot-2015-03-12-at-11.05.02-AM.png)
#### 如何过滤错误 ####
一个人最希望看到日志中的错误。不幸的是,默认的 syslog 配置不直接输出错误的严重性,也就使得难以过滤它们。
这里有两个解决该问题的方法。首先,你可以修改你的 rsyslog 配置,在日志文件中输出错误的严重性,使得便于查看和检索。在你的 rsyslog 配置中你可以用 pri-text 添加一个 [模板][9],像下面这样:
"<%pri-text%> : %timegenerated%,%HOSTNAME%,%syslogtag%,%msg%n"
这个例子会按照下面的格式输出。你可以看到该信息中指示错误的 err。
<authpriv.err> : Mar 11 18:18:00,hoover-VirtualBox,su[5026]:, pam_authenticate: Authentication failure
你可以用 awk 或者 grep 检索错误信息。在 Ubuntu 中,对这个例子,我们可以用一些环绕语法,例如 . 和 >,它们只会匹配这个域。
$ grep '.err>' /var/log/auth.log
<authpriv.err> : Mar 11 18:18:00,hoover-VirtualBox,su[5026]:, pam_authenticate: Authentication failure
你的第二个选择是使用日志管理系统。好的日志管理系统能自动解析 syslog 消息并抽取错误域。它们也允许你用简单的点击过滤日志消息中的特定错误。
下面是 Loggly 中一个截图,显示了高亮错误严重性的 syslog 域,表示我们正在过滤错误:
![](http://www.loggly.com/ultimate-guide/wp-content/uploads/2015/05/Screen-Shot-2015-03-12-at-11.00.36-AM.png)
--------------------------------------------------------------------------------
via: http://www.loggly.com/ultimate-guide/logging/analyzing-linux-logs/
作者:[Jason Skowronski][a] [Amy Echeverri][b] [ Sadequl Hussain][c]
译者:[ictlyh](https://github.com/ictlyh)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](https://linux.cn/) 荣誉推出
[a]:https://www.linkedin.com/in/jasonskowronski
[b]:https://www.linkedin.com/in/amyecheverri
[c]:https://www.linkedin.com/pub/sadequl-hussain/14/711/1a7
[1]:http://linux.die.net/man/1/grep
[2]:http://linux.die.net/man/1/tail
[3]:http://ryanstutorials.net/linuxtutorial/grep.php
[4]:https://lucene.apache.org/core/2_9_4/queryparsersyntax.html
[5]:http://linux.die.net/man/1/cut
[6]:http://linux.die.net/man/1/awk
[7]:http://www.delorie.com/gnu/docs/gawk/gawk_26.html#IDX155
[8]:http://logstash.net/docs/1.4.2/filters/grok
[9]:http://www.rsyslog.com/doc/v8-stable/configuration/templates.html