translated

Morisun029 2020-01-27 18:25:12 +08:00 committed by GitHub
parent edb60f006d
commit 6056bbda6f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 185 additions and 178 deletions


@ -1,178 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (Morisun029)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Use this Python script to find bugs in your Overcloud)
[#]: via: (https://opensource.com/article/20/1/logtool-root-cause-identification)
[#]: author: (Arkady Shtempler https://opensource.com/users/ashtempl)
Use this Python script to find bugs in your Overcloud
======
LogTool is a set of Python scripts that helps you investigate root
causes for problems in Overcloud nodes.
![Searching for code][1]
OpenStack stores and manages a bunch of log files on its Overcloud nodes and Undercloud host. Therefore, it's not easy to use OSP log files to investigate a problem you're having, especially when you don't even know what could have caused the problem.
If that's your situation, [LogTool][2] makes your life much easier! It saves you the time and work it would otherwise take to investigate the root cause manually. Based on a fuzzy string matching algorithm, LogTool provides all the unique error and warning messages that have occurred in the past. You can export these messages for a particular time period, such as the last 10 minutes, the last hour, or the last day, based on the timestamps in the logs.
LogTool is a set of Python scripts, and its main module, **PyTool.py**, is executed on the Undercloud host. Some operation modes use additional scripts that are executed directly on Overcloud nodes, such as exporting errors and warnings from Overcloud logs.
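The article does not describe LogTool's actual matching algorithm, so the following is only a minimal sketch of the general idea of collapsing near-identical messages with fuzzy matching. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; the function name `unique_messages`, the sample log lines, and the `0.8` similarity threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def unique_messages(messages, threshold=0.8):
    """Keep one representative per group of near-identical messages.

    Hypothetical helper: the threshold value is an assumption, not LogTool's.
    """
    unique = []
    for msg in messages:
        # Treat a message as a duplicate if it closely resembles one already kept.
        if not any(SequenceMatcher(None, msg, seen).ratio() >= threshold
                   for seen in unique):
            unique.append(msg)
    return unique


# Illustrative log lines (made up for this sketch).
logs = [
    "ERROR Connection to 10.0.0.5 refused",
    "ERROR Connection to 10.0.0.7 refused",
    "WARNING Disk usage above 90%",
]
print(unique_messages(logs))
```

The two connection errors differ only in the IP address, so they collapse into a single representative, which is the kind of reduction that makes a long log easier to scan.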
LogTool supports Python 2 and 3, and you can change the working directory according to your needs: [LogTool_Python2][3] or [LogTool_Python3][4].
### Operation modes
#### 1\. Export errors and warnings from Overcloud logs
This mode is used to extract all unique **ERROR** and **WARNING** messages from Overcloud nodes that took place in the past. As the user, you're prompted to provide the "since time" and debug level to be used for extraction of errors or warnings. For example, if something went wrong in the last 10 minutes, you'll be able to extract error and warning messages for just that time period.
This operation mode generates a directory containing a result file for each Overcloud node. A result file is a simple text file that is compressed (***.gz**) to reduce the time needed to download it from the Overcloud node. To convert a compressed file to a regular text file, you can use [zcat][5] or a similar tool. Also, some versions of Vi and any recent version of Emacs both support reading compressed data. The result file is divided into sections and contains a table of contents at the bottom.
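If you prefer to read a compressed result file from Python rather than with zcat, the standard `gzip` module can open it directly. This is a small sketch, not part of LogTool; the helper name `read_result_file` and the UTF-8 encoding are assumptions.

```python
import gzip


def read_result_file(path):
    """Return the decompressed lines of a gzip-compressed text file."""
    # "rt" opens the archive in text mode, so lines come back as str.
    # errors="replace" avoids a crash on bytes that are not valid UTF-8.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return handle.read().splitlines()
```

Calling `read_result_file()` with the path of a downloaded ***.gz** result file returns its lines as a list, which you can then search or slice.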
There are two kinds of log files LogTool detects on the fly: _Standard_ and _Not Standard_. In _Standard_, each log line has a known and defined structure: timestamp, debug level, msg, and so on. In _Not Standard_, the log's structure is unknown; it could be a third party's logs, for example. In the table of contents, you find a "Section name --> Line number" per section, for example:
* **Raw Data - extracted Errors/Warnings from standard OSP logs since:** This section contains all extracted Error/Warning messages as-is without any modifications or changes. These messages are the raw data LogTool uses for fuzzy matching analysis.
  * **Statistics - Number of Errors/Warnings per standard OSP log since:** In this section, you find the number of Errors and Warnings per Standard log file. This may help you identify which components to look at when searching for the root cause of your issue.
* **Statistics - Unique messages, per STANDARD OSP log file since:** This section addresses unique Error and Warning messages since a timestamp you provide. For more details about each unique Error or Warning, search for the same message in the Raw Data section.
  * **Statistics - Unique messages per NON STANDARD log file, since any time:** This section contains the unique messages in nonstandard log files. Unfortunately, LogTool cannot handle these log files in the same manner as Standard log files; therefore, the "since time" you provide on extraction will be ignored, and you'll see all of the unique Error/Warning messages ever created.
So first, scroll down to the table of contents at the bottom of the result file and review its sections. Use the line indexes in the table of contents to jump to the relevant sections, where numbers 3, 4, and 5 are most important.
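The "Section name --> Line number" entries in the table of contents lend themselves to simple parsing. The sketch below assumes each entry is a single line of exactly that shape; the real layout of a result file may differ, and `parse_toc` is a hypothetical helper, not a LogTool function.

```python
def parse_toc(lines):
    """Map section names to line numbers from 'name --> number' entries."""
    toc = {}
    for line in lines:
        # Split on the last arrow so that section names may contain dashes.
        name, sep, number = line.rpartition("-->")
        if sep and number.strip().isdigit():
            toc[name.strip()] = int(number.strip())
    return toc
```

Lines without an arrow, or with a non-numeric right-hand side, are ignored, so the function can be fed the whole tail of a result file.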
#### 2\. Download all logs from Overcloud nodes
Logs from all Overcloud nodes are compressed and downloaded to a local directory on your Undercloud host.
#### 3\. Grep for a string in all Overcloud logs
This mode "greps" (searches) a string provided by the user on all Overcloud logs. For example, you might want to see all logged messages for a specific request ID, such as the request ID for a "Create VM" that has failed.
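Conceptually, this mode is a recursive text search. The following sketch shows the idea for a local directory of logs; it is not LogTool's implementation (which runs against remote Overcloud nodes), and the function name `grep_logs` is an assumption.

```python
import os


def grep_logs(root_dir, needle):
    """Yield (path, line_number, line) for each log line containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Skip files that cannot be read (for example, due to permissions).
                continue
```

For example, iterating over `grep_logs("/var/log/containers", "req-...")` with a real request ID would list every line mentioning that request, together with its file and line number.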
#### 4\. Check current CPU, RAM, and disk on Overcloud
This mode displays the current CPU, RAM, and disk info on each Overcloud node.
#### 5\. Execute user's script
This enables users to run their own scripts on Overcloud nodes. For instance, say an Overcloud deployment failed, so you need to execute the same procedure on each Controller node to fix that. You can implement a "workaround" script and run it on Controllers using this mode.
#### 6\. Download relevant logs only, by given timestamp
This mode downloads only the Overcloud logs whose _"Last Modified" time is later than the timestamp given by the user._ For example, if you got an error 10 minutes ago, old log files won't be relevant, so downloading them is unnecessary. In addition, you can't (or shouldn't) attach large files in some bug reporting tools, so this mode might help with making bug reports.
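The selection rule ("Last Modified" later than a given timestamp) can be sketched locally with `os.path.getmtime`. This only illustrates the filter, not how LogTool transfers files; `modified_since` is a hypothetical name.

```python
import os


def modified_since(root_dir, since_epoch):
    """Return paths under root_dir whose mtime is later than since_epoch."""
    recent = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # getmtime returns the last-modified time in seconds since the epoch.
            if os.path.getmtime(path) > since_epoch:
                recent.append(path)
    return sorted(recent)
```

Passing `time.time() - 600` as `since_epoch` would select only the files modified in the last 10 minutes.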
#### 7\. Export errors and warnings from Undercloud logs
This is the same as mode #1 above, but for Undercloud logs.
#### 8\. Check Unhealthy dockers on the Overcloud
This mode is used to search for unhealthy Dockers on nodes.
#### 9\. Download OSP logs and run LogTool locally
This mode allows you to download OSP logs from Jenkins or Log Storage (for example, **cougar11.scl.lab.tlv.redhat.com**) and to analyze the downloaded logs locally.
#### 10\. Analyze deployment log on the Undercloud
This mode may help you understand what went wrong during Overcloud or Undercloud deployment. Deployment logs are generated when the **\--log** option is used, for example, inside the **overcloud_deploy.sh** script. Such logs are not "friendly," and it's hard to understand what went wrong, especially when verbosity is set to **vv** or more, because the sheer volume of data makes the log unreadable. This mode provides some details about all failed tasks.
#### 11\. Analyze Gerrit (Zuul) failed gate logs
This mode is used to analyze Gerrit (Zuul) log files. It automatically downloads all files from a remote Gerrit gate (HTTP download) and analyzes all files locally.
### Installation
LogTool is available on GitHub. Clone it to your Undercloud host with:
```
git clone https://github.com/zahlabut/LogTool.git
```
Some external Python modules are also used by the tool:
#### Paramiko
This SSH module is usually installed on Undercloud by default. Use the following command to verify whether it's installed:
```
ls -a /usr/lib/python2.7/site-packages | grep paramiko
```
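The `ls` check above depends on the Python 2.7 site-packages path. As a path-independent alternative, you can ask the interpreter itself whether a module is importable. This sketch assumes Python 3 (`importlib.util.find_spec`); the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


print("paramiko installed:", is_installed("paramiko"))
```

Run it with the same interpreter you intend to use for LogTool, since each Python installation has its own set of packages.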
If you need to install the module, on your Undercloud, execute the following commands:
```
sudo easy_install pip
sudo pip install paramiko==2.1.1
```
#### BeautifulSoup
This HTML parser module is used only in modes where log files are downloaded using HTTP. It's used to parse the Artifacts HTML page to get all of the links in it. To install BeautifulSoup, enter this command:
```
pip install beautifulsoup4
```
You can also use the [requirements.txt][6] file to install all the required modules by executing:
```
pip install -r requirements.txt
```
### Configuration
All required parameters are set directly inside the **PyTool.py** script. The defaults are:
```
overcloud_logs_dir = '/var/log/containers'
overcloud_ssh_user = 'heat-admin'
overcloud_ssh_key = '/home/stack/.ssh/id_rsa'
undercloud_logs_dir ='/var/log/containers'
source_rc_file_path='/home/stack/'
```
### Usage
This tool is interactive, so to start it, just enter:
```
cd LogTool
python PyTool.py
```
### Troubleshooting LogTool
Two log files are created at runtime: **Error.log** and **Runtime.log**. Please add the contents of both to the description of any issue you open.
### Limitations
LogTool is hardcoded to handle files up to 500 MB.
### LogTool_Python3 script
Get it at [github.com/zahlabut/LogTool][2]
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/1/logtool-root-cause-identification
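Author: [Arkady Shtempler][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)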
[a]: https://opensource.com/users/ashtempl
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/search_find_code_python_programming.png?itok=ynSL8XRV (Searching for code)
[2]: https://github.com/zahlabut/LogTool
[3]: https://github.com/zahlabut/LogTool/tree/master/LogTool_Python2
[4]: https://github.com/zahlabut/LogTool/tree/master/LogTool_Python3
[5]: https://opensource.com/article/19/2/getting-started-cat-command
[6]: https://github.com/zahlabut/LogTool/blob/master/LogTool_Python3/requirements.txt


@ -0,0 +1,185 @@
[#]: collector: (lujun9972)
[#]: translator: (Morisun029)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Use this Python script to find bugs in your Overcloud)
[#]: via: (https://opensource.com/article/20/1/logtool-root-cause-identification)
[#]: author: (Arkady Shtempler https://opensource.com/users/ashtempl)
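Use this Python script to find bugs in your Overcloud
======
LogTool is a set of Python scripts that helps you find the root cause of problems in Overcloud nodes.
![Searching for code][1]
OpenStack stores and manages a large number of log files on its Overcloud nodes and Undercloud host. As a result, using OSP log files to investigate a problem is not easy, especially when you don't even know what caused it.
If that's your situation, [LogTool][2] can make your life much easier! It saves you the time and effort that manual investigation would otherwise take. Based on a fuzzy string matching algorithm, LogTool provides all the unique error and warning messages that have occurred in the past. You can export these messages for a particular time period (such as the last 10 minutes, the last hour, or the last day) based on the timestamps in the logs.
LogTool is a set of Python scripts, and its main module, PyTool.py, is executed on the Undercloud host. Some operation modes use additional scripts that are executed directly on Overcloud nodes, such as exporting errors and warnings from Overcloud logs.
LogTool supports Python 2 and Python 3, and you can change the working directory according to your needs: [LogTool_Python2][3] or [LogTool_Python3][4].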
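### Operation modes
#### 1\. Export errors and warnings from Overcloud logs
This mode is used to extract all unique **ERROR** and **WARNING** messages that occurred in the past on Overcloud nodes. As the user, you're prompted to provide the "since time" and debug level to be used for extracting errors or warnings. For example, if something went wrong in the last 10 minutes, you can extract error and warning messages for just that time period.
This operation mode generates a directory containing a result file for each Overcloud node. A result file is a simple text file that is compressed (***.gz**) to reduce the time needed to download it from the Overcloud node. To convert a compressed file to a regular text file, you can use zcat or a similar tool. Also, some versions of Vi and any recent version of Emacs support reading compressed data. The result file is divided into sections and contains a table of contents at the bottom.
There are two kinds of log files LogTool detects on the fly: Standard and Not Standard. In Standard files, each log line has a known and defined structure: timestamp, debug level, message, and so on. In Not Standard files, the log's structure is unknown; it could be a third party's logs, for example. In the table of contents, you find a "Section name --> Line number" per section, for example:
  * **Raw Data - Errors/Warnings extracted from standard OSP logs:** This section contains all extracted Error/Warning messages as-is, without any modifications or changes. These messages are the raw data LogTool uses for fuzzy matching analysis.
  * **Statistics - Number of Errors/Warnings per standard OSP log:** In this section, you find the number of Errors and Warnings per Standard log file. This may help you identify which components to look at when searching for the root cause of your issue.
  * **Statistics - Unique messages per standard OSP log file:** This section provides the unique Error and Warning messages since the timestamp you provide. For more details about each unique Error or Warning, search for the same message in the Raw Data section.
  * **Statistics - Unique messages per non-standard log file, since any time:** This section contains the unique messages in non-standard log files. Unfortunately, LogTool cannot handle these log files in the same manner as Standard log files; therefore, the "since time" you provide on extraction is ignored, and you'll see all of the unique Error/Warning messages ever created.
So first, scroll down to the table of contents at the bottom of the result file and review its sections. Use the line indexes in the table of contents to jump to the relevant sections, where sections 3, 4, and 5 are most important.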
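#### 2\. Download all logs from Overcloud nodes
Logs from all Overcloud nodes are compressed and downloaded to a local directory on your Undercloud host.
#### 3\. Grep for a string in all Overcloud logs
This mode "greps" (searches) all Overcloud logs for a string provided by the user. For example, you might want to see all logged messages for a specific request ID, such as the request ID for a "Create VM" that has failed.
#### 4\. Check current CPU, RAM, and disk on Overcloud
This mode displays the current CPU, RAM, and disk info on each Overcloud node.
#### 5\. Execute user's script
This mode enables users to run their own scripts on Overcloud nodes. For instance, say an Overcloud deployment failed, so you need to execute the same procedure on each Controller node to fix that. You can implement a "workaround" script and run it on Controllers using this mode.
#### 6\. Download relevant logs only, by given timestamp
This mode downloads only the Overcloud logs whose "Last Modified" time is later than the timestamp given by the user. For example, if you got an error 10 minutes ago, old log files won't be relevant, so downloading them is unnecessary. In addition, you can't (or shouldn't) attach large files in some bug reporting tools, so this mode might help with making bug reports.
#### 7\. Export errors and warnings from Undercloud logs
This is the same as mode 1 above, but for Undercloud logs.
#### 8\. Check unhealthy Dockers on the Overcloud
This mode is used to search for unhealthy Dockers on nodes.
#### 9\. Download OSP logs and run LogTool locally
This mode allows you to download OSP logs from Jenkins or Log Storage (for example, **cougar11.scl.lab.tlv.redhat.com**) and to analyze them locally.
#### 10\. Analyze deployment log on the Undercloud
This mode may help you understand what went wrong during Overcloud or Undercloud deployment. Deployment logs are generated when the **\--log** option is used, for example, inside the **overcloud_deploy.sh** script. Such logs are not "friendly," and it's hard to understand what went wrong, especially when verbosity is set to **vv** or more, because the volume of data makes the log hard to read. This mode provides details about all failed tasks.
#### 11\. Analyze Gerrit (Zuul) failed gate logs
This mode is used to analyze Gerrit (Zuul) log files. It automatically downloads all files from a remote Gerrit gate (HTTP download) and analyzes them locally.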
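### Installation
LogTool is available on GitHub. Clone it to your Undercloud host with:
```
git clone https://github.com/zahlabut/LogTool.git
```
Some external Python modules are also used by the tool:
#### Paramiko
This SSH module is usually installed on Undercloud by default. Use the following command to verify whether it's installed:
```
ls -a /usr/lib/python2.7/site-packages | grep paramiko
```
If you need to install the module, execute the following commands on your Undercloud: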
```
sudo easy_install pip
sudo pip install paramiko==2.1.1
```
#### BeautifulSoup
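This HTML parser module is used only in modes where log files are downloaded using HTTP. It's used to parse the Artifacts HTML page to get all of the links in it. To install BeautifulSoup, enter this command:
```
pip install beautifulsoup4
```
You can also use the [requirements.txt][6] file to install all the required modules by executing:
```
pip install -r requirements.txt
```
### Configuration
All required parameters are set directly inside the **PyTool.py** script. The defaults are: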
```
overcloud_logs_dir = '/var/log/containers'
overcloud_ssh_user = 'heat-admin'
overcloud_ssh_key = '/home/stack/.ssh/id_rsa'
undercloud_logs_dir ='/var/log/containers'
source_rc_file_path='/home/stack/'
```
### Usage
This tool is interactive, so to start it, just enter:
```
cd LogTool
python PyTool.py
```
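### Troubleshooting LogTool
Two log files are created at runtime: **Error.log** and **Runtime.log**. Please add the contents of both to the description of any issue you open.
### Limitations
LogTool is hardcoded to handle files up to 500 MB.
### LogTool_Python3 script
Get it at [github.com/zahlabut/LogTool][2].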
--------------------------------------------------------------------------------
via: https://opensource.com/article/20/1/logtool-root-cause-identification
Author: [Arkady Shtempler][a]
Topic selection: [lujun9972][b]
Translator: [Morisun029](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
[a]: https://opensource.com/users/ashtempl
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/search_find_code_python_programming.png?itok=ynSL8XRV (Searching for code)
[2]: https://github.com/zahlabut/LogTool
[3]: https://github.com/zahlabut/LogTool/tree/master/LogTool_Python2
[4]: https://github.com/zahlabut/LogTool/tree/master/LogTool_Python3
[5]: https://opensource.com/article/19/2/getting-started-cat-command
[6]: https://github.com/zahlabut/LogTool/blob/master/LogTool_Python3/requirements.txt