[bazz2-ed]Linux Kernel Testing and Debugging 3

bazz2 2014-07-25 12:39:41 +08:00
parent 9905321cb9
commit 4556af709e
3 changed files with 126 additions and 127 deletions


@ -1,126 +0,0 @@
[bazz2 on the way]
Linux Kernel Testing and Debugging
================================================================================
### Basic Testing ###
Once a new kernel is installed, the next step is to try booting it and see what happens. Once the new kernel is up and running, check dmesg for any regressions. Then run a few usage tests:
- Is networking (wifi or wired) functional?
- Does ssh work?
- Run rsync of a large file over ssh
- Run git clone and git pull
- Start web browser
- Read email
- Download files: ftp, wget etc.
- Play audio/video files
- Connect new USB devices: mouse, USB stick, etc.
### Examine Kernel Logs ###
Checking for regressions in dmesg is a good way to identify problems, if any, introduced by the new code. As a general rule, there should be no new crit, alert, or emerg level messages in dmesg, and no new err level messages either. Pay close attention to any new warn level messages as well. Please note that new warn messages aren't as bad: new code at times adds warning messages that are just warnings.
- dmesg -t -l emerg
- dmesg -t -l crit
- dmesg -t -l alert
- dmesg -t -l err
- dmesg -t -l warn
- dmesg -t -k
- dmesg -t
The following script runs the above dmesg commands and saves their output for comparison with the dmesg files from an older release. It then runs diff against the older release's dmesg files. The old release is the script's input parameter; if one is not supplied, the script simply generates the dmesg files and exits. Regressions indicate newly introduced bugs and/or bugs that escaped patch testing and integration testing in the Linux git trees before the patch was included in a release. Also check dmesg for stack traces resulting from WARN_ON: these are serious problems that require further investigation.
- [**dmesg regression check script**][1]
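The linked script is the reference. As a rough illustration only, the sketch below shows the same save-then-diff idea in POSIX shell. It is not that script: the file naming (dmesg_RELEASE_LEVEL), the `DMESG`, `LOG_DIR` and `NEW_RELEASE` overrides, and the `save_dmesg`/`dmesg_check` function names are hypothetical, and reading the kernel log may require root.

```shell
#!/bin/sh
# Sketch of a dmesg regression check (hypothetical; not the linked script).
# Assumes util-linux dmesg (-t, -l, -k) and permission to read the kernel log.
# DMESG, LOG_DIR and NEW_RELEASE may be overridden by the caller.
DMESG=${DMESG:-dmesg}
LOG_DIR=${LOG_DIR:-.}

# save_dmesg RELEASE: one file per priority level, plus kernel-only and full logs.
save_dmesg() {
    release=$1
    for level in emerg crit alert err warn; do
        $DMESG -t -l "$level" > "$LOG_DIR/dmesg_${release}_${level}"
    done
    $DMESG -t -k > "$LOG_DIR/dmesg_${release}_kernel"
    $DMESG -t > "$LOG_DIR/dmesg_${release}_all"
}

# dmesg_check [OLD_RELEASE]: save the running kernel's logs, then diff them
# against the files saved earlier for OLD_RELEASE. Returns 1 if any differ.
dmesg_check() {
    old=$1
    new=${NEW_RELEASE:-$(uname -r)}
    save_dmesg "$new"
    if [ -z "$old" ]; then
        echo "saved dmesg files for $new; no old release given"
        return 0
    fi
    rc=0
    for kind in emerg crit alert err warn kernel all; do
        old_file="$LOG_DIR/dmesg_${old}_${kind}"
        if [ ! -f "$old_file" ]; then
            echo "missing $old_file, skipping"
            continue
        fi
        # diff exits non-zero when the files differ (or cannot be read).
        if ! diff "$old_file" "$LOG_DIR/dmesg_${new}_${kind}" \
                > "$LOG_DIR/dmesg_diff_${kind}"; then
            echo "$kind changed: see $LOG_DIR/dmesg_diff_${kind}"
            rc=1
        fi
    done
    return $rc
}
```

Source the file and call `dmesg_check` with no argument while running the old kernel to save its files, then call it again with the old release string after booting the new kernel. A non-zero return means at least one file changed; lines marked `>` in the dmesg_diff_* files are the new messages to inspect.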
### Stress Testing ###
Running 3 to 4 kernel compiles in parallel is a good overall stress test. Clone a few Linux kernel git trees (stable, linux-next, etc.) and run timed compiles in parallel. Compare the times with those of earlier runs of this test to spot performance regressions: longer compile times could indicate a performance regression in one of the kernel modules. Performance problems are hard to debug, and the first step is to detect them. Because several parallel compiles exercise many kernel modules, such as memory, file systems, DMA, and drivers, this test can serve both as a performance regression test and as an overall kernel regression test.
time make all
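A minimal sketch of the parallel timed build, assuming each argument is an already configured kernel tree. It records wall-clock seconds with `date +%s` instead of the shell's `time` keyword so that each result lands in a file. The `parallel_builds` name, the `MAKE` override, and the build.log/build-time.log file names are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: time builds of several kernel trees in parallel (hypothetical helper).
# Assumes each argument is an already configured tree; MAKE may be overridden.
MAKE=${MAKE:-make}

parallel_builds() {
    for tree in "$@"; do
        (
            cd "$tree" || exit 1
            start=$(date +%s)
            $MAKE all > build.log 2>&1
            rc=$?
            end=$(date +%s)
            # Record exit status and wall-clock seconds for later comparison.
            echo "$tree: exit=$rc seconds=$((end - start))" > build-time.log
        ) &
    done
    wait    # block until every background build has finished
    for tree in "$@"; do
        cat "$tree/build-time.log"
    done
}
```

Compare the reported seconds with earlier runs made on the old kernel. Running `make clean` in each tree beforehand keeps the comparison fair, because incremental builds finish much faster.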
### Kernel Testing Tools ###
The Linux kernel git tree includes several tests under tools/testing. There is a good mix of automated and functional tests.
### ktest suite ###
ktest is an automated test suite that can test builds, installs, and kernel boots. It can also run cross-compile tests, provided the system has cross-compilers installed. ktest depends on the flex and bison tools. Please consult the ktest documentation in tools/testing/ktest for details on how to run ktest; this is left to the reader as self-study. A few resources that go into detail on how to run ktest:
- [**ktest-eLinux.org**][2]
### tools/testing/selftests ###
Let's start with selftests. Kernel sources include a set of self-tests which test various sub-systems. As of this writing, the breakpoints, cpu-hotplug, efivarfs, ipc, kcmp, memory-hotplug, mqueue, net, powerpc, ptrace, rcutorture, timers, and vm sub-systems have self-tests. In addition, the user memory self-tests test user memory to kernel memory copies via the test_user_copy module. Here is how to run these self-tests:
Compile tests:
make -C tools/testing/selftests
Run all tests (some tests need root access; log in as root and run):
make -C tools/testing/selftests run_tests
Run only tests targeted for a single sub-system:
make -C tools/testing/selftests TARGETS=vm run_tests
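When several sub-systems are tested, running each target separately keeps one log per sub-system. This is a sketch: the `run_selftests` function, the `MAKE` and `SELFTEST_LOG_DIR` overrides, and the selftest-TARGET.log names are hypothetical. A zero exit status from make does not by itself prove that every test passed, so the logs still need reading.

```shell
#!/bin/sh
# Sketch: run several selftest targets one at a time, one log per target.
# run_selftests is hypothetical; MAKE and SELFTEST_LOG_DIR may be overridden.
# Some targets need root, as noted above.
MAKE=${MAKE:-make}
SELFTEST_LOG_DIR=${SELFTEST_LOG_DIR:-.}

# run_selftests KERNEL_SRC TARGET...
run_selftests() {
    src=$1
    shift
    failed=""
    for target in "$@"; do
        log="$SELFTEST_LOG_DIR/selftest-$target.log"
        # Same invocation as above, restricted to one target at a time.
        if ! $MAKE -C "$src/tools/testing/selftests" TARGETS="$target" \
                run_tests > "$log" 2>&1; then
            failed="$failed $target"
        fi
    done
    if [ -n "$failed" ]; then
        echo "make failed for:$failed"
        return 1
    fi
    echo "make succeeded for: $*"
}
```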
### tools/testing/fault-injection ###
Another test suite under tools/testing is fault-injection. The failcmd.sh script runs a command while injecting slab and page allocation failures. This type of testing helps validate how well the kernel recovers from faults. This test should be run as root. The following is a quick summary of the currently implemented fault injection capabilities. The list keeps growing as new capabilities are added; please refer to Documentation/fault-injection/fault-injection.txt for the latest.
failslab (default option)
injects slab allocation failures. kmalloc(), kmem_cache_alloc(), ...
fail_page_alloc
injects page allocation failures. alloc_pages(), get_free_pages(), ...
fail_make_request
injects disk IO errors on devices permitted by setting /sys/block//make-it-fail or /sys/block///make-it-fail (generic_make_request()).
fail_mmc_request
injects MMC data errors on devices permitted by setting debugfs entries under /sys/kernel/debug/mmc0/fail_mmc_request
The capabilities and behavior of fault-injection can be configured. The fault-inject-debugfs kernel module provides debugfs entries for configuring it at run time. Specifying the error probability rate and the interval between injected faults are just two examples of the configuration choices it supports; please refer to Documentation/fault-injection/fault-injection.txt for details. Boot options can be used to inject faults during early boot, before debugfs becomes available. The following boot options are supported:
- failslab=
- fail_page_alloc=
- fail_make_request=
- mmc_core.fail_request=[interval],[probability],[space],[times]
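Here is a sketch of the run-time configuration described above. It assumes debugfs is mounted at /sys/kernel/debug, the kernel was built with fault-injection debugfs support, and the script runs as root. Each capability directory (for example failslab) holds interval, probability, space, and times attributes, documented in Documentation/fault-injection/fault-injection.txt. The helper names and the `DEBUGFS_ROOT` override are hypothetical. The second helper formats the same four values as an early-boot option in the [interval],[probability],[space],[times] order shown above for mmc_core.fail_request, assuming the other options take the same four values, as the kernel documentation describes.

```shell
#!/bin/sh
# Sketch: set fault-injection attributes through debugfs at run time, and
# format the same values as an early-boot option. Helper names are hypothetical.
# Assumes debugfs at /sys/kernel/debug (override with DEBUGFS_ROOT), a kernel
# with fault-injection debugfs support, and root privileges.
DEBUGFS_ROOT=${DEBUGFS_ROOT:-/sys/kernel/debug}

# configure_fault NAME INTERVAL PROBABILITY SPACE TIMES
#   NAME: a capability directory such as failslab or fail_page_alloc
#   PROBABILITY: percent; TIMES: -1 means no limit
configure_fault() {
    dir="$DEBUGFS_ROOT/$1"
    if [ ! -d "$dir" ]; then
        echo "fault capability not available: $dir" >&2
        return 1
    fi
    printf '%s\n' "$2" > "$dir/interval"
    printf '%s\n' "$3" > "$dir/probability"
    printf '%s\n' "$4" > "$dir/space"
    printf '%s\n' "$5" > "$dir/times"
}

# fault_boot_option NAME INTERVAL PROBABILITY SPACE TIMES
#   Prints NAME=interval,probability,space,times for the kernel command line.
fault_boot_option() {
    printf '%s=%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, `configure_fault failslab 1 10 0 -1` requests slab allocation failures with a 10% probability and no limit on their number, and `fault_boot_option failslab 1 10 0 -1` prints `failslab=1,10,0,-1` for use when the faults must start before debugfs is available.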
The fault-injection infrastructure provides interfaces to add new fault-injection capabilities. The following is a brief outline of the steps involved in adding a new capability. Please refer to the above mentioned document for details:
define the fault attributes using DECLARE_FAULT_INJECTION(name);
> Please see the definition of struct fault_attr in fault-inject.h for details.
add a boot option to configure fault attributes
> This can be done using the helper function setup_fault_attr(attr, str). Adding a boot option is necessary to enable the fault injection capability during early boot.
add debugfs entries
> Use the helper function fault_create_debugfs_attr(name, parent, attr) to add new debugfs entries for this new capability.
add module parameters
> Adding module parameters to configure the fault attributes is a good option when the scope of the new fault capability is limited to a single kernel module.
add a hook to insert failures
> Call should_fail(attr, size); when should_fail() returns true, the client code should inject a failure.
Applications using this fault-injection infrastructure can target a specific kernel module to inject slab and page allocation failures to limit the testing scope if need be.
--------------------------------------------------------------------------------
via: http://www.linuxjournal.com/content/linux-kernel-testing-and-debugging?page=0,2
Translator: [Translator ID](https://github.com/译者ID) Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)
[1]:http://linuxdriverproject.org/mediawiki/index.php/Dmesg_regression_check_script
[2]:http://elinux.org/Ktest#Git_Bisect_type


@ -1,3 +1,4 @@
[bazz2 keep moving]
Linux Kernel Testing and Debugging
================================================================================
### Auto Testing Tools ###
@ -140,4 +141,4 @@ via:http://www.linuxjournal.com/content/linux-kernel-testing-and-debugging?page=
[1]:http://autotest.github.io/
[2]:https://github.com/autotest/autotest/wiki/WhitePaper
[3]:http://events.linuxfoundation.org/sites/events/files/slides/Shuah_Khan_dma_map_error.pdf
[4]:http://www.linuxjournal.com/content/july-2013-linux-kernel-news
[4]:http://www.linuxjournal.com/content/july-2013-linux-kernel-news


@ -0,0 +1,124 @@
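Linux Kernel Testing and Debugging - 3
================================================================================
### Basic Testing ###
After installing the kernel, try to boot it. If it boots, check dmesg for hidden errors. Then try the following:
- Is networking (Wi-Fi or wired) available?
- Does ssh work?
- Transfer files remotely over ssh.
- Use the git clone and git pull commands.
- Use a web browser.
- Check email.
- Download files with ftp, wget, etc.
- Play audio and video files.
- Connect a USB mouse and other devices.
### Check the Kernel Logs ###
Using dmesg to look for hidden problems is a good way to locate bugs introduced by new code. Generally speaking, dmesg should not output new crit, alert, or emerg level messages, and no new err level messages should appear either. What you need to watch are the warn level messages. Note that warn level messages are not necessarily bad news: new code can bring new warning messages that do not seriously affect the kernel.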
- dmesg -t -l emerg
- dmesg -t -l crit
- dmesg -t -l alert
- dmesg -t -l err
- dmesg -t -l warn
- dmesg -t -k
- dmesg -t
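The script below runs the commands above and saves their output for comparison with the old kernel's dmesg output (LCTT: the old kernel's dmesg output was covered in the first article of this series). It then runs diff to show the differences between the new and old kernels' dmesg logs. The script takes the old kernel version as input; if no argument is given, it only generates the new kernel's dmesg log files and then exits without comparing (LCTT: that is what it says, but a look at the script shows that without an argument it exits right away, without even saving the new kernel's dmesg logs). If the dmesg logs contain new warnings, the newly released kernel has some fish that slipped through the net (LCTT: would "bugs that slipped through the net" be easier to follow?): bugs that escaped self-testing and system testing. Check whether those warnings are followed by stack traces; there may be many problems here that need further investigation and analysis.
- [**dmesg test script**][1]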
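### Stress Testing ###
A good way to stress test is to run three or four kernel compile jobs at the same time. Download various kernel versions, compile them simultaneously, and record the time. Compare the time the stress test takes on the new kernel with the time it takes on the old kernel to gauge the new kernel's performance. If the stress test takes longer on the new kernel than on the old one, some modules of the new kernel have regressed in performance. Performance problems are hard to debug, and the first step is to find what caused the regression. Running several kernel compile jobs at once is a good way to check overall kernel performance, but it covers many kernel modules, such as memory management, file systems, DMA, and drivers (LCTT: in other words, this kind of stress test cannot pinpoint which module caused the performance drop).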
time make all
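### Kernel Testing Tools ###
The Linux kernel itself contains a variety of testing methods. Below we introduce a very useful functional testing toolset: the ktest suite.
ktest is an automated test suite that offers one-stop testing of building, installing, and booting the kernel. It can also run cross-compile tests, provided your system has the software needed for cross-compiling installed. ktest depends on flex and bison. For details, refer to the documentation in the tools/testing/ktest directory; you can learn it on your own. Some further references explain how to use ktest: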
- [**ktest-eLinux.org**][2]
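### tools/testing/selftests Suite ###
Let's play with the self-tests. Several subsystems in the kernel source have their own self-test tools. As of now, the breakpoint, cpu hotplug, efivarfs, IPC, KCMP, memory hotplug, mqueue, networking, powerpc, ptrace, rcutorture, timer, and virtual memory (vm) subsystems have self-test tools. In addition, the user memory self-test uses the test_user_copy module to test copies from user-space memory to kernel memory. The commands below show how to use these test tools:
Compile the tests: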
make -C tools/testing/selftests
Run all tests (some tests need root privileges; log in as root and then run the command):
make -C tools/testing/selftests run_tests
Test only a single subsystem:
make -C tools/testing/selftests TARGETS=vm run_tests
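### tools/testing/fault-injection Suite ###
Another test suite under the tools/testing directory is fault-injection. The failcmd.sh script injects slab and page allocation failures. These tools test whether the kernel can recover well from error states. These tests require root privileges. Below is a brief introduction to the fault injection capabilities currently available. The list keeps growing as new capabilities are added; for the latest list, refer to Documentation/fault-injection/fault-injection.txt.
failslab (default option)
Generates slab allocation failures. Acts on functions such as kmalloc() and kmem_cache_alloc() (LCTT: the result is that calls to these functions fail, which lets you check whether a program keeps running stably when it cannot get memory).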
fail_page_alloc
Generates page allocation failures. Acts on functions such as alloc_pages() and get_free_pages() (LCTT: as above, calls to these functions return errors).
fail_make_request
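Generates IO errors on disks that meet the condition (set via the /sys/block//make-it-fail or /sys/block///make-it-fail file). Acts on the generic_make_request() function (LCTT: every read or write request to such a disk will fail).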
fail_mmc_request
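Generates MMC data errors on devices that meet the condition (set via the /sys/kernel/debug/mmc0/fail_mmc_request debugfs attribute).
You can configure the fault-injection suite's capabilities yourself. At run time, the fault-inject-debugfs kernel module provides attribute files under the debugfs file system. You can specify the probability of errors and the interval between two errors, and the suite offers many more options; see Documentation/fault-injection/fault-injection.txt for details. Boot options let your system generate errors before the debugfs file system is up. Some boot options are listed below: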
- failslab=
- fail_page_alloc=
- fail_make_request=
- mmc_core.fail_request=[interval],[probability],[space],[times]
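The fault-injection suite provides interfaces for adding new capabilities. The steps for adding a new capability are briefly described below; for details, refer to the document mentioned above:
Define the default attributes using DECLARE_FAULT_INJECTION(name);
> For details, see the definition of struct fault_attr in fault-inject.h.
Configure the fault attributes by creating a boot option;
> This step can be done with the setup_fault_attr(attr, str) function. Adding a boot option is required in order to generate errors early during system boot.
Add debugfs attributes;
> Use the fault_create_debugfs_attr(name, parent, attr) function to add new debugfs attributes for the new capability.
Set parameters for the module;
> Adding module parameters is a good way to configure the fault attributes, especially when the scope of the new capability is limited to a single kernel module (LCTT: different kernels may need different test parameters for your new capability; with parameters, your capability does not have to be recompiled every time to suit a different kernel).
Add a hook function where failures should be inserted.
> should_fail(attr, size): when this hook returns true, the client code should inject a failure.
Applications using this fault-injection suite can target a specific kernel module for slab and page allocation failures, which narrows the scope of testing.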
--------------------------------------------------------------------------------
via: http://www.linuxjournal.com/content/linux-kernel-testing-and-debugging?page=0,2
Translator: [bazz2](https://github.com/bazz2) Proofreader: [Proofreader ID](https://github.com/校对者ID)
This article was originally translated by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](http://linux.cn/)
[1]:http://linuxdriverproject.org/mediawiki/index.php/Dmesg_regression_check_script
[2]:http://elinux.org/Ktest#Git_Bisect_type