Merge pull request #22684 from YungeG/patch-1

translated
This commit is contained in:
Xingyu.Wang 2021-07-27 08:01:13 +08:00 committed by GitHub
commit e67bf97fd5
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 210 additions and 210 deletions

View File

@ -1,210 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (YungeG)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Trace code in Fedora with bpftrace)
[#]: via: (https://fedoramagazine.org/trace-code-in-fedora-with-bpftrace/)
[#]: author: (Augusto Caringi https://fedoramagazine.org/author/acaringi/)
Trace code in Fedora with bpftrace
======
![][1]
bpftrace is [a new eBPF-based tracing tool][2] that was first included in Fedora 28. It was developed by Brendan Gregg, Alastair Robertson and Matheus Marchini with the help of a loosely-knit team of hackers across the Net. A tracing tool lets you analyze what a system is doing behind the curtain. It tells you which functions in code are being called, with which arguments, how many times, and so on.
This article covers some basics about bpftrace, and how it works. Read on for more information and some useful examples.
### eBPF (extended Berkeley Packet Filter)
[eBPF][3] is a tiny virtual machine, or a virtual CPU to be more precise, in the Linux Kernel. The eBPF can load and run small programs in a safe and controlled way in kernel space. This makes it safer to use, even in production systems. This virtual machine has its own instruction set architecture ([ISA][4]) resembling a subset of modern processor architectures. The ISA makes it easy to translate those programs to the real hardware. The kernel performs just-in-time translation to native code for main architectures to improve the performance.
The eBPF virtual machine allows the kernel to be extended programmatically. Nowadays several kernel subsystems take advantage of this new powerful Linux Kernel capability. Examples include networking, seccomp, tracing, and more. The main idea is to attach eBPF programs into specific code points, and thereby extend the original kernel behavior.
eBPF machine language is very powerful. But writing code directly in it is extremely painful, because its a low level language. This is where bpftrace comes in. It provides a high-level language to write eBPF tracing scripts. The tool then translates these scripts to eBPF with the help of clang/LLVM libraries, and then attached to the specified code points.
## Installation and quick start
To install bpftrace, run the following command in a terminal [using][5] _[sudo][5]_:
```
$ sudo dnf install bpftrace
```
Try it out with a “hello world” example:
```
$ sudo bpftrace -e 'BEGIN { printf("hello world\n"); }'
```
Note that you must run _bpftrace_ as _root_ due to the privileges required. Use the _-e_ option to specify a program, and to construct the so-called “one-liners.” This example only prints _hello world_, and then waits for you to press **Ctrl+C**.
_BEGIN_ is a special probe name that fires only once at the beginning of execution. Every action inside the curly braces _{ }_ fires whenever the probe is hit — in this case, its just a _printf_.
Lets jump now to a more useful example:
```
$ sudo bpftrace -e 't:syscalls:sys_enter_execve { printf("%s called %s\n", comm, str(args->filename)); }'
```
This example prints the parent process name _(comm)_ and the name of every new process being created in the system. _t:syscalls:sys_enter_execve_ is a kernel tracepoint. Its a shorthand for _tracepoint:syscalls:sys_enter_execve_, but both forms can be used. The next section shows you how to list all available tracepoints.
_comm_ is a bpftrace builtin that represents the process name. _filename_ is a field of the _t:syscalls:sys_enter_execve_ tracepoint. You can access these fields through the _args_ builtin.
All available fields of the tracepoint can be listed with this command:
```
bpftrace -lv "t:syscalls:sys_enter_execve"
```
## Example usage
### Listing probes
A central concept for _bpftrace_ are **probe points**. Probe points are instrumentation points in code (kernel or userspace) where eBPF programs can be attached. They fit into the following categories:
* _kprobe_ kernel function start
* _kretprobe_ kernel function return
* _uprobe_ user-level function start
* _uretprobe_ user-level function return
* _tracepoint_ kernel static tracepoints
* _usdt_ user-level static tracepoints
* _profile_ timed sampling
* _interval_ timed output
* _software_ kernel software events
* _hardware_ processor-level events
All available _kprobe/kretprobe_, _tracepoints_, _software_ and _hardware_ probes can be listed with this command:
```
$ sudo bpftrace -l
```
The _uprobe/uretprobe_ and _usdt_ probes are userspace probes specific to a given executable. To use them, use the special syntax shown later in this article.
The _profile_ and _interval_ probes fire at fixed time intervals. Fixed time intervals are not covered in this article.
### Counting system calls
**Maps** are special BPF data types that store counts, statistics, and histograms. You can use maps to summarize how many times each syscall is being called:
```
$ sudo bpftrace -e 't:syscalls:sys_enter_* { @[probe] = count(); }'
```
Some probe types allow wildcards to match multiple probes. You can also specify multiple attach points for an action block using a comma separated list. In this example, the action block attaches to all tracepoints whose name starts with _t:syscalls:sys_enter__, which means all available syscalls.
The bpftrace builtin function _count()_ counts the number of times this function is called. _@[]_ represents a map (an associative array). The key of this map is _probe_, which is another bpftrace builtin that represents the full probe name.
Here, the same action block is attached to every syscall. Then, each time a syscall is called the map will be updated, and the entry is incremented in the map relative to this same syscall. When the program terminates, it automatically prints out all declared maps.
This example counts the syscalls called globally, its also possible to filter for a specific process by _PID_ using the bpftrace filter syntax:
```
$ sudo bpftrace -e 't:syscalls:sys_enter_* / pid == 1234 / { @[probe] = count(); }'
```
### Write bytes by process
Using these concepts, lets analyze how many bytes each process is writing:
```
$ sudo bpftrace -e 't:syscalls:sys_exit_write /args->ret > 0/ { @[comm] = sum(args->ret); }'
```
_bpftrace_ attaches the action block to the write syscall return probe (_t:syscalls:sys_exit_write_). Then, it uses a filter to discard the negative values, which are error codes _(/args->ret > 0/)_.
The map key _comm_ represents the process name that called the syscall. The _sum()_ builtin function accumulates the number of bytes written for each map entry or process. _args_ is a bpftrace builtin to access tracepoints arguments and return values. Finally, if successful, the _write_ syscall returns the number of written bytes. _args->ret_ provides access to the bytes.
### Read size distribution by process (histogram):
_bpftrace_ supports the creation of histograms. Lets analyze one example that creates a histogram of the _read_ size distribution by process:
```
$ sudo bpftrace -e 't:syscalls:sys_exit_read { @[comm] = hist(args->ret); }'
```
Histograms are BPF maps, so they must always be attributed to a map (_@_). In this example, the map key is _comm_.
The example makes _bpftrace_ generate one histogram for every process that calls the _read_ syscall. To generate just one global histogram, attribute the _hist()_ function just to _@_ (without any key).
bpftrace automatically prints out declared histograms when the program terminates. The value used as base for the histogram creation is the number of read bytes, found through _args->ret_.
### Tracing userspace programs
You can also trace userspace programs with _uprobes/uretprobes_ and _USDT_ (User-level Statically Defined Tracing). The next example uses a _uretprobe_, which probes to the end of a user-level function. It gets the command lines issued in every _bash_ running in the system:
```
$ sudo bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: \"%s\"\n", str(retval)); }'
```
To list all available _uprobes/uretprobes_ of the _bash_ executable, run this command:
```
$ sudo bpftrace -l "uprobe:/bin/bash"
```
_uprobe_ instruments the beginning of a user-level functions execution, and _uretprobe_ instruments the end (its return). _readline()_ is a function of _/bin/bash_, and it returns the typed command line. _retval_ is the return value for the instrumented function, and can only be accessed on _uretprobe_.
When using _uprobes_, you can access arguments with _arg0..argN_. A _str()_ call is necessary to turn the _char *_ pointer to a _string_.
## Shipped Scripts
There are many useful scripts shipped with bpftrace package. You can find them in the _/usr/share/bpftrace/tools/_ directory.
Among them, you can find:
* _killsnoop.bt_ Trace signals issued by the kill() syscall.
* _tcpconnect.bt_ Trace all TCP network connections.
* _pidpersec.bt_ Count new procesess (via fork) per second.
* _opensnoop.bt_ Trace open() syscalls.
* _vfsstat.bt_ Count some VFS calls, with per-second summaries.
You can directly use the scripts. For example:
```
$ sudo /usr/share/bpftrace/tools/killsnoop.bt
```
You can also study these scripts as you create new tools.
## Links
* bpftrace reference guide <https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md>
* bpftrace (DTrace 2.0) for Linux 2018 <http://www.brendangregg.com/blog/2018-10-08/dtrace-for-linux-2018.html>
* BPF: the universal in-kernel virtual machine <https://lwn.net/Articles/599755/>
* Linux Extended BPF (eBPF) Tracing Tools <http://www.brendangregg.com/ebpf.html>
* Dive into BPF: a list of reading material [https://qmonnet.github.io/whirl-offload/2016/09/01/dive-into-bpf][6]
* * *
_Photo by _[_Roman Romashov_][7]_ on _[_Unsplash_][8]_._
--------------------------------------------------------------------------------
via: https://fedoramagazine.org/trace-code-in-fedora-with-bpftrace/
作者:[Augusto Caringi][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://fedoramagazine.org/author/acaringi/
[b]: https://github.com/lujun9972
[1]: https://fedoramagazine.org/wp-content/uploads/2019/08/bpftrace-816x345.jpg
[2]: https://github.com/iovisor/bpftrace
[3]: https://lwn.net/Articles/740157/
[4]: https://github.com/iovisor/bpf-docs/blob/master/eBPF.md
[5]: https://fedoramagazine.org/howto-use-sudo/
[6]: https://qmonnet.github.io/whirl-offload/2016/09/01/dive-into-bpf/
[7]: https://unsplash.com/@wehavemegapixels?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText
[8]: https://unsplash.com/search/photos/trace?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText

View File

@ -0,0 +1,210 @@
[#]: collector: (lujun9972)
[#]: translator: (YungeG)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Trace code in Fedora with bpftrace)
[#]: via: (https://fedoramagazine.org/trace-code-in-fedora-with-bpftrace/)
[#]: author: (Augusto Caringi https://fedoramagazine.org/author/acaringi/)
在 Fedora 中用 bpftrace 追踪代码
======
![][1]
bpftrace 是一个[基于 eBPF 的新型追踪工具][2],在 Fedora 28 第一次引入。Brendan GreggAlastair Robertson 和 Matheus Marchini 在分散于全网络的黑客团队的帮助下开发了 bpftrace一个允许你分析系统在幕后正在执行的操作的追踪工具告诉你代码中正在被调用的函数、传递给函数的参数、函数的调用次数等。
这篇文章的内容涉及了 bpftrace 的一些基础,以及它是如何工作的,请继续阅读获取更多的信息和一些有用的实例。
### eBPF (extended Berkeley Packet Filter)
[eBPF][3] 是一个微型虚拟机,更确切的说是一个虚拟 CPU位于 Linux 内核中。eBPF 可以在内核空间以一种安全可控的方式加载和运行体积较小的程序,保证 eBPF 的使用更加安全即使在生产环境系统中。eBPF 虚拟机有自己的指令集([ISA][4]),类似于现代处理器架构的一个子集。通过这个 ISA可以很容易将 eBPF 程序转化为真实硬件上的代码。内核即时将程序转化为主流处理器架构上的本地代码,从而提升性能。
eBPF 虚拟机允许通过编程扩展内核,目前已经有一些内核子系统使用这一新型强大的 Linux Kernel 功能,比如网络,安全计算、追踪等。这些子系统的主要思想是添加 eBPF 程序到特定的代码点,从而扩展原生的内核行为。
虽然 eBPF 机器语言功能强大由于是一种底层语言直接用于编写代码很费力bpftrace 就是为了解决这个问题而生的。eBPF 提供了一种高级语言编写 eBPF 追踪脚本,然后在 clang / LLVM 库的帮助下将这些脚本转化为 eBPF最终添加到特定的代码点。
## 安装和快速入门
在终端 [使用][5] _[sudo][5]_ 执行下面的命令安装 bpftrace
```
$ sudo dnf install bpftrace
```
使用“hello world”进行实验
```
$ sudo bpftrace -e 'BEGIN { printf("hello world\n"); }'
```
注意,出于特权级的需要,你必须使用 _root_ 运行 _bpftrace_,使用 _-e_ 选项指明一个程序,构建一个所谓的“单行程序”。这个例子只会打印 _hello world_,接着等待你按下 **Ctrl+C**
_BEGIN_ 是一个特殊的探针名,只在执行一开始生效一次;每次探针命中时,大括号 _{}_ 内的操作——这个例子中只是一个 _printf_——都会执行。
现在让我们转向一个更有用的例子:
```
$ sudo bpftrace -e 't:syscalls:sys_enter_execve { printf("%s called %s\n", comm, str(args->filename)); }'
```
这个例子打印系统中正在创建的每个新进程的父进程名 _(comm)_。_t:syscalls:sys_enter_execve_ 是一个内核追踪点,是 _tracepoint:syscalls:sys_enter_execve_ 的简写,两种形式都可以使用。下一部分会向你展示如何列出所有可用的追踪点。
_comm_ 是一个 bpftrace 内建指令代表进程名_filename_ 是 _t:syscalls:sys_enter_execve_ 追踪点的一个域,这些域可以通过 _args_ 内建指令访问。
追踪点的所有可用域可以通过这个命令列出:
```
bpftrace -lv "t:syscalls:sys_enter_execve"
```
## 示例用法
### 列出探针
_bpftrace_ 的一个核心概念是 **探针点**,即 eBPF 程序可以连接到的(内核或用户空间)代码中的测量点,可以分成以下几大类:
* _kprobe_——内核函数的开始处
* _uprobe_——内核函数的返回处
* _uprobe_——用户级函数的开始处
* _uretprobe_——用户级函数的返回处
* _tracepoint_——内核静态追踪点
* _usdt_——用户级静态追踪点
* _profile_——基于时间的采样
* _interval_——基于时间的输出
* _software_——内核软件事件
* _hardware_——用户级事件
所有可用的 _kprobe / kretprobe_、_tracepoints_、_software_ 和 _hardware_ 探针可以通过这个命令列出:
```
$ sudo bpftrace -l
```
_uprobe / uretprobe_ 和 _usdt_ 是用户空间探针,专用于某个可执行文件。要使用这些探针,通过下文中的特殊语法。
_profile_ 和 _interval_ 探针以固定的时间间隔触发;固定的时间间隔不在本文的范畴内。
### 统计系统调用数
### Counting system calls
**Maps** 是保存计数、统计数据和柱状图的特殊 BPF 数据类型,你可以使用映射统计每个系统调用正在被调用的次数:
```
$ sudo bpftrace -e 't:syscalls:sys_enter_* { @[probe] = count(); }'
```
一些探针类型允许使用通配符匹配多个探针,你也可以使用一个逗号隔开的列表为一个操作块指明多个连接点。上面的例子中,操作块连接到了所有名称以 _t:syscalls:sysenter__ 开头的追踪点,即所有可用的系统调用。
bpftrace 的内建函数 _count()_ 统计系统调用被调用的次数_@[]_ 代表一个映射(一个关联的数组)。映射的键是另一个内建指令 _probe_,代表完整的探针名。
这个例子中,相同的操作块连接到了每个系统调用,之后每次有系统调用被调用时,映射就会被更新,映射中和系统调用对应的项就会增加。程序终止时,自动打印出所有声明的映射。
下面的例子统计所有的系统调用,然后通过 bpftrace 过滤语法使用 _PID_ 过滤出某个特定进程调用的系统调用:
```
$ sudo bpftrace -e 't:syscalls:sys_enter_* / pid == 1234 / { @[probe] = count(); }'
```
### 进程写的字节数
让我们使用上面的概念分析每个进程正在写的字节数:
```
$ sudo bpftrace -e 't:syscalls:sys_exit_write /args->ret > 0/ { @[comm] = sum(args->ret); }'
```
_bpftrace_ 连接操作块到写系统调用的返回探针_t:syscalls:sys_exit_write_然后使用过滤器丢掉代表错误代码的负值_/arg->ret > 0/_
映射的键 _comm_ 代表调用系统调用的进程名;内建函数 _sum()_ 累计每个映射项或进程写的字节数_args_ 是一个 `bpftrace` 内建指令用于访问追踪点的参数和返回值。如果执行成功_write_ 系统调用返回写的字节数_arg->ret_ 用于访问这个字节数。
### 进程的读取大小分布(柱状图):
_bpftrace_ 支持创建柱状图。让我们分析一个创建进程的 _read_ 大小分布的柱状图的例子:
```
$ sudo bpftrace -e 't:syscalls:sys_exit_read { @[comm] = hist(args->ret); }'
```
柱状图是 BPF 映射因此必须保存为一个映射_@_这个例子中映射键是 _comm_
这个例子使 _bpftrace_ 给每个调用 _read_ 系统调用的进程生成一个柱状图。要生成一个全局柱状图,直接保存 _hist()_ 函数到 _'@'_(不使用任何键)。
程序终止时bpftrace 自动打印出声明的柱状图。创建柱状图的基准值是通过 _args->ret_ 获取到的读取的字节数。
### 追踪用户空间程序
你也可以通过 _uprobes / uretprobes__USDT_(用户级静态定义的追踪)追踪用户空间程序。下一个例子使用探测用户级函数结尾处的 _uretprobe_ ,获取系统中运行的每个 _bash_ 发出的命令行:
```
$ sudo bpftrace -e 'uretprobe:/bin/bash:readline { printf("readline: \"%s\"\n", str(retval)); }'
```
要列出可执行文件 _bash_ 的所有可用 _uprobes / uretprobes_ 执行这个命令:
```
$ sudo bpftrace -l "uprobe:/bin/bash"
```
_uprobe_ 指向用户级函数执行的开始_uretprobe_ 指向执行的结束返回处_readline()_ 是 _/bin/bash_ 的一个函数返回键入的命令行_retval_ 是被探测的指令的返回值,只能在 _uretprobe_ 访问。
使用 _uprobes_ 时,你可以用 _arg0..argN_ 访问参数。需要调用 _str()__char *_ 指针转化成一个 _string_
## 自带脚本
bpftrace 软件包附带了许多有用的脚本,可以在 _/usr/share/bpftrace/tools/_ 目录找到。
这些脚本中,你可以找到:
* _killsnoop.bt_——追踪 `kill()` 系统调用发出的信号
* _tcpconnect.bt_——追踪所有的 TCP 网络连接
* _pidpersec.bt_——统计每秒钟(通过 fork创建的新进程
* _opensnoop.bt_——追踪 `open()` 系统调用
* _bfsstat.bt_——追踪一些 VFS 调用,按秒统计
你可以直接使用这些脚本,比如:
```
$ sudo /usr/share/bpftrace/tools/killsnoop.bt
```
你也可以在创建新的工具时参考这些脚本。
## 链接
* bpftrace 参考指南——<https://github.com/iovisor/bpftrace/blob/master/docs/reference_guide.md>
* Linux 2018 `bpftrace`DTrace 2.0)——<http://www.brendangregg.com/blog/2018-10-08/dtrace-for-linux-2018.html>
* BPF通用的内核虚拟机——<https://lwn.net/Articles/599755/>
* Linux Extended BPFeBPFTracing Tools——<http://www.brendangregg.com/ebpf.html>
* 深入 BPF一个阅读材料列表—— [https://qmonnet.github.io/whirl-offload/2016/09/01/dive-into-bpf][6]
* * *
_Photo by _[_Roman Romashov_][7]_ on _[_Unsplash_][8]_._
--------------------------------------------------------------------------------
via: https://fedoramagazine.org/trace-code-in-fedora-with-bpftrace/
作者:[Augusto Caringi][a]
选题:[lujun9972][b]
译者:[YungeG](https://github.com/YungeG)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://fedoramagazine.org/author/acaringi/
[b]: https://github.com/lujun9972
[1]: https://fedoramagazine.org/wp-content/uploads/2019/08/bpftrace-816x345.jpg
[2]: https://github.com/iovisor/bpftrace
[3]: https://lwn.net/Articles/740157/
[4]: https://github.com/iovisor/bpf-docs/blob/master/eBPF.md
[5]: https://fedoramagazine.org/howto-use-sudo/
[6]: https://qmonnet.github.io/whirl-offload/2016/09/01/dive-into-bpf/
[7]: https://unsplash.com/@wehavemegapixels?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText
[8]: https://unsplash.com/search/photos/trace?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText