mirror of
https://github.com/LCTT/TranslateProject.git
synced 2024-12-26 21:30:55 +08:00
Finish translation
File name: 20210727 Analyze the Linux kernel with ftrace.md
This commit is contained in:
parent
5cffaac4bf
commit
079dce8ec6
@ -1,404 +0,0 @@
|
|||||||
[#]: subject: (Analyze the Linux kernel with ftrace)
|
|
||||||
[#]: via: (https://opensource.com/article/21/7/linux-kernel-ftrace)
|
|
||||||
[#]: author: (Gaurav Kamathe https://opensource.com/users/gkamathe)
|
|
||||||
[#]: collector: (lujun9972)
|
|
||||||
[#]: translator: (mengxinayan)
|
|
||||||
[#]: reviewer: ( )
|
|
||||||
[#]: publisher: ( )
|
|
||||||
[#]: url: ( )
|
|
||||||
|
|
||||||
Analyze the Linux kernel with ftrace
|
|
||||||
======
|
|
||||||
Ftrace is a great way to learn more about the internal workings of the
|
|
||||||
Linux kernel.
|
|
||||||
![Linux keys on the keyboard for a desktop computer][1]
|
|
||||||
|
|
||||||
An operating system's kernel is one of the most elusive pieces of software out there. It's always there running in the background from the time your system gets turned on. Every user achieves their computing work with the help of the kernel, yet they never interact with it directly. The interaction with the kernel occurs by making system calls or having those calls made on behalf of the user by various libraries or applications that they use daily.
|
|
||||||
|
|
||||||
I've covered how to trace system calls in an earlier article using `strace`. However, with `strace`, your visibility is limited. It allows you to view the system calls invoked with specific parameters and, after the work gets done, see the return value or status indicating whether they passed or failed. But you had no idea what happened inside the kernel during this time. Besides just serving system calls, there's a lot of other activity happening inside the kernel that you're oblivious to.
|
|
||||||
|
|
||||||
### Ftrace Introduction
|
|
||||||
|
|
||||||
This article aims to shed some light on tracing the kernel functions by using a mechanism called `ftrace`. It makes kernel tracing easily accessible to any Linux user, and with its help you can learn a lot about Linux kernel internals.
|
|
||||||
|
|
||||||
The default output generated by the `ftrace` is often massive, given that the kernel is always busy. To save space, I've kept the output to a minimum and, in many cases truncated the output entirely.
|
|
||||||
|
|
||||||
I am using Fedora for these examples, but they should work on any of the latest Linux distributions.
|
|
||||||
|
|
||||||
### Enabling ftrace
|
|
||||||
|
|
||||||
`Ftrace` is part of the Linux kernel now, and you no longer need to install anything to use it. It is likely that, if you are using a recent Linux OS, `ftrace` is already enabled. To verify that the `ftrace` facility is available, run the mount command and search for `tracefs`. If you see output similar to what is below, `ftrace` is enabled, and you can easily follow the examples in this article:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo mount | grep tracefs
|
|
||||||
none on /sys/kernel/tracing type tracefs (rw,relatime,seclabel)
|
|
||||||
```
|
|
||||||
|
|
||||||
To make use of `ftrace`, you first must navigate to the special directory as specified in the mount command above, from where you'll run the rest of the commands in the article:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
`$ cd /sys/kernel/tracing`
|
|
||||||
```
|
|
||||||
|
|
||||||
### General work flow
|
|
||||||
|
|
||||||
First of all, you must understand the general workflow of capturing a trace and obtaining the output. If you're using `ftrace` directly, there isn't any special `ftrace-`specific commands to run. Instead, you basically write to some files and read from some files using standard command-line Linux utilities.
|
|
||||||
|
|
||||||
The general steps:
|
|
||||||
|
|
||||||
1. Write to some specific files to enable/disable tracing.
|
|
||||||
2. Write to some specific files to set/unset filters to fine-tune tracing.
|
|
||||||
3. Read generated trace output from files based on 1 and 2.
|
|
||||||
4. Clear earlier output or buffer from files.
|
|
||||||
5. Narrow down to your specific use case (kernel functions to trace) and repeat steps 1, 2, 3, 4.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
### Types of available tracers
|
|
||||||
|
|
||||||
There are several different kinds of tracers available to you. As mentioned earlier, you need to be in a specific directory before running any of these commands because the files of interest are present there. I use relative paths (as opposed to absolute paths) in my examples.
|
|
||||||
|
|
||||||
You can view the contents of the `available_tracers` file to see all the types of tracers available. You can see a few listed below. Don't worry about all of them just yet:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ pwd
|
|
||||||
/sys/kernel/tracing
|
|
||||||
|
|
||||||
$ sudo cat available_tracers
|
|
||||||
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
|
|
||||||
```
|
|
||||||
|
|
||||||
Out of all the given tracers, I focus on three specific ones: `function` and `function_graph` to enable tracing, and `nop` to disable tracing.
|
|
||||||
|
|
||||||
### Identify current tracer
|
|
||||||
|
|
||||||
Usually, by default, the tracer is set to `nop`. That is, "No operation" in the special file `current_tracer`, which usually means tracing is currently off:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ pwd
|
|
||||||
/sys/kernel/tracing
|
|
||||||
|
|
||||||
$ sudo cat current_tracer
|
|
||||||
nop
|
|
||||||
```
|
|
||||||
|
|
||||||
### View Tracing output
|
|
||||||
|
|
||||||
Before you enable any tracing, take a look at the file where the tracing output gets stored. You can view the contents of the file named `trace` using the [cat][2] command:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo cat trace
|
|
||||||
# tracer: nop
|
|
||||||
#
|
|
||||||
# entries-in-buffer/entries-written: 0/0 #P:8
|
|
||||||
#
|
|
||||||
# _-----=> irqs-off
|
|
||||||
# / _----=> need-resched
|
|
||||||
# | / _---=> hardirq/softirq
|
|
||||||
# || / _--=> preempt-depth
|
|
||||||
# ||| / delay
|
|
||||||
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
|
|
||||||
# | | | |||| | |
|
|
||||||
```
|
|
||||||
|
|
||||||
### Enable function tracer
|
|
||||||
|
|
||||||
You can enable your first tracer called `function` by writing `function` to the file `current_tracer` (its earlier content was `nop`, indicating that tracing was off.) Think of this operation as a way of enabling tracing:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ pwd
|
|
||||||
/sys/kernel/tracing
|
|
||||||
|
|
||||||
$ sudo cat current_tracer
|
|
||||||
nop
|
|
||||||
$ echo function > current_tracer
|
|
||||||
$
|
|
||||||
$ cat current_tracer
|
|
||||||
function
|
|
||||||
```
|
|
||||||
|
|
||||||
### View updated tracing output for function tracer
|
|
||||||
|
|
||||||
Now that you've enabled tracing, it's time to view the output. If you view the contents of the `trace` file, you see a lot of data being written to it continuously. I've piped the output and am currently viewing only the top 20 lines to keep things manageable. If you follow the headers in the output on the left, you can see which task and Process ID are running on which CPU. Toward the right side of the output, you see the exact kernel function running, followed by its parent function. There is also time stamp information in the center:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo cat trace | head -20
|
|
||||||
# tracer: function
|
|
||||||
#
|
|
||||||
# entries-in-buffer/entries-written: 409936/4276216 #P:8
|
|
||||||
#
|
|
||||||
# _-----=> irqs-off
|
|
||||||
# / _----=> need-resched
|
|
||||||
# | / _---=> hardirq/softirq
|
|
||||||
# || / _--=> preempt-depth
|
|
||||||
# ||| / delay
|
|
||||||
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
|
|
||||||
# | | | |||| | |
|
|
||||||
<idle>-0 [000] d... 2088.841739: tsc_verify_tsc_adjust <-arch_cpu_idle_enter
|
|
||||||
<idle>-0 [000] d... 2088.841739: local_touch_nmi <-do_idle
|
|
||||||
<idle>-0 [000] d... 2088.841740: rcu_nocb_flush_deferred_wakeup <-do_idle
|
|
||||||
<idle>-0 [000] d... 2088.841740: tick_check_broadcast_expired <-do_idle
|
|
||||||
<idle>-0 [000] d... 2088.841740: cpuidle_get_cpu_driver <-do_idle
|
|
||||||
<idle>-0 [000] d... 2088.841740: cpuidle_not_available <-do_idle
|
|
||||||
<idle>-0 [000] d... 2088.841741: cpuidle_select <-do_idle
|
|
||||||
<idle>-0 [000] d... 2088.841741: menu_select <-do_idle
|
|
||||||
<idle>-0 [000] d... 2088.841741: cpuidle_governor_latency_req <-menu_select
|
|
||||||
```
|
|
||||||
|
|
||||||
Remember that tracing is on, which means the output of tracing continues to get written to the trace file until you turn tracing off.
|
|
||||||
|
|
||||||
### Turn off tracing
|
|
||||||
|
|
||||||
Turning off tracing is simple. All you have to do is replace `function` tracer with `nop` in the `current_tracer` file and tracing gets turned off:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo cat current_tracer
|
|
||||||
function
|
|
||||||
|
|
||||||
$ sudo echo nop > current_tracer
|
|
||||||
|
|
||||||
$ sudo cat current_tracer
|
|
||||||
nop
|
|
||||||
```
|
|
||||||
|
|
||||||
### Enable function_graph tracer
|
|
||||||
|
|
||||||
Now try the second tracer, called `function_graph`. You can enable this using the same steps as before: write `function_graph` to the `current_tracer` file:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo echo function_graph > current_tracer
|
|
||||||
|
|
||||||
$ sudo cat current_tracer
|
|
||||||
function_graph
|
|
||||||
```
|
|
||||||
|
|
||||||
### Tracing output of function_graph tracer
|
|
||||||
|
|
||||||
Notice that the output format of the `trace` file has changed. Now, you can see the CPU ID and the duration of the kernel function execution. Next, you see curly braces indicating the beginning of a function and what other functions were called from inside it:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo cat trace | head -20
|
|
||||||
# tracer: function_graph
|
|
||||||
#
|
|
||||||
# CPU DURATION FUNCTION CALLS
|
|
||||||
# | | | | | | |
|
|
||||||
6) | n_tty_write() {
|
|
||||||
6) | down_read() {
|
|
||||||
6) | __cond_resched() {
|
|
||||||
6) 0.341 us | rcu_all_qs();
|
|
||||||
6) 1.057 us | }
|
|
||||||
6) 1.807 us | }
|
|
||||||
6) 0.402 us | process_echoes();
|
|
||||||
6) | add_wait_queue() {
|
|
||||||
6) 0.391 us | _raw_spin_lock_irqsave();
|
|
||||||
6) 0.359 us | _raw_spin_unlock_irqrestore();
|
|
||||||
6) 1.757 us | }
|
|
||||||
6) 0.350 us | tty_hung_up_p();
|
|
||||||
6) | mutex_lock() {
|
|
||||||
6) | __cond_resched() {
|
|
||||||
6) 0.404 us | rcu_all_qs();
|
|
||||||
6) 1.067 us | }
|
|
||||||
```
|
|
||||||
|
|
||||||
### Enable trace settings to increase the depth of tracing
|
|
||||||
|
|
||||||
You can always tweak the tracer slightly to see more depth of the function calls using the steps below. After which, you can view the contents of the `trace` file and see that the output is slightly more detailed. For readability, the output of this example is omitted:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo cat max_graph_depth
|
|
||||||
0
|
|
||||||
$ sudo echo 1 > max_graph_depth
|
|
||||||
$ # or
|
|
||||||
$ sudo echo 2 > max_graph_depth
|
|
||||||
|
|
||||||
$ sudo cat trace
|
|
||||||
```
|
|
||||||
|
|
||||||
### Finding functions to trace
|
|
||||||
|
|
||||||
The steps above are sufficient to get started with tracing. However, the amount of output generated is enormous, and you can often get lost while trying to find out items of interest. Often you want the ability to trace specific functions only and ignore the rest. But how do you know which processes to trace if you don't know their exact names? There is a file that can help you with this—`available_filter_functions` provides you with a list of available functions for tracing:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo wc -l available_filter_functions
|
|
||||||
63165 available_filter_functions
|
|
||||||
```
|
|
||||||
|
|
||||||
### Search for general kernel functions
|
|
||||||
|
|
||||||
Now try searching for a simple kernel function that you are aware of. User-space has `malloc` to allocate memory, while the kernel has its `kmalloc` function, which provides similar functionality. Below are all the `kmalloc` related functions:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo grep kmalloc available_filter_functions
|
|
||||||
debug_kmalloc
|
|
||||||
mempool_kmalloc
|
|
||||||
kmalloc_slab
|
|
||||||
kmalloc_order
|
|
||||||
kmalloc_order_trace
|
|
||||||
kmalloc_fix_flags
|
|
||||||
kmalloc_large_node
|
|
||||||
__kmalloc
|
|
||||||
__kmalloc_track_caller
|
|
||||||
__kmalloc_node
|
|
||||||
__kmalloc_node_track_caller
|
|
||||||
[...]
|
|
||||||
```
|
|
||||||
|
|
||||||
### Search for kernel module or driver related functions
|
|
||||||
|
|
||||||
From the output of `available_filter_functions`, you can see some lines ending with text in brackets, such as `[kvm_intel]` in the example below. These functions are related to the kernel module `kvm_intel`, which is currently loaded. You can run the `lsmod` command to verify:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo grep kvm available_filter_functions | tail
|
|
||||||
__pi_post_block [kvm_intel]
|
|
||||||
vmx_vcpu_pi_load [kvm_intel]
|
|
||||||
vmx_vcpu_pi_put [kvm_intel]
|
|
||||||
pi_pre_block [kvm_intel]
|
|
||||||
pi_post_block [kvm_intel]
|
|
||||||
pi_wakeup_handler [kvm_intel]
|
|
||||||
pi_has_pending_interrupt [kvm_intel]
|
|
||||||
pi_update_irte [kvm_intel]
|
|
||||||
vmx_dump_dtsel [kvm_intel]
|
|
||||||
vmx_dump_sel [kvm_intel]
|
|
||||||
|
|
||||||
$ lsmod | grep -i kvm
|
|
||||||
kvm_intel 335872 0
|
|
||||||
kvm 987136 1 kvm_intel
|
|
||||||
irqbypass 16384 1 kvm
|
|
||||||
```
|
|
||||||
|
|
||||||
### Trace specific functions only
|
|
||||||
|
|
||||||
To enable tracing of specific functions or patterns, you can make use of the `set_ftrace_filter` file to specify which functions from the above output you want to trace.
|
|
||||||
This file also accepts the `*` pattern, which expands to include additional functions with the given pattern. As an example, I am using the `ext4` filesystem on my machine. I can specify `ext4` specific kernel functions to trace using the following commands:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo mount | grep home
|
|
||||||
/dev/mapper/fedora-home on /home type ext4 (rw,relatime,seclabel)
|
|
||||||
|
|
||||||
$ pwd
|
|
||||||
/sys/kernel/tracing
|
|
||||||
|
|
||||||
$ sudo cat set_ftrace_filter
|
|
||||||
#### all functions enabled ####
|
|
||||||
$
|
|
||||||
$ echo ext4_* > set_ftrace_filter
|
|
||||||
$
|
|
||||||
$ cat set_ftrace_filter
|
|
||||||
ext4_has_free_clusters
|
|
||||||
ext4_validate_block_bitmap
|
|
||||||
ext4_get_group_number
|
|
||||||
ext4_get_group_no_and_offset
|
|
||||||
ext4_get_group_desc
|
|
||||||
[...]
|
|
||||||
```
|
|
||||||
|
|
||||||
Now, when you see the tracing output, you can only see functions `ext4` related to kernel functions for which you had set a filter earlier. All the other output gets ignored:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo cat trace |head -20
|
|
||||||
# tracer: function
|
|
||||||
#
|
|
||||||
# entries-in-buffer/entries-written: 3871/3871 #P:8
|
|
||||||
#
|
|
||||||
# _-----=> irqs-off
|
|
||||||
# / _----=> need-resched
|
|
||||||
# | / _---=> hardirq/softirq
|
|
||||||
# || / _--=> preempt-depth
|
|
||||||
# ||| / delay
|
|
||||||
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
|
|
||||||
# | | | |||| | |
|
|
||||||
cupsd-1066 [004] .... 3308.989545: ext4_file_getattr <-vfs_fstat
|
|
||||||
cupsd-1066 [004] .... 3308.989547: ext4_getattr <-ext4_file_getattr
|
|
||||||
cupsd-1066 [004] .... 3308.989552: ext4_file_getattr <-vfs_fstat
|
|
||||||
cupsd-1066 [004] .... 3308.989553: ext4_getattr <-ext4_file_getattr
|
|
||||||
cupsd-1066 [004] .... 3308.990097: ext4_file_open <-do_dentry_open
|
|
||||||
cupsd-1066 [004] .... 3308.990111: ext4_file_getattr <-vfs_fstat
|
|
||||||
cupsd-1066 [004] .... 3308.990111: ext4_getattr <-ext4_file_getattr
|
|
||||||
cupsd-1066 [004] .... 3308.990122: ext4_llseek <-ksys_lseek
|
|
||||||
cupsd-1066 [004] .... 3308.990130: ext4_file_read_iter <-new_sync_read
|
|
||||||
```
|
|
||||||
|
|
||||||
### Exclude functions from being traced
|
|
||||||
|
|
||||||
You don't always know what you want to trace but, you surely know what you don't want to trace. For that, there is this file aptly named `set_ftrace_notrace`—notice the "no" in there. You can write your desired pattern in this file and enable tracing, upon which everything except the mentioned pattern gets traced. This is often helpful to remove common functionality that clutters our output:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo cat set_ftrace_notrace
|
|
||||||
#### no functions disabled ####
|
|
||||||
```
|
|
||||||
|
|
||||||
### Targetted tracing
|
|
||||||
|
|
||||||
So far, you've been tracing everything that has happened in the kernel. But that won't help us if you wish to trace events related to a specific command. To achieve this, you can turn tracing on and off on-demand and, and in between them, run our command of choice so that you do not get extra output in your trace output. You can enable tracing by writing `1` to `tracing_on`, and `0` to turn it off:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
$ sudo cat tracing_on
|
|
||||||
0
|
|
||||||
|
|
||||||
$ sudo echo 1 > tracing_on
|
|
||||||
$ sudo cat tracing_on
|
|
||||||
1
|
|
||||||
|
|
||||||
$ # Run some specific command that we wish to trace here
|
|
||||||
|
|
||||||
$ sudo echo 0 > tracing_on
|
|
||||||
|
|
||||||
$ cat tracing_on
|
|
||||||
0
|
|
||||||
```
|
|
||||||
|
|
||||||
### Tracing specific PID
|
|
||||||
|
|
||||||
If you want to trace activity related to a specific process that is already running, you can write that PID to a file named `set_ftrace_pid` and then enable tracing. That way, tracing is limited to this PID only, which is very helpful in some instances:
|
|
||||||
|
|
||||||
|
|
||||||
```
|
|
||||||
`$ sudo echo $PID > set_ftrace_pid`
|
|
||||||
```
|
|
||||||
|
|
||||||
### Conclusion
|
|
||||||
|
|
||||||
`Ftrace` is a great way to learn more about the internal workings of the Linux kernel. With some practice, you can learn to fine-tune `ftrace` and narrow down your searches. To understand `ftrace` in more detail and its advanced usage, see these excellent articles written by the core author of `ftrace` himself—Steven Rostedt.
|
|
||||||
|
|
||||||
* [Debugging the Linux kernel, part 1][3]
|
|
||||||
* [Debugging the Linux kernel, part 2][4]
|
|
||||||
* [Debugging the Linux kernel, part 3][5]
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
--------------------------------------------------------------------------------
|
|
||||||
|
|
||||||
via: https://opensource.com/article/21/7/linux-kernel-ftrace
|
|
||||||
|
|
||||||
作者:[Gaurav Kamathe][a]
|
|
||||||
选题:[lujun9972][b]
|
|
||||||
译者:[萌新阿岩](https://github.com/mengxinayan)
|
|
||||||
校对:[校对者ID](https://github.com/校对者ID)
|
|
||||||
|
|
||||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
|
||||||
|
|
||||||
[a]: https://opensource.com/users/gkamathe
|
|
||||||
[b]: https://github.com/lujun9972
|
|
||||||
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/linux_keyboard_desktop.png?itok=I2nGw78_ (Linux keys on the keyboard for a desktop computer)
|
|
||||||
[2]: https://opensource.com/article/19/2/getting-started-cat-command
|
|
||||||
[3]: https://lwn.net/Articles/365835/
|
|
||||||
[4]: https://lwn.net/Articles/366796/
|
|
||||||
[5]: https://lwn.net/Articles/370423/
|
|
381
translated/tech/20210727 Analyze the Linux kernel with ftrace.md
Normal file
381
translated/tech/20210727 Analyze the Linux kernel with ftrace.md
Normal file
@ -0,0 +1,381 @@
|
|||||||
|
[#]: subject: (Analyze the Linux kernel with ftrace)
|
||||||
|
[#]: via: (https://opensource.com/article/21/7/linux-kernel-ftrace)
|
||||||
|
[#]: author: (Gaurav Kamathe https://opensource.com/users/gkamathe)
|
||||||
|
[#]: collector: (lujun9972)
|
||||||
|
[#]: translator: (mengxinayan)
|
||||||
|
[#]: reviewer: ( )
|
||||||
|
[#]: publisher: ( )
|
||||||
|
[#]: url: ( )
|
||||||
|
|
||||||
|
通过 `ftrace` 来分析 Linux 内核
|
||||||
|
======
|
||||||
|
通过 `ftrace` 来了解 Linux 内核内部工作方式是一个好方法。
|
||||||
|
![Linux keys on the keyboard for a desktop computer][1]
|
||||||
|
|
||||||
|
一个操作系统的内核是最难以理解的软件之一。自从你的系统启动后,它会一直在后台运行。尽管每个用户都不与内核直接交互,但他们在内核的帮助下完成自己的计算任务。与内核的交互发生在调用系统调用或者用户日常使用的各种库或应用间接调用了系统调用。
|
||||||
|
|
||||||
|
在之前的文章里我介绍了如何使用 `strace` 来追踪系统调用。然而,使用 `strace` 时你的可见性是受限的。它允许你查看特定参数的系统调用。并在工作完成后,看到其返回值或状态,来表明是通过还是失败。但是你无法知道内核在这段时间内发生了什么。除了系统调用外,内核中还有很多其他活动发生时却被忽略了。
|
||||||
|
|
||||||
|
### `ftrace` 介绍
|
||||||
|
|
||||||
|
本文的目的是通过使用一个名为 `ftrace` 的机制来追踪内核函数。任何 Linux 用户可以通过使用它来轻松地追踪内核,并且了解更多关于 Linux 内核内部如何工作。
|
||||||
|
|
||||||
|
`ftrace` 默认产生的输出是巨大的,因为内核总是忙的。为了节省空间,很多情况下我会通过截断来给出最小输出。
|
||||||
|
|
||||||
|
我使用 Fedora 来演示下面的例子,但是它们应该在其他最新的 Linux 发行版上同样可以运行。
|
||||||
|
|
||||||
|
### 启用 `ftrace`
|
||||||
|
|
||||||
|
`ftrace` 现在已经是内核中的一部分了,你不再需要事先安装它了。也就是说,如果你在使用最近的 Linux 系统,那么 `ftrace` 是已经启动了的。为了验证 `ftrace` 是否可用,运行 `mount` 命令并查找 `tracefs`。如果你看到类似下面的输出,表示 `ftrace` 已经启用,你可以轻松地尝试本文中下面的例子。下面的命令需要在 root 用户下使用(`sudo` 是不够的)
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo mount | grep tracefs
|
||||||
|
none on /sys/kernel/tracing type tracefs (rw,relatime,seclabel)
|
||||||
|
```
|
||||||
|
|
||||||
|
要想使用 `ftrace`,你首先需要进入上面 `mount` 命令中找到的特定目录中,在那个目录下运行文章中的其他命令。
|
||||||
|
|
||||||
|
```
|
||||||
|
`$ cd /sys/kernel/tracing`
|
||||||
|
```
|
||||||
|
|
||||||
|
### 一般的工作流程
|
||||||
|
|
||||||
|
首先,你需要理解捕捉踪迹和获取输出的一般流程。如果你直接运行 `ftrace`,没有任何特定的 `ftrace-` 命令会被运行。相反的,你基本上是通过标准 Linux 命令来写入或读取一些文件。
|
||||||
|
|
||||||
|
通用的步骤如下:
|
||||||
|
1. 通过写入一些特定文件来启用/结束追踪
|
||||||
|
2. 通过写入一些特定文件来设置/取消追踪时的过滤规则
|
||||||
|
3. 读取基于第 1 和 2 步的追踪输出
|
||||||
|
4. 清除输出的文件或缓存
|
||||||
|
5. 缩小到特定用例(你要追踪的内核函数),重复1,2,3,4 步
|
||||||
|
|
||||||
|
First of all, you must understand the general workflow of capturing a trace and obtaining the output. If you're using `ftrace` directly, there isn't any special `ftrace-`specific commands to run. Instead, you basically write to some files and read from some files using standard command-line Linux utilities.
|
||||||
|
|
||||||
|
### 可用的追踪器类型
|
||||||
|
|
||||||
|
有多种不同的追踪器可供你使用。之前提到,在运行任何命令前,你需要进入一个特定的目录下,因为文件在这些目录下。我在我的例子中使用相对路径(与绝对路径相反)
|
||||||
|
|
||||||
|
你可以查看 `available_tracers` 文件内容来查看所有可用的追踪器类型。你可以可以看下面列出了几个。不需要担心有这么多。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ pwd
|
||||||
|
/sys/kernel/tracing
|
||||||
|
|
||||||
|
$ sudo cat available_tracers
|
||||||
|
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
|
||||||
|
```
|
||||||
|
|
||||||
|
在所有输出的追踪器中,我会聚焦于下面三个特殊的:启用追踪的 `function` 和 `function_graph`,以及停止追踪的 `nop`
|
||||||
|
|
||||||
|
### 确认当前的追踪器
|
||||||
|
|
||||||
|
通常情况默认的追踪器设定为 `nop`。即在特殊文件中 `current_tracer` 中的 “无操作”,这意味着追踪目前是关闭的。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ pwd
|
||||||
|
/sys/kernel/tracing
|
||||||
|
|
||||||
|
$ sudo cat current_tracer
|
||||||
|
nop
|
||||||
|
```
|
||||||
|
|
||||||
|
### 查看追踪输出
|
||||||
|
|
||||||
|
在启用任何追踪功能之前,请你看一下保存追踪输出的文件。你可以用 [cat](2) 命令查看名为 `trace` 的文件的内容。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo cat trace
|
||||||
|
# tracer: nop
|
||||||
|
#
|
||||||
|
# entries-in-buffer/entries-written: 0/0 #P:8
|
||||||
|
#
|
||||||
|
# _-----=> irqs-off
|
||||||
|
# / _----=> need-resched
|
||||||
|
# | / _---=> hardirq/softirq
|
||||||
|
# || / _--=> preempt-depth
|
||||||
|
# ||| / delay
|
||||||
|
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
|
||||||
|
# | | | |||| | |
|
||||||
|
```
|
||||||
|
|
||||||
|
### 启用 `function` 追踪器
|
||||||
|
|
||||||
|
你可以通过向 `current_tracer` 文件写入 `function` 来启用第一个追踪器 `function`(文件原本内容为 `nop`,意味着追踪是关闭的)。把这个操作看成是启用追踪的一种方式。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ pwd
|
||||||
|
/sys/kernel/tracing
|
||||||
|
|
||||||
|
$ sudo cat current_tracer
|
||||||
|
nop
|
||||||
|
$ echo function > current_tracer
|
||||||
|
$
|
||||||
|
$ cat current_tracer
|
||||||
|
function
|
||||||
|
```
|
||||||
|
|
||||||
|
### 查看 `function` 追踪器的更新追踪输出
|
||||||
|
|
||||||
|
现在你已启动追踪,是时候查看输出了。如果你查看 `trace` 文件内容,你将会看到许多被连续写入的内容。我通过管道只展示了文件内容的前 20 行。根据左边输出的标题,你可以看到在某个 CPU 上运行的任务和进程 ID。根据右边输出的内容,你可以看到具体的内核函数和其副函数。中间显示了时间信息。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo cat trace | head -20
|
||||||
|
# tracer: function
|
||||||
|
#
|
||||||
|
# entries-in-buffer/entries-written: 409936/4276216 #P:8
|
||||||
|
#
|
||||||
|
# _-----=> irqs-off
|
||||||
|
# / _----=> need-resched
|
||||||
|
# | / _---=> hardirq/softirq
|
||||||
|
# || / _--=> preempt-depth
|
||||||
|
# ||| / delay
|
||||||
|
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
|
||||||
|
# | | | |||| | |
|
||||||
|
<idle>-0 [000] d... 2088.841739: tsc_verify_tsc_adjust <-arch_cpu_idle_enter
|
||||||
|
<idle>-0 [000] d... 2088.841739: local_touch_nmi <-do_idle
|
||||||
|
<idle>-0 [000] d... 2088.841740: rcu_nocb_flush_deferred_wakeup <-do_idle
|
||||||
|
<idle>-0 [000] d... 2088.841740: tick_check_broadcast_expired <-do_idle
|
||||||
|
<idle>-0 [000] d... 2088.841740: cpuidle_get_cpu_driver <-do_idle
|
||||||
|
<idle>-0 [000] d... 2088.841740: cpuidle_not_available <-do_idle
|
||||||
|
<idle>-0 [000] d... 2088.841741: cpuidle_select <-do_idle
|
||||||
|
<idle>-0 [000] d... 2088.841741: menu_select <-do_idle
|
||||||
|
<idle>-0 [000] d... 2088.841741: cpuidle_governor_latency_req <-menu_select
|
||||||
|
```
|
||||||
|
|
||||||
|
请记住当追踪打开后,这意味着追踪结果会被一直连续写入直至你关闭追踪。
|
||||||
|
|
||||||
|
### 关闭追踪
|
||||||
|
|
||||||
|
关闭追踪是简单的。你只需要在 `current_tracer` 文件中用 `nop` 替换 `function` 追踪器即可:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo cat current_tracer
|
||||||
|
function
|
||||||
|
|
||||||
|
$ sudo echo nop > current_tracer
|
||||||
|
|
||||||
|
$ sudo cat current_tracer
|
||||||
|
nop
|
||||||
|
```
|
||||||
|
|
||||||
|
### 启用 `function_graph` 追踪器
|
||||||
|
|
||||||
|
现在尝试第二个名为 `function_graph` 的追踪器。你可以使用和上面相同的步骤:在 `current_tracer` 文件中写入 `function_graph`:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo echo function_graph > current_tracer
|
||||||
|
|
||||||
|
$ sudo cat current_tracer
|
||||||
|
function_graph
|
||||||
|
```
|
||||||
|
|
||||||
|
### `function_tracer` 追踪器的追踪输出
|
||||||
|
|
||||||
|
注意到目前 `trace` 文件的输出格式已经发生变化。现在,你可以看到 CPU ID 和内核函数的执行时间。接下来,一个花括号表示一个函数的开始,以及它内部调用了哪些其他函数。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo cat trace | head -20
|
||||||
|
# tracer: function_graph
|
||||||
|
#
|
||||||
|
# CPU DURATION FUNCTION CALLS
|
||||||
|
# | | | | | | |
|
||||||
|
6) | n_tty_write() {
|
||||||
|
6) | down_read() {
|
||||||
|
6) | __cond_resched() {
|
||||||
|
6) 0.341 us | rcu_all_qs();
|
||||||
|
6) 1.057 us | }
|
||||||
|
6) 1.807 us | }
|
||||||
|
6) 0.402 us | process_echoes();
|
||||||
|
6) | add_wait_queue() {
|
||||||
|
6) 0.391 us | _raw_spin_lock_irqsave();
|
||||||
|
6) 0.359 us | _raw_spin_unlock_irqrestore();
|
||||||
|
6) 1.757 us | }
|
||||||
|
6) 0.350 us | tty_hung_up_p();
|
||||||
|
6) | mutex_lock() {
|
||||||
|
6) | __cond_resched() {
|
||||||
|
6) 0.404 us | rcu_all_qs();
|
||||||
|
6) 1.067 us | }
|
||||||
|
```
|
||||||
|
|
||||||
|
### 启用追踪的设置来增加追踪的深度
|
||||||
|
|
||||||
|
你可以使用下面的步骤来调整追踪器以看到更深层次的函数调用。完成之后,你可以查看 `trace` 文件的内容并发现输出变得更加详细了。为了文章的可读性,这个例子的输出被省略了:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo cat max_graph_depth
|
||||||
|
0
|
||||||
|
$ sudo echo 1 > max_graph_depth
|
||||||
|
$ # or
|
||||||
|
$ sudo echo 2 > max_graph_depth
|
||||||
|
|
||||||
|
$ sudo cat trace
|
||||||
|
```
|
||||||
|
|
||||||
|
### 查找要追踪的函数
|
||||||
|
|
||||||
|
上面的步骤足以让你开始追踪。但是它产生的输出内容是巨大的,当你想试图找到自己感兴趣的内容时,往往会很困难。通常你更希望能够之追踪特定的函数,而忽略其他函数。但如果你不知道它们确切的名称,你怎么知道要追踪哪些进程?有一个文件可以帮助你解决这个问题 —— `available_filter_functions` 文件提供了一个可供追踪的函数列表。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo wc -l available_filter_functions
|
||||||
|
63165 available_filter_functions
|
||||||
|
```
|
||||||
|
|
||||||
|
### 查找一般的内核函数
|
||||||
|
|
||||||
|
现在试着搜索一个你所知道的简单内核函数。用户空间有 `malloc` 函数用来分配内存,而内核有 `kmalloc` 函数,它提供类似的功能。下面是所有与 `kmalloc` 相关的函数。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo grep kmalloc available_filter_functions
|
||||||
|
debug_kmalloc
|
||||||
|
mempool_kmalloc
|
||||||
|
kmalloc_slab
|
||||||
|
kmalloc_order
|
||||||
|
kmalloc_order_trace
|
||||||
|
kmalloc_fix_flags
|
||||||
|
kmalloc_large_node
|
||||||
|
__kmalloc
|
||||||
|
__kmalloc_track_caller
|
||||||
|
__kmalloc_node
|
||||||
|
__kmalloc_node_track_caller
|
||||||
|
[...]
|
||||||
|
```
|
||||||
|
|
||||||
|
### 查找内核模块或者驱动相关函数
|
||||||
|
|
||||||
|
在 `available_filter_functions` 文件的输出中,你可以看到一些以括号内文字结尾的行,例如下面的例子中的 `[kvm_intel]`。这些函数与当前加载的内核模块 `kvm_intel` 有关。你可以运行 `lsmod` 命令来验证。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo grep kvm available_filter_functions | tail
|
||||||
|
__pi_post_block [kvm_intel]
|
||||||
|
vmx_vcpu_pi_load [kvm_intel]
|
||||||
|
vmx_vcpu_pi_put [kvm_intel]
|
||||||
|
pi_pre_block [kvm_intel]
|
||||||
|
pi_post_block [kvm_intel]
|
||||||
|
pi_wakeup_handler [kvm_intel]
|
||||||
|
pi_has_pending_interrupt [kvm_intel]
|
||||||
|
pi_update_irte [kvm_intel]
|
||||||
|
vmx_dump_dtsel [kvm_intel]
|
||||||
|
vmx_dump_sel [kvm_intel]
|
||||||
|
|
||||||
|
$ lsmod | grep -i kvm
|
||||||
|
kvm_intel 335872 0
|
||||||
|
kvm 987136 1 kvm_intel
|
||||||
|
irqbypass 16384 1 kvm
|
||||||
|
```
|
||||||
|
|
||||||
|
### 仅追踪特定的函数
|
||||||
|
|
||||||
|
为了实现对特定函数或模式的追踪,你可以利用 `set_ftrace_filter` 文件来指定你要追踪上述输出中的哪些函数。这个文件也接受 `*` 模式,它可以扩展到包括具有给定模式的其他函数。作为一个例子,我在我的机器上使用ext4文件系统。我可以用下面的命令指定ext4的特定内核函数来追踪。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo mount | grep home
|
||||||
|
/dev/mapper/fedora-home on /home type ext4 (rw,relatime,seclabel)
|
||||||
|
|
||||||
|
$ pwd
|
||||||
|
/sys/kernel/tracing
|
||||||
|
|
||||||
|
$ sudo cat set_ftrace_filter
|
||||||
|
#### all functions enabled ####
|
||||||
|
$
|
||||||
|
$ echo ext4_* > set_ftrace_filter
|
||||||
|
$
|
||||||
|
$ cat set_ftrace_filter
|
||||||
|
ext4_has_free_clusters
|
||||||
|
ext4_validate_block_bitmap
|
||||||
|
ext4_get_group_number
|
||||||
|
ext4_get_group_no_and_offset
|
||||||
|
ext4_get_group_desc
|
||||||
|
[...]
|
||||||
|
```
|
||||||
|
|
||||||
|
现在当你可以看到追踪输出时,你只能看到与内核函数有关的 `ext4` 函数,而你之前已经为其设置了一个过滤器。所有其他的输出都被忽略了。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo cat trace |head -20
|
||||||
|
# tracer: function
|
||||||
|
#
|
||||||
|
# entries-in-buffer/entries-written: 3871/3871 #P:8
|
||||||
|
#
|
||||||
|
# _-----=> irqs-off
|
||||||
|
# / _----=> need-resched
|
||||||
|
# | / _---=> hardirq/softirq
|
||||||
|
# || / _--=> preempt-depth
|
||||||
|
# ||| / delay
|
||||||
|
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
|
||||||
|
# | | | |||| | |
|
||||||
|
cupsd-1066 [004] .... 3308.989545: ext4_file_getattr <-vfs_fstat
|
||||||
|
cupsd-1066 [004] .... 3308.989547: ext4_getattr <-ext4_file_getattr
|
||||||
|
cupsd-1066 [004] .... 3308.989552: ext4_file_getattr <-vfs_fstat
|
||||||
|
cupsd-1066 [004] .... 3308.989553: ext4_getattr <-ext4_file_getattr
|
||||||
|
cupsd-1066 [004] .... 3308.990097: ext4_file_open <-do_dentry_open
|
||||||
|
cupsd-1066 [004] .... 3308.990111: ext4_file_getattr <-vfs_fstat
|
||||||
|
cupsd-1066 [004] .... 3308.990111: ext4_getattr <-ext4_file_getattr
|
||||||
|
cupsd-1066 [004] .... 3308.990122: ext4_llseek <-ksys_lseek
|
||||||
|
cupsd-1066 [004] .... 3308.990130: ext4_file_read_iter <-new_sync_read
|
||||||
|
```
|
||||||
|
|
||||||
|
### 排除要被追踪的函数
|
||||||
|
|
||||||
|
你并不总是知道你想追踪什么,但是,你肯定知道你不想追踪什么。因此,有一个 `set_ftrace_notrace` —— 请注意其中的 "no"。你可以在这个文件中写下你想要的模式,并启用追踪。这样除了所提到的模式外,任何其他东西都会被追踪到。这通常有助于删除那些使我们的输出变得混乱的普通功能。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo cat set_ftrace_notrace
|
||||||
|
#### no functions disabled ####
|
||||||
|
```
|
||||||
|
|
||||||
|
### 具有目标性的追踪
|
||||||
|
|
||||||
|
到目前为止,你一直在追踪内核中发生的一切。但是,他无法帮助你追踪与某个特定命令有关的事件。为了达到这个目的,你可以按需打开和关闭跟踪,并且在它们之间,运行我们选择的命令,这样你就不会在跟踪输出中得到额外的输出。你可以通过向 `tracing_on` 写入 `1` 来启用跟踪,写 `0` 来关闭跟踪。
|
||||||
|
|
||||||
|
```
|
||||||
|
$ sudo cat tracing_on
|
||||||
|
0
|
||||||
|
|
||||||
|
$ sudo echo 1 > tracing_on
|
||||||
|
$ sudo cat tracing_on
|
||||||
|
1
|
||||||
|
|
||||||
|
$ # Run some specific command that we wish to trace here
|
||||||
|
|
||||||
|
$ sudo echo 0 > tracing_on
|
||||||
|
|
||||||
|
$ cat tracing_on
|
||||||
|
0
|
||||||
|
```
|
||||||
|
|
||||||
|
### 追踪特定的 PID
|
||||||
|
|
||||||
|
如果你想追踪与正在运行的特定进程有关的活动,你可以将该 PID 写入一个名为 `set_ftrace_pid` 的文件,然后启用追踪。这样一来,追踪就只限于这个PID,这在某些情况下是非常有帮助的。
|
||||||
|
|
||||||
|
```
|
||||||
|
`$ sudo echo $PID > set_ftrace_pid`
|
||||||
|
```
|
||||||
|
|
||||||
|
### 总结
|
||||||
|
|
||||||
|
`ftrace` 是一个了解 Linux 内核内部工作的很好方式。通过一些练习,你可以学会对 `ftrace` 进行调整以缩小搜索范围。要想更详细地了解 `ftrace` 和它的高级用法,请看 `ftrace` 的核心作者 Steven Rostedt 写的这些优秀文章。
|
||||||
|
|
||||||
|
* [调试 Linux 内核,第一部分](3)
|
||||||
|
* [调试 Linux 内核,第二部分](4)
|
||||||
|
* [调试 Linux 内核,第三部分](5)
|
||||||
|
|
||||||
|
|
||||||
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
via: https://opensource.com/article/21/7/linux-kernel-ftrace
|
||||||
|
|
||||||
|
作者:[Gaurav Kamathe][a]
|
||||||
|
选题:[lujun9972][b]
|
||||||
|
译者:[萌新阿岩](https://github.com/mengxinayan)
|
||||||
|
校对:[校对者ID](https://github.com/校对者ID)
|
||||||
|
|
||||||
|
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||||
|
|
||||||
|
[a]: https://opensource.com/users/gkamathe
|
||||||
|
[b]: https://github.com/lujun9972
|
||||||
|
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/linux_keyboard_desktop.png?itok=I2nGw78_ (Linux keys on the keyboard for a desktop computer)
|
||||||
|
[2]: https://opensource.com/article/19/2/getting-started-cat-command
|
||||||
|
[3]: https://lwn.net/Articles/365835/
|
||||||
|
[4]: https://lwn.net/Articles/366796/
|
||||||
|
[5]: https://lwn.net/Articles/370423/
|
Loading…
Reference in New Issue
Block a user