Merge pull request #29745 from jrglinux/jrglinux-patch-1

translated
This commit is contained in:
Xingyu.Wang 2023-07-22 13:39:39 +08:00 committed by GitHub
commit fbf993793e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 247 additions and 247 deletions

View File

@ -1,247 +0,0 @@
[#]: subject: "How the GDB debugger and other tools use call frame information to determine the active function calls"
[#]: via: "https://opensource.com/article/23/3/gdb-debugger-call-frame-active-function-calls"
[#]: author: "Will Cohen https://opensource.com/users/wcohen"
[#]: collector: "lkxed"
[#]: translator: "jrglinux"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
How the GDB debugger and other tools use call frame information to determine the active function calls
======
In my [previous article][1], I showed how debuginfo is used to map between the current instruction pointer (IP) and the function or line containing it. That information is valuable in showing what code the processor is currently executing. However, having more context for the calls that lead up to the current function and line being executed is also extremely helpful.
For example, suppose a function in a library has an illegal memory access due to a null pointer being passed as a parameter into the function. Just looking at the current function and line shows that the fault was triggered by attempted access through a null pointer. However, what you really want to know is the full context of the active function calls leading up to that null pointer access, so you can determine how that null pointer was initially passed into the library function. This context information is provided by a backtrace, and allows you to determine which functions could be responsible for the bogus parameter.
One things certain: Determining the currently active function calls is a non-trivial operation.
### Function activation records
Modern programming languages have local variables and allow for recursion where a function can call itself. Also, concurrent programs have multiple threads that may have the same function running at the same time. The local variables cannot be stored in global locations in these situations. The locations of the local variables must be unique for each invocation of the function. Heres how it works:
- The compiler produces a function activation record each time a function is called to store local variables in a unique location.
- For efficiency, the processor stack is used to store the function activation records.
- A new function activation record is created at the top of the processor stack for the function when its called.
- If that function calls another function, then a new function activation record is placed above the existing function activation record.
- Each time there is a return from a function, its function activation record is removed from the stack.
The creation of the function activation record is created by code in the function called the prologue. The removal of the function activation record is handled by the function epilogue. The body of the function can make use of the memory set aside on the stack for it for temporary values and local variables.
Function activation records can be variable size. For some functions, theres no need for space to store local variables. Ideally, the function activation record only needs to store the return address of the function that called _this_ function. For other functions, significant space may be required to store local data structures for the function in addition to the return address. This variation in frame sizes leads to compilers using frame pointers to track the start of the functions activation frame. Now the function prologue code has the additional task of storing the old frame pointer before creating a new frame pointer for the current function, and the epilogue has to restore the old frame pointer value.
The way that the function activation record is laid out, the return address and old frame pointer of the calling function are constant offsets from the current frame pointer. With the old frame pointer, the next functions activation frame on the stack can be located. This process is repeated until all the function activation records have been examined.
### Optimization complications
There are a couple of disadvantages to having explicit frame pointers in code. On some processors, there are relatively few registers available. Having an explicit frame pointer causes more memory operations to be used. The resulting code is slower because the frame pointer must be in one of the registers. Having explicit frame pointers may constrain the code that the compiler can generate, because the compiler may not intermix the function prologue and epilogue code with the body of the function.
The compilers goal is to generate fast code where possible, so compilers typically omit frame pointers from generated code. Keeping frame pointers can significantly lower performance, as shown by [Phoronixs benchmarking][2]. The downside of omitting frame pointers is that finding the previous calling functions activation frame and return address are no longer simple offsets from the frame pointer.
### Call Frame Information
To aid in the generation of function backtraces, the compiler includes DWARF Call Frame Information (CFI) to reconstruct frame pointers and to find return addresses. This supplemental information is stored in the `.eh_frame` section of the execution. Unlike traditional debuginfo for function and line location information, the `.eh_frame` section is in the executable even when the executable is generated without debug information, or when the debug information has been stripped from the file. The call frame information is essential for the operation of language constructs like **throw-catch** in C++.
The CFI has a Frame Description Entry (FDE) for each function. As one of its steps, the backtrace generation process finds the appropriate FDE for the current activation frame being examined. Think of the FDE as a table, with each row representing one or more instructions, with these columns:
- Canonical Frame Address (CFA), the location the frame pointer would point to
- The return address
- Information about other registers
The encoding of the FDE is designed to minimize the amount of space required. The FDE describes the changes between rows rather than fully specify each row. To further compress the data, starting information common to multiple FDEs is factored out and placed in Common Information Entries (CIE). This makes the FDE more compact, but it also requires more work to compute the actual CFA and find the return address location. The tool must start from the uninitialized state. It steps through the entries in the CIE to get the initial state on function entry, then it moves on to process the FDE by starting at the FDEs first entry, and processes operations until it gets to the row that covers the instruction pointer currently being analyzed.
### Example use of Call Frame Information
Start with a simple example with a function that converts Fahrenheit to Celsius. Inlined functions do not have entries in the CFI, so the `__attribute__((noinline))` for the `f2c` function ensures the compiler keeps `f2c` as a real function.
```
#include <stdio.h>
int __attribute__ ((noinline)) f2c(int f)
{
int c;
printf("converting\n");
c = (f-32.0) * 5.0 /9.0;
return c;
}
int main (int argc, char *argv[])
{
int f;
scanf("%d", &f);
printf ("%d Fahrenheit = %d Celsius\n",
f, f2c(f));
return 0;
}
```
Compile the code with:
```
$ gcc -O2 -g -o f2c f2c.c
```
The `.eh_frame` is there as expected:
```
$ eu-readelf -S f2c |grep eh_frame
[17] .eh_frame_hdr PROGBITS 0000000000402058 00002058 00000034 0 A 0 0 4
[18] .eh_frame PROGBITS 0000000000402090 00002090 000000a0 0 A 0 0 8
```
We can get the CFI information in human readable form with:
```
$ readelf --debug-dump=frames f2c > f2c.cfi
```
Generate a disassembly file of the `f2c` binary so you can look up the addresses of the `f2c` and `main` functions:
```
$ objdump -d f2c > f2c.dis
```
Find the following lines in `f2c.dis` to see the start of `f2c` and `main`:
```
0000000000401060 <main>:
0000000000401190 <f2c>:
```
In many cases, all the functions in the binary use the same CIE to define the initial conditions before a functions first instruction is executed. In this example, both `f2c` and `main` use the following CIE:
```
00000000 0000000000000014 00000000 CIE
Version: 1
Augmentation: "zR"
Code alignment factor: 1
Data alignment factor: -8
Return address column: 16
Augmentation data: 1b
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_offset: r16 (rip) at cfa-8
DW_CFA_nop
DW_CFA_nop
```
For this example, dont worry about the Augmentation or Augmentation data entries. Because x86_64 processors have variable length instructions from 1 to 15 bytes in size, the “Code alignment factor” is set to 1. On a processor that only has 32-bit (4 byte instructions), this would be set to 4 and would allow more compact encoding of how many bytes a row of state information applies to. In a similar fashion, there is the “Data alignment factor” to make the adjustments to where the CFA is located more compact. On x86_64, the stack slots are 8 bytes in size.
The column in the virtual table that holds the return address is 16. This is used in the instructions at the tail end of the CIE. There are four `DW_CFA` instructions. The first instruction, `DW_CFA_def_cfa` describes how to compute the Canonical Frame Address (CFA) that a frame pointer would point at if the code had a frame pointer. In this case, the CFA is computed from `r7 (rsp)` and `CFA=rsp+8`.
The second instruction `DW_CFA_offset` defines where to obtain the return address `CFA-8`. In this case, the return address is currently pointed to by the stack pointer `(rsp+8)-8`. The CFA starts right above the return address on the stack.
The `DW_CFA_nop` at the end of the CIE is padding to keep alignment in the DWARF information. The FDE can also have padding at the end of the for alignment.
Find the FDE for `main` in `f2c.cfi`, which covers the `main` function from `0x40160` up to, but not including, `0x401097`:
```
00000084 0000000000000014 00000088 FDE cie=00000000 pc=0000000000401060..0000000000401097
DW_CFA_advance_loc: 4 to 0000000000401064
DW_CFA_def_cfa_offset: 32
DW_CFA_advance_loc: 50 to 0000000000401096
DW_CFA_def_cfa_offset: 8
DW_CFA_nop
```
Before executing the first instruction in the function, the CIE describes the call frame state. However, as the processor executes instructions in the function, the details will change. First the instructions `DW_CFA_advance_loc` and `DW_CFA_def_cfa_offset` match up with the first instruction in `main` at `401060`. This adjusts the stack pointer down by `0x18` (24 bytes). The CFA has not changed location but the stack pointer has, so the correct computation for CFA at `401064` is `rsp+32`. Thats the extent of the prologue instruction in this code. Here are the first couple of instructions in `main`:
```
0000000000401060 <main>:
401060: 48 83 ec 18 sub $0x18,%rsp
401064: bf 1b 20 40 00 mov $0x40201b,%edi
```
The `DW_CFA_advance_loc` makes the current row apply to the next 50 bytes of code in the function, until `401096`. The CFA is at `rsp+32` until the stack adjustment instruction at `401092` completes execution. The `DW_CFA_def_cfa_offset` updates the calculations of the CFA to the same as entry into the function. This is expected, because the next instruction at `401096` is the return instruction `(ret`) and pops the return value off the stack.
```
401090: 31 c0 xor %eax,%eax
401092: 48 83 c4 18 add $0x18,%rsp
401096: c3 ret
```
This FDE for `f2c` function uses the same CIE as the `main` function, and covers the range of `0x41190` to `0x4011c3`:
```
00000068 0000000000000018 0000006c FDE cie=00000000 pc=0000000000401190..00000000004011c3
DW_CFA_advance_loc: 1 to 0000000000401191
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r3 (rbx) at cfa-16
DW_CFA_advance_loc: 29 to 00000000004011ae
DW_CFA_def_cfa_offset: 8
DW_CFA_nop
DW_CFA_nop
DW_CFA_nop
```
The `objdump` output for the `f2c` function in the binary:
```
0000000000401190 <f2c>:
401190: 53 push %rbx
401191: 89 fb mov %edi,%ebx
401193: bf 10 20 40 00 mov $0x402010,%edi
401198: e8 93 fe ff ff call 401030 <puts@plt>
40119d: 66 0f ef c0 pxor %xmm0,%xmm0
4011a1: f2 0f 2a c3 cvtsi2sd %ebx,%xmm0
4011a5: f2 0f 5c 05 93 0e 00 subsd 0xe93(%rip),%xmm0 # 402040 <__dso_handle+0x38>
4011ac: 00
4011ad: 5b pop %rbx
4011ae: f2 0f 59 05 92 0e 00 mulsd 0xe92(%rip),%xmm0 # 402048 <__dso_handle+0x40>
4011b5: 00
4011b6: f2 0f 5e 05 92 0e 00 divsd 0xe92(%rip),%xmm0 # 402050 <__dso_handle+0x48>
4011bd: 00
4011be: f2 0f 2c c0 cvttsd2si %xmm0,%eax
4011c2: c3 ret
```
In the FDE for `f2c`, theres a single byte instruction at the beginning of the function with the `DW_CFA_advance_loc`. Following the advance operation, there are two additional operations. A `DW_CFA_def_cfa_offset` changes the CFA to `%rsp+16` and a `DW_CFA_offset` indicates that the initial value in `%rbx` is now at `CFA-16` (the top of the stack).
Looking at this `fc2` disassembly code, you can see that a `push` is used to save `%rbx` onto the stack. One of the advantages of omitting the frame pointer in the code generation is that compact instructions like `push` and `pop` can be used to store and retrieve values from the stack. In this case, `%rbx` is saved because the `%rbx` is used to pass arguments to the `printf` function (actually converted to a `puts` call), but the initial value of `f` passed into the function needs to be saved for the later computation. The `DW_CFA_advance_loc` 29 bytes to `4011ae` shows the next state change just after `pop %rbx`, which recovers the original value of `%rbx`. The `DW_CFA_def_cfa_offset` notes the pop changed CFA to be `%rsp+8`.
### GDB using the Call Frame Information
Having the CFI information allows [GNU Debugger (GDB)][3] and other tools to generate accurate backtraces. Without CFI information, GDB would have a difficult time finding the return address. You can see GDB making use of this information, if you set a breakpoint at line 7 of `f2c.c`. GDB puts the breakpoint before the `pop %rbx` in the `f2c` function is done and the return value is not at the top of the stack.
GDB is able to unwind the stack, and as a bonus is also able to fetch the argument `f` that was currently saved on the stack:
```
$ gdb f2c
[...]
(gdb) break f2c.c:7
Breakpoint 1 at 0x40119d: file f2c.c, line 7.
(gdb) run
Starting program: /home/wcohen/present/202207youarehere/f2c
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
98
converting
Breakpoint 1, f2c (f=98) at f2c.c:8
8 return c;
(gdb) where
#0 f2c (f=98) at f2c.c:8
#1 0x000000000040107e in main (argc=<optimized out>, argv=<optimized out>)
at f2c.c:15
```
### Call Frame Information
The DWARF Call Frame Information provides a flexible way for a compiler to include information for accurate unwinding of the stack. This makes it possible to determine the currently active function calls. Ive provided a brief introduction in this article, but for more details on how the DWARF implements this mechanism, see the [DWARF specification][4].
--------------------------------------------------------------------------------
via: https://opensource.com/article/23/3/gdb-debugger-call-frame-active-function-calls
作者:[Will Cohen][a]
选题:[lkxed][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/wcohen
[b]: https://github.com/lkxed/
[1]: https://opensource.com/article/23/2/compiler-optimization-debugger-line-information
[2]: https://www.phoronix.com/review/fedora-frame-pointer
[3]: https://opensource.com/article/21/3/debug-code-gdb
[4]: https://dwarfstd.org/Download.php

View File

@ -0,0 +1,247 @@
[#]: subject: "How the GDB debugger and other tools use call frame information to determine the active function calls"
[#]: via: "https://opensource.com/article/23/3/gdb-debugger-call-frame-active-function-calls"
[#]: author: "Will Cohen https://opensource.com/users/wcohen"
[#]: collector: "lkxed"
[#]: translator: "jrglinux"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
GDB以及其他调试器如何通过调用帧信息来确定函数调用关系
======
在我的[上一篇文章][1]中,我展示了如何使用 debuginfo 在当前指令指针 (IP) 和包含它的函数或行之间进行映射。 该信息对于显示 CPU 当前正在执行的代码很有帮助。 不过,如果能显示更多的有关当前函数调用栈及其正在执行语句的上下文对我们定位问题来说也是十分有助的。
例如,将空指针作为参数传递到函数中而导致非法内存访问的问题, 只需查看当前执行函数行即可发现该错误是由尝试通过空指针进行访问而触发的。 但是,若真正知道导致空指针访问的函数调用的完整上下文,便确定该空指针最初是如何传递到该函数中的。 此上下文信息由回溯提供,并允许您确定哪些函数可能对空指针参数负责。
有一点是肯定的:确定当前活动的函数调用栈不是一项简单的操作。
### 函数激活记录
现代编程语言具有局部变量,并允许函数可以调用自身的递归。 此外,并发程序具有多个线程,这些线程可能同时运行相同的函数。 在这些情况下,局部变量不能存储在全局位置。 对于函数的每次调用,局部变量的位置必须是唯一的。 它的工作原理如下:
- 每次调用函数时,编译器都会生成函数激活记录,以将局部变量存储在唯一位置。
- 为了提高效率,处理器堆栈用于存储函数激活记录。
- 当函数被调用时,会在处理器堆栈的顶部为该函数创建一条新的函数激活记录。
- 如果该函数调用另一个函数,则新的函数激活记录将放置在现有函数激活记录之上。
- 每次函数返回时,其函数激活记录都会从堆栈中删除。
函数激活记录的创建是由函数中称为序言的代码创建的。 函数激活记录的删除由函数尾声处理。 函数体可以利用堆栈上为其预留的内存来存储临时值和局部变量。
函数激活记录的大小可以是可变的。 对于某些函数,不需要空间来存储局部变量。 理想情况下函数激活记录只需要存储调用_this_函数的函数的返回地址。 对于其他函数,除了返回地址之外,可能还需要大量空间来存储函数的本地数据结构。 帧大小的可变导致编译器使用帧指针来跟踪函数激活帧的开始。函数序言代码具有在为当前函数创建新帧指针之前存储旧帧指针的额外任务,并且函数尾声必须恢复旧帧指针值。
函数激活记录的布局方式、调用函数的返回地址和旧帧指针是相对于当前帧指针的恒定偏移量。 通过旧的帧指针,可以定位堆栈上下一个函数的激活帧。 重复此过程,直到检查完所有功能激活记录为止。
### 优化复杂性
在代码中使用显式帧指针有几个缺点。 在某些处理器上,可用的寄存器相对较少。 具有显式帧指针会导致使用更多内存操作。 生成的代码速度较慢,因为帧指针必须位于寄存器中。 具有显式帧指针可能会限制编译器可以生成的代码,因为编译器可能不会将函数序言和结尾代码与函数体混合。
编译器的目标是尽可能生成快速代码,因此编译器通常会从生成的代码中省略帧指针。 正如 [Phoronix 的基准测试][2] 所示,保留帧指针会显着降低性能。 不过省略帧指针也有缺点,查找前一个调用函数的激活帧和返回地址不再是相对于帧指针的简单偏移。
### 调用帧信息
为了帮助生成函数回溯,编译器包含 DWARF 调用帧信息 (CFI) 来重建帧指针并查找返回地址。 此补充信息存储在执行的 `.eh_frame` 部分中。 与传统的函数和行位置信息的 debuginfo 不同,即使生成的可执行文件没有调试信息,或者调试信息已从文件中删除,`.eh_frame` 部分也位于可执行文件中。 调用帧信息对于 C++ 中的 **throw-catch** 等语言结构的操作至关重要。
CFI 的每个功能都有一个帧描述条目 (FDE)。 作为其步骤之一,回溯生成过程为当前正在检查的激活帧找到适当的 FDE。 将 FDE 视为一张表,每一行代表一个或多个指令,并具有以下列:
- 规范帧地址CFA帧指针指向的位置
- 返回地址
- 有关其他寄存器的信息
FDE 的编码旨在最大限度地减少所需的空间量。 FDE 描述了行之间的变化,而不是完全指定每一行。 为了进一步压缩数据,多个 FDE 共有的起始信息被分解出来并放置在通用信息条目 (CIE) 中。 这使得 FDE 更加紧凑,但也需要更多的工作来计算实际的 CFA 并找到返回地址位置。 该工具必须从未初始化状态启动。 它逐步遍历 CIE 中的条目以获取函数条目的初始状态,然后从 FDE 的第一个条目开始继续处理 FDE并处理操作直到到达覆盖当前正在分析的指令指针的行。
### 调用帧信息使用实例
从一个简单的示例开始,其中包含将华氏温度转换为摄氏度的函数。 内联函数在 CFI 中没有条目,因此 `f2c` 函数的 `__attribute__((noinline))` 确保编译器将 `f2c` 保留为真实函数。
```
#include <stdio.h>
int __attribute__ ((noinline)) f2c(int f)
{
int c;
printf("converting\n");
c = (f-32.0) * 5.0 /9.0;
return c;
}
int main (int argc, char *argv[])
{
int f;
scanf("%d", &f);
printf ("%d Fahrenheit = %d Celsius\n",
f, f2c(f));
return 0;
}
```
编译代码:
```
$ gcc -O2 -g -o f2c f2c.c
```
`.eh_frame` 部分展示如下:
```
$ eu-readelf -S f2c |grep eh_frame
[17] .eh_frame_hdr PROGBITS 0000000000402058 00002058 00000034 0 A 0 0 4
[18] .eh_frame PROGBITS 0000000000402090 00002090 000000a0 0 A 0 0 8
```
我们可以通过以下方式获取 CFI 信息:
```
$ readelf --debug-dump=frames f2c > f2c.cfi
```
生成 `f2c` 可执行文件的反汇编代码,这样你可以查找 `f2c``main` 函数:
```
$ objdump -d f2c > f2c.dis
```
`f2c.dis` 中找到以下信息来看看 `f2c``main` 函数的执行位置:
```
0000000000401060 <main>:
0000000000401190 <f2c>:
```
在许多情况下,二进制文件中的所有函数在执行函数的第一条指令之前都使用相同的 CIE 来定义初始条件。 在此示例中, `f2c``main` 都使用以下 CIE
```
00000000 0000000000000014 00000000 CIE
Version: 1
Augmentation: "zR"
Code alignment factor: 1
Data alignment factor: -8
Return address column: 16
Augmentation data: 1b
DW_CFA_def_cfa: r7 (rsp) ofs 8
DW_CFA_offset: r16 (rip) at cfa-8
DW_CFA_nop
DW_CFA_nop
```
本示例中,不必担心增强或增强数据条目。 由于 x86_64 处理器具有 1 到 15 字节大小的可变长度指令,因此 "代码对齐因子" 设置为 1。在只有 32 位4 字节指令)的处理器上,"代码对齐因子" 设置为 4并且允许对一行状态信息适用的字节数进行更紧凑的编码。 类似地,还有 "数据对齐因子" 来使 CFA 所在位置的调整更加紧凑。 在 x86_64 上,堆栈槽的大小为 8 个字节。
虚拟表中保存返回地址的列是 16。这在 CIE 尾部的指令中使用。 有四个 `DW_CFA` 指令。 第一条指令 `DW_CFA_def_cfa` 描述了如果代码具有帧指针如何计算帧指针将指向的规范帧地址CFA。 在这种情况下CFA 是根据 `r7 (rsp)``CFA=rsp+8` 计算的。
第二条指令 `DW_CFA_offset` 定义从哪里获取返回地址 `CFA-8` 。 在这种情况下,返回地址当前由堆栈指针 `(rsp+8)-8` 指向。 CFA 从堆栈返回地址的正上方开始。
CIE 末尾的 `DW_CFA_nop` 进行填充以保持 DWARF 信息的对齐。 FDE 还可以在末尾添加填充以进行对齐。
`f2c.cfi` 中找到 `main` 的 FDE它涵盖了从 `0x40160` 到(但不包括)`0x401097` 的 `main` 函数:
```
00000084 0000000000000014 00000088 FDE cie=00000000 pc=0000000000401060..0000000000401097
DW_CFA_advance_loc: 4 to 0000000000401064
DW_CFA_def_cfa_offset: 32
DW_CFA_advance_loc: 50 to 0000000000401096
DW_CFA_def_cfa_offset: 8
DW_CFA_nop
```
在执行函数中的第一条指令之前CIE 描述调用帧状态。 然而,当处理器执行函数中的指令时,细节将会改变。 首先,指令 `DW_CFA_advance_loc``DW_CFA_def_cfa_offset``main``401060` 处的第一条指令匹配。 这会将堆栈指针向下调整 `0x18`24 个字节)。 CFA 没有改变位置,但堆栈指针改变了,因此 CFA 在 `401064` 处的正确计算是 `rsp+32`。 这就是这段代码中序言指令的范围。 以下是 `main` 中的前几条指令:
```
0000000000401060 <main>:
401060: 48 83 ec 18 sub $0x18,%rsp
401064: bf 1b 20 40 00 mov $0x40201b,%edi
```
`DW_CFA_advance_loc` 使当前行应用于函数中接下来的 50 个字节的代码,直到 `401096`。 CFA 位于 `rsp+32`,直到 `401092` 处的堆栈调整指令完成执行。 `DW_CFA_def_cfa_offset` 将 CFA 的计算更新为与函数入口相同。 这是预期之中的,因为 `401096` 处的下一条指令是返回指令 `ret`,并将返回值从堆栈中弹出。
```
401090: 31 c0 xor %eax,%eax
401092: 48 83 c4 18 add $0x18,%rsp
401096: c3 ret
```
`f2c` 函数的 FDE 使用与 `main` 函数相同的 CIE并覆盖 `0x41190``0x4011c3` 的范围:
```
00000068 0000000000000018 0000006c FDE cie=00000000 pc=0000000000401190..00000000004011c3
DW_CFA_advance_loc: 1 to 0000000000401191
DW_CFA_def_cfa_offset: 16
DW_CFA_offset: r3 (rbx) at cfa-16
DW_CFA_advance_loc: 29 to 00000000004011ae
DW_CFA_def_cfa_offset: 8
DW_CFA_nop
DW_CFA_nop
DW_CFA_nop
```
可执行文件中 `f2c` 函数的 `objdump` 输出:
```
0000000000401190 <f2c>:
401190: 53 push %rbx
401191: 89 fb mov %edi,%ebx
401193: bf 10 20 40 00 mov $0x402010,%edi
401198: e8 93 fe ff ff call 401030 <puts@plt>
40119d: 66 0f ef c0 pxor %xmm0,%xmm0
4011a1: f2 0f 2a c3 cvtsi2sd %ebx,%xmm0
4011a5: f2 0f 5c 05 93 0e 00 subsd 0xe93(%rip),%xmm0 # 402040 <__dso_handle+0x38>
4011ac: 00
4011ad: 5b pop %rbx
4011ae: f2 0f 59 05 92 0e 00 mulsd 0xe92(%rip),%xmm0 # 402048 <__dso_handle+0x40>
4011b5: 00
4011b6: f2 0f 5e 05 92 0e 00 divsd 0xe92(%rip),%xmm0 # 402050 <__dso_handle+0x48>
4011bd: 00
4011be: f2 0f 2c c0 cvttsd2si %xmm0,%eax
4011c2: c3 ret
```
`f2c` 的 FDE 中,函数开头有一个带有 `DW_CFA_advance_loc` 的单字节指令。在高级操作之后,还有两个附加操作。`DW_CFA_def_cfa_offset` 将 CFA 更改为 `%rsp+16``DW_CFA_offset` 表示 `%rbx` 中的初始值现在位于 `CFA-16`(堆栈顶部)。
查看这个 `fc2` 反汇编代码,可以看到 `push` 用于将 `%rbx` 保存到堆栈中。 在代码生成中省略帧指针的优点之一是可以使用 `push``pop` 等紧凑指令在堆栈中存储和检索值。 在这种情况下,保存 `%rbx` 是因为 `%rbx` 用于向 `printf` 函数传递参数(实际上转换为 `puts` 调用),但需要保存传递到函数中的 `f` 初始值以供后面的计算使用。 `4011ae``DW_CFA_advance_loc` 29字节显示了 `pop %rbx` 之后的下一个状态变化,它恢复了 `%rbx` 的原始值。 `DW_CFA_def_cfa_offset` 指出 pop 将 CFA 更改为 `%rsp+8`
### GDB使用调用帧信息
有了 CFI 信息,[GNU 调试器 (GDB)][3] 和其他工具就可以生成准确的回溯。 如果没有 CFI 信息GDB 将很难找到返回地址。 如果在 `f2c.c` 的第 7 行设置断点,可以看到 GDB 使用此信息。 GDB在 `f2c` 函数中的 `pop %rbx` 完成且返回值不在栈顶之前放置了断点。
GDB 能够展开堆栈,并且作为额外收获还能够获取当前保存在堆栈上的参数 `f`
```
$ gdb f2c
[...]
(gdb) break f2c.c:7
Breakpoint 1 at 0x40119d: file f2c.c, line 7.
(gdb) run
Starting program: /home/wcohen/present/202207youarehere/f2c
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
98
converting
Breakpoint 1, f2c (f=98) at f2c.c:8
8 return c;
(gdb) where
#0 f2c (f=98) at f2c.c:8
#1 0x000000000040107e in main (argc=<optimized out>, argv=<optimized out>)
at f2c.c:15
```
### 调用帧信息
DWARF 调用帧信息为编译器提供了一种灵活的方式来包含用于准确展开堆栈的信息。这使得可以确定当前活动的函数调用。我在本文中提供了简要介绍,但有关 DWARF 如何实现此机制的更多详细信息,请参阅 [DWARF 规范][4]。
--------------------------------------------------------------------------------
via: https://opensource.com/article/23/3/gdb-debugger-call-frame-active-function-calls
作者:[Will Cohen][a]
选题:[lkxed][b]
译者:[jrglinux](https://github.com/jrglinux)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://opensource.com/users/wcohen
[b]: https://github.com/lkxed/
[1]: https://opensource.com/article/23/2/compiler-optimization-debugger-line-information
[2]: https://www.phoronix.com/review/fedora-frame-pointer
[3]: https://opensource.com/article/21/3/debug-code-gdb
[4]: https://dwarfstd.org/Download.php