[#]: subject: "How the GDB debugger and other tools use call frame information to determine the active function calls"
[#]: via: "https://opensource.com/article/23/3/gdb-debugger-call-frame-active-function-calls"
[#]: author: "Will Cohen https://opensource.com/users/wcohen"
[#]: collector: "lkxed"
[#]: translator: " "
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "

How the GDB debugger and other tools use call frame information to determine the active function calls
======

In my [previous article][1], I showed how debuginfo is used to map between the current instruction pointer (IP) and the function or line containing it. That information is valuable in showing what code the processor is currently executing. However, having more context for the calls that lead up to the current function and line being executed is also extremely helpful.

For example, suppose a function in a library has an illegal memory access due to a null pointer being passed as a parameter into the function. Just looking at the current function and line shows that the fault was triggered by attempted access through a null pointer. However, what you really want to know is the full context of the active function calls leading up to that null pointer access, so you can determine how that null pointer was initially passed into the library function. This context information is provided by a backtrace, and allows you to determine which functions could be responsible for the bogus parameter.

One thing’s certain: Determining the currently active function calls is a non-trivial operation.

### Function activation records

Modern programming languages have local variables and allow for recursion where a function can call itself. Also, concurrent programs have multiple threads that may have the same function running at the same time. The local variables cannot be stored in global locations in these situations. The locations of the local variables must be unique for each invocation of the function.

Here’s how it works:

- The compiler generates code that creates a function activation record each time a function is called, so that local variables are stored in a unique location.
- For efficiency, the processor stack is used to store the function activation records.
- A new function activation record is created at the top of the processor stack for the function when it’s called.
- If that function calls another function, then a new function activation record is placed above the existing function activation record.
- Each time there is a return from a function, its function activation record is removed from the stack.

The function activation record is created by code at the start of the function called the prologue. The removal of the function activation record is handled by the function epilogue. The body of the function can use the memory set aside for it on the stack for temporary values and local variables.

Function activation records can vary in size. Some functions need no space for local variables. In that case, the function activation record only needs to store the return address into the function that called _this_ function. Other functions may require significant space to store local data structures in addition to the return address. This variation in frame sizes leads to compilers using frame pointers to track the start of the function’s activation frame. The function prologue then has the additional task of storing the old frame pointer before creating a new frame pointer for the current function, and the epilogue has to restore the old frame pointer value.

Because of the way the function activation record is laid out, the return address and the old frame pointer of the calling function are at constant offsets from the current frame pointer. With the old frame pointer, the next function’s activation frame on the stack can be located. This process is repeated until all the function activation records have been examined.

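The walk described above can be sketched in C over a simulated stack. This is a minimal illustration, not how GDB is implemented: the array-of-slots model, the `walk_frames` name, and the use of index 0 as an end-of-chain marker are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: the stack is an array of 8-byte slots and a
 * "frame pointer" is an index into that array. By assumption, slot
 * [fp] holds the caller's frame pointer and slot [fp + 1] holds the
 * return address, mirroring the constant offsets described above. */
#define END_OF_CHAIN ((uint64_t)0)

/* Follow the frame-pointer chain starting at fp. Stores up to max
 * return addresses in out and returns how many were collected. */
size_t walk_frames(const uint64_t *stack, size_t nslots, uint64_t fp,
                   uint64_t *out, size_t max)
{
    size_t n = 0;

    assert(stack != NULL && out != NULL);

    while (fp != END_OF_CHAIN && fp + 1 < nslots && n < max) {
        out[n++] = stack[fp + 1];   /* return address of this frame */
        fp = stack[fp];             /* step to the caller's frame   */
    }
    return n;
}
```

Working on a simulated array keeps the sketch safe to run regardless of whether the program itself was compiled with frame pointers.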
### Optimization complications

There are a couple of disadvantages to having explicit frame pointers in code. On some processors, relatively few registers are available. An explicit frame pointer permanently occupies one of them, so more values have to be kept in memory, and the extra memory operations make the resulting code slower. Explicit frame pointers may also constrain the code that the compiler can generate, because the compiler may not intermix the function prologue and epilogue code with the body of the function.

The compiler’s goal is to generate fast code where possible, so compilers typically omit frame pointers from generated code. Keeping frame pointers can significantly lower performance, as shown by [Phoronix’s benchmarking][2]. The downside of omitting frame pointers is that the calling function’s activation frame and return address are no longer at simple offsets from a frame pointer.

### Call Frame Information

To aid in the generation of function backtraces, the compiler includes DWARF Call Frame Information (CFI) to reconstruct frame pointers and to find return addresses. This supplemental information is stored in the `.eh_frame` section of the executable. Unlike traditional debuginfo for function and line location information, the `.eh_frame` section is in the executable even when the executable is generated without debug information, or when the debug information has been stripped from the file. The call frame information is essential for the operation of language constructs like **throw-catch** in C++.

The CFI has a Frame Description Entry (FDE) for each function. As one of its steps, the backtrace generation process finds the appropriate FDE for the current activation frame being examined.

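Finding the FDE for an address amounts to a range lookup. The sketch below assumes a hypothetical, already-decoded table of address ranges (`struct fde_range` and `find_fde` are illustrative names, not part of any real library); real tools parse the encoded `.eh_frame` data rather than scanning an array like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of one FDE: the half-open address range
 * [pc_begin, pc_end) of the function it describes. */
struct fde_range {
    uint64_t pc_begin;
    uint64_t pc_end;     /* first address NOT covered by this FDE */
    const char *name;    /* label for illustration only */
};

/* Return the entry whose range contains pc, or NULL if none does. */
const struct fde_range *find_fde(const struct fde_range *table, size_t n,
                                 uint64_t pc)
{
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (pc >= table[i].pc_begin && pc < table[i].pc_end)
            return &table[i];
    }
    return NULL;
}
```

A linear scan is enough to show the idea; the half-open range matters because the end address of one function can be the start of the next.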
Think of the FDE as a table, with each row representing one or more instructions, with these columns:

- Canonical Frame Address (CFA), the location the frame pointer would point to
- The return address
- Information about other registers

The encoding of the FDE is designed to minimize the amount of space required. The FDE describes the changes between rows rather than fully specifying each row. To further compress the data, starting information common to multiple FDEs is factored out and placed in Common Information Entries (CIE). This makes the FDE more compact, but it also requires more work to compute the actual CFA and find the return address location. The tool must start from the uninitialized state. It steps through the entries in the CIE to get the initial state on function entry, then moves on to the FDE, starting at its first entry and processing operations until it reaches the row that covers the instruction pointer currently being analyzed.

### Example use of Call Frame Information

Start with a simple example with a function that converts Fahrenheit to Celsius. Inlined functions do not have entries in the CFI, so the `__attribute__((noinline))` for the `f2c` function ensures the compiler keeps `f2c` as a real function.

```
#include <stdio.h>

int __attribute__ ((noinline)) f2c(int f)
{
	int c;
	printf("converting\n");
	c = (f-32.0) * 5.0 /9.0;
	return c;
}

int main (int argc, char *argv[])
{
	int f;
	scanf("%d", &f);
	printf ("%d Fahrenheit = %d Celsius\n", f, f2c(f));
	return 0;
}
```

Compile the code with:

```
$ gcc -O2 -g -o f2c f2c.c
```

The `.eh_frame` is there as expected:

```
$ eu-readelf -S f2c |grep eh_frame
[17] .eh_frame_hdr        PROGBITS     0000000000402058 00002058 00000034  0 A      0   0  4
[18] .eh_frame            PROGBITS     0000000000402090 00002090 000000a0  0 A      0   0  8
```

We can get the CFI information in human readable form with:

```
$ readelf --debug-dump=frames f2c > f2c.cfi
```

Generate a disassembly file of the `f2c` binary so you can look up the addresses of the `f2c` and `main` functions:

```
$ objdump -d f2c > f2c.dis
```

Find the following lines in `f2c.dis` to see the start of `f2c` and `main`:

```
0000000000401060 <main>:
0000000000401190 <f2c>:
```

In many cases, all the functions in the binary use the same CIE to define the initial conditions before a function’s first instruction is executed. In this example, both `f2c` and `main` use the following CIE:

```
00000000 0000000000000014 00000000 CIE
  Version:               1
  Augmentation:          "zR"
  Code alignment factor: 1
  Data alignment factor: -8
  Return address column: 16
  Augmentation data:     1b
  DW_CFA_def_cfa: r7 (rsp) ofs 8
  DW_CFA_offset: r16 (rip) at cfa-8
  DW_CFA_nop
  DW_CFA_nop
```

For this example, don’t worry about the Augmentation or Augmentation data entries. Because x86_64 processors have variable length instructions from 1 to 15 bytes in size, the “Code alignment factor” is set to 1. On a processor that only has 32-bit (4-byte) instructions, this would be set to 4 and would allow more compact encoding of how many bytes a row of state information applies to. In a similar fashion, the “Data alignment factor” makes the adjustments to where the CFA is located more compact. On x86_64, the stack slots are 8 bytes in size and the stack grows downward, hence the factor of -8. The column in the virtual table that holds the return address is 16. This is used in the instructions at the tail end of the CIE.

There are four `DW_CFA` instructions. The first instruction, `DW_CFA_def_cfa`, describes how to compute the Canonical Frame Address (CFA) that a frame pointer would point at if the code had a frame pointer. In this case, the CFA is computed from `r7 (rsp)` as `CFA=rsp+8`.

The second instruction, `DW_CFA_offset`, defines where to obtain the return address: `CFA-8`. In this case, the return address is currently pointed to by the stack pointer, `(rsp+8)-8`. The CFA starts right above the return address on the stack.

The `DW_CFA_nop` instructions at the end of the CIE are padding to keep alignment in the DWARF information. The FDE can also have padding at the end for alignment.

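The way a tool builds a row from these operations can be sketched in C. This is a simplified model under stated assumptions: the operations are already decoded into a hypothetical `struct cfa_insn`, the CFA register is always `rsp`, and only the CFA offset is tracked, not the other register columns. The function names are illustrative, not taken from GDB or any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of three DWARF CFA operations. Real
 * CFI is a compact byte encoding; here the ops are already decoded. */
enum cfa_op { OP_DEF_CFA, OP_DEF_CFA_OFFSET, OP_ADVANCE_LOC };

struct cfa_insn {
    enum cfa_op op;
    uint64_t arg;   /* offset from rsp, or code bytes to advance */
};

/* Run the CIE ops, then the FDE ops, stopping at the row that covers
 * pc. Returns the offset such that CFA = rsp + offset at that pc. */
uint64_t cfa_offset_at(const struct cfa_insn *cie, size_t ncie,
                       const struct cfa_insn *fde, size_t nfde,
                       uint64_t func_start, uint64_t pc)
{
    uint64_t offset = 0;
    uint64_t loc = func_start;

    assert(pc >= func_start);

    for (size_t i = 0; i < ncie; i++)       /* initial state on entry */
        if (cie[i].op != OP_ADVANCE_LOC)
            offset = cie[i].arg;

    for (size_t i = 0; i < nfde; i++) {
        if (fde[i].op == OP_ADVANCE_LOC) {
            if (loc + fde[i].arg > pc)      /* next row starts past pc */
                break;
            loc += fde[i].arg;
        } else {
            offset = fde[i].arg;
        }
    }
    return offset;
}

/* With the CIE rule "r16 (rip) at cfa-8", the return address is
 * stored at CFA - 8, where CFA = rsp + cfa_offset. */
uint64_t return_address_slot(uint64_t rsp, uint64_t cfa_offset)
{
    return rsp + cfa_offset - 8;
}
```

For the CIE above, the only state-setting operation modeled here is `DW_CFA_def_cfa` with offset 8, so with no FDE operations applied the CFA is at `rsp+8` and the return address slot is `rsp+8-8`, that is, `rsp` itself.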
Find the FDE for `main` in `f2c.cfi`, which covers the `main` function from `0x401060` up to, but not including, `0x401097`:

```
00000084 0000000000000014 00000088 FDE cie=00000000 pc=0000000000401060..0000000000401097
  DW_CFA_advance_loc: 4 to 0000000000401064
  DW_CFA_def_cfa_offset: 32
  DW_CFA_advance_loc: 50 to 0000000000401096
  DW_CFA_def_cfa_offset: 8
  DW_CFA_nop
```

Before executing the first instruction in the function, the CIE describes the call frame state. However, as the processor executes instructions in the function, the details will change. The first `DW_CFA_advance_loc` and `DW_CFA_def_cfa_offset` match up with the first instruction in `main` at `401060`. That instruction adjusts the stack pointer down by `0x18` (24 bytes). The CFA has not changed location but the stack pointer has, so the correct computation for the CFA at `401064` is `rsp+32` (the original 8 plus 24). That’s the entire prologue in this code. Here are the first couple of instructions in `main`:

```
0000000000401060 <main>:
  401060:	48 83 ec 18          	sub    $0x18,%rsp
  401064:	bf 1b 20 40 00       	mov    $0x40201b,%edi
```

The second `DW_CFA_advance_loc` makes the current row apply to the next 50 bytes of code in the function, until `401096`. The CFA is at `rsp+32` until the stack adjustment instruction at `401092` completes execution. The `DW_CFA_def_cfa_offset` then sets the CFA calculation back to what it was on entry into the function. This is expected, because the next instruction at `401096` is the return instruction (`ret`), which pops the return address off the stack.

```
  401090:	31 c0                	xor    %eax,%eax
  401092:	48 83 c4 18          	add    $0x18,%rsp
  401096:	c3                   	ret
```

The FDE for the `f2c` function uses the same CIE as the `main` function, and covers the range from `0x401190` up to, but not including, `0x4011c3`:

```
00000068 0000000000000018 0000006c FDE cie=00000000 pc=0000000000401190..00000000004011c3
  DW_CFA_advance_loc: 1 to 0000000000401191
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r3 (rbx) at cfa-16
  DW_CFA_advance_loc: 29 to 00000000004011ae
  DW_CFA_def_cfa_offset: 8
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
```

The `objdump` output for the `f2c` function in the binary:

```
0000000000401190 <f2c>:
  401190:	53                   	push   %rbx
  401191:	89 fb                	mov    %edi,%ebx
  401193:	bf 10 20 40 00       	mov    $0x402010,%edi
  401198:	e8 93 fe ff ff       	call   401030
  40119d:	66 0f ef c0          	pxor   %xmm0,%xmm0
  4011a1:	f2 0f 2a c3          	cvtsi2sd %ebx,%xmm0
  4011a5:	f2 0f 5c 05 93 0e 00 	subsd  0xe93(%rip),%xmm0        # 402040 <__dso_handle+0x38>
  4011ac:	00
  4011ad:	5b                   	pop    %rbx
  4011ae:	f2 0f 59 05 92 0e 00 	mulsd  0xe92(%rip),%xmm0        # 402048 <__dso_handle+0x40>
  4011b5:	00
  4011b6:	f2 0f 5e 05 92 0e 00 	divsd  0xe92(%rip),%xmm0        # 402050 <__dso_handle+0x48>
  4011bd:	00
  4011be:	f2 0f 2c c0          	cvttsd2si %xmm0,%eax
  4011c2:	c3                   	ret
```

In the FDE for `f2c`, the first `DW_CFA_advance_loc` steps over the single-byte instruction at the beginning of the function. Following the advance operation, there are two additional operations.
A `DW_CFA_def_cfa_offset` changes the CFA to `%rsp+16` and a `DW_CFA_offset` indicates that the initial value in `%rbx` is now at `CFA-16` (the top of the stack). Looking at this `f2c` disassembly code, you can see that a `push` is used to save `%rbx` onto the stack. One of the advantages of omitting the frame pointer in the code generation is that compact instructions like `push` and `pop` can be used to store and retrieve values from the stack. In this case, `%rbx` is saved because `f2c` uses it to hold the initial value of `f` across the call to the `printf` function (actually converted to a `puts` call): `%edi` is overwritten to pass the argument for that call, but `f` is still needed for the later computation. The `DW_CFA_advance_loc` of 29 bytes to `4011ae` shows the next state change just after `pop %rbx`, which recovers the original value of `%rbx`. The `DW_CFA_def_cfa_offset` notes that the pop changed the CFA to be `%rsp+8`.

### GDB using the Call Frame Information

Having the CFI information allows [GNU Debugger (GDB)][3] and other tools to generate accurate backtraces. Without CFI information, GDB would have a difficult time finding the return address. You can see GDB making use of this information if you set a breakpoint at line 7 of `f2c.c`. GDB places the breakpoint at `0x40119d`, before the `pop %rbx` in the `f2c` function has executed, so the return address is not at the top of the stack. GDB is able to unwind the stack, and as a bonus is also able to fetch the argument `f`, which at this point is held in `%rbx`:

```
$ gdb f2c
[...]
(gdb) break f2c.c:7
Breakpoint 1 at 0x40119d: file f2c.c, line 7.
(gdb) run
Starting program: /home/wcohen/present/202207youarehere/f2c
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
98
converting

Breakpoint 1, f2c (f=98) at f2c.c:8
8	return c;
(gdb) where
#0 f2c (f=98) at f2c.c:8
#1 0x000000000040107e in main (argc=<optimized out>, argv=<optimized out>) at f2c.c:15
```

### Call Frame Information

The DWARF Call Frame Information provides a flexible way for a compiler to include information for accurate unwinding of the stack. This makes it possible to determine the currently active function calls. I’ve provided a brief introduction in this article, but for more details on how DWARF implements this mechanism, see the [DWARF specification][4].

--------------------------------------------------------------------------------

via: https://opensource.com/article/23/3/gdb-debugger-call-frame-active-function-calls

Author: [Will Cohen][a]
Topic selection: [lkxed][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)

[a]: https://opensource.com/users/wcohen
[b]: https://github.com/lkxed/
[1]: https://opensource.com/article/23/2/compiler-optimization-debugger-line-information
[2]: https://www.phoronix.com/review/fedora-frame-pointer
[3]: https://opensource.com/article/21/3/debug-code-gdb
[4]: https://dwarfstd.org/Download.php