translated

This commit is contained in:
geekpi 2017-08-24 08:50:01 +08:00
parent 950aea8f25
commit a3c640e4ec

@ -1,49 +1,47 @@
translating---geekpi
Writing a Linux Debugger Part 7: Source-level breakpoints
开发一个 Linux 调试器(七):源码层断点
============================================================
Setting breakpoints on memory addresses is all well and good, but it doesn't provide the most user-friendly tool. We'd like to be able to set breakpoints on source lines and function entry addresses as well, so that we can debug at the same abstraction level as our code.
在内存地址上设置断点是可以的,但它没有提供最方便用户的工具。我们希望能够在源代码行和函数入口地址上设置断点,以便我们可以在与代码相同的抽象级别中进行调试。
This post will add source-level breakpoints to our debugger. With all of the support we already have available to us, this is a lot easier than it may first sound. We'll also add a command to get the type and address of a symbol, which can be useful for locating code or data and understanding linking concepts.
这篇文章将会添加源码层断点到我们的调试器中。通过所有我们已经支持的,这比起最初听起来容易得多。我们还将添加一个命令来获取符号的类型和地址,这对于定位代码或数据以及理解链接概念非常有用。
* * *
### Series index
### 系列索引
These links will go live as the rest of the posts are released.
随着后面文章的发布,这些链接会逐渐生效。
1. [Setup][1]
1. [准备环境][1]
2. [Breakpoints][2]
2. [断点][2]
3. [Registers and memory][3]
3. [寄存器和内存][3]
4. [Elves and dwarves][4]
4. [Elves 和 dwarves][4]
5. [Source and signals][5]
5. [源码和信号][5]
6. [Source-level stepping][6]
6. [源码层逐步执行][6]
7. [Source-level breakpoints][7]
7. [源码层断点][7]
8. [Stack unwinding][8]
8. [堆栈展开][8]
9. Reading variables
9. 读取变量
10. Next steps
10. 之后步骤
* * *
### Breakpoints
### 断点
### DWARF
The [Elves and dwarves][9] post described how DWARF debug information works and how it can be used to map the machine code back to the high-level source. Recall that DWARF contains the address ranges of functions and a line table which lets you translate code positions between abstraction levels. We'll be using these capabilities to implement our breakpoints.
[Elves 和 dwarves][9] 这篇文章,描述了 DWARF 调试信息是如何工作的以及如何用它来将机器码映射到高层源码中。回想一下DWARF 包含函数的地址范围和一个允许你在抽象层之间转换代码位置的行表。我们将使用这些功能来实现我们的断点。
### Function entry
### 函数入口
Setting breakpoints on function names can be complex if you want to take overloading, member functions and such into account, but we're going to iterate through all of the compilation units and search for functions with names which match what we're looking for. The DWARF information will look something like this:
如果你考虑重载、成员函数等等那么在函数名上设置断点可能有点复杂但是我们将遍历所有的编译单元并搜索与我们正在寻找的名称匹配的函数。DWARF 信息如下所示:
```
< 0><0x0000000b> DW_TAG_compile_unit
@ -70,7 +68,7 @@ LOCAL_SYMBOLS:
```
We want to match against `DW_AT_name` and use `DW_AT_low_pc` (the start address of the function) to set our breakpoint.
我们想要匹配 `DW_AT_name` 并使用 `DW_AT_low_pc`(函数的起始地址)来设置我们的断点。
```
void debugger::set_breakpoint_at_function(const std::string& name) {
@ -87,13 +85,13 @@ void debugger::set_breakpoint_at_function(const std::string& name) {
}
```
The only bit of that code which looks a bit weird is the `++entry`. The problem is that the `DW_AT_low_pc` for a function doesn't point at the start of the user code for that function; it points to the start of the prologue. The compiler will usually output a prologue and epilogue for a function which carry out saving and restoring registers, manipulating the stack pointer and suchlike. This isn't very useful for us, so we increment the line entry by one to get the first line of the user code instead of the prologue. The DWARF line table actually has some functionality to mark an entry as the first line after the function prologue, but not all compilers output this, so I've taken the naive approach.
这代码看起来有点奇怪的唯一一点是 `++entry`。问题是函数的 `DW_AT_low_pc` 不指向该函数的用户代码的起始地址,它指向 prologue 的开始。编译器通常会为函数输出 prologue 和 epilogue,它们用于执行保存和恢复寄存器、操作堆栈指针等。这对我们来说不是很有用,所以我们将行表入口加一来获取用户代码的第一行,而不是 prologue。DWARF 行表实际上具有一些功能,用于将入口标记为函数 prologue 之后的第一行,但并不是所有编译器都输出该标记,因此我采用了原始的方法。
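To make the trade-off above concrete, here is a minimal sketch of how a debugger could prefer the line table's end-of-prologue marker when the compiler emits one and fall back to the naive "next entry" approach otherwise. The `line_entry` struct and `breakpoint_address_for` function are hypothetical stand-ins invented for illustration; they are not part of `libelfin` or of the post's debugger, and the sketch assumes C++17 and a table sorted by address.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for one row of a DWARF line table.
struct line_entry {
    std::uint64_t address;  // machine code address of this row
    unsigned line;          // source line the row maps to
    bool prologue_end;      // set if the compiler marked the end of the prologue here
};

// Pick a breakpoint address for the function occupying [low_pc, high_pc).
// Assumes `table` is sorted by address. Returns std::nullopt if low_pc
// does not appear in the table at all.
std::optional<std::uint64_t> breakpoint_address_for(
    const std::vector<line_entry>& table,
    std::uint64_t low_pc,
    std::uint64_t high_pc) {
    auto entry = std::find_if(table.begin(), table.end(),
        [&](const line_entry& e) { return e.address == low_pc; });
    if (entry == table.end()) return std::nullopt;

    // Preferred: an entry inside the function explicitly marked as prologue end.
    for (auto it = entry; it != table.end() && it->address < high_pc; ++it) {
        if (it->prologue_end) return it->address;
    }

    // Fallback: the naive approach, i.e. skip one line table entry.
    auto next = std::next(entry);
    if (next != table.end() && next->address < high_pc) return next->address;

    // Nothing after the first entry: break at the function start.
    return low_pc;
}
```

The fallback branch mirrors the `++entry` trick, so compilers that never set the marker behave exactly as described above.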
### Source line
### 源码行
To set a breakpoint on a high-level source line, we translate this line number into an address by looking it up in the DWARF. We'll iterate through the compilation units looking for one whose name matches the given file, then look for the entry which corresponds to the given line.
要在高层源码行上设置一个断点,我们要将这个行号转换成 DWARF 中的一个地址。我们将遍历编译单元,寻找一个名称与给定文件匹配的编译单元,然后查找与给定行对应的入口。
The DWARF will look something like this:
DWARF 看上去有点像这样:
```
.debug_line: line number info for a single cu
@ -121,7 +119,7 @@ IS=val ISA number, DI=val discriminator value
```
So if we want to set a breakpoint on line 5 of `ab.cpp`, we look up the entry which corresponds to that line (`0x004004e3`) and set a breakpoint there.
所以如果我们想要在 `ab.cpp` 的第五行设置一个断点,我们查找与该行对应的入口(`0x004004e3`)并在那里设置一个断点。
```
void debugger::set_breakpoint_at_source_line(const std::string& file, unsigned line) {
@ -140,15 +138,15 @@ void debugger::set_breakpoint_at_source_line(const std::string& file, unsigned l
}
```
My `is_suffix` hack is there so you can type `c.cpp` for `a/b/c.cpp`. Of course you should actually use a sensible path handling library or something; I'm lazy. The `entry.is_stmt` is checking that the line table entry is marked as the beginning of a statement, which is set by the compiler on the address it thinks is the best target for a breakpoint.
我这里用了 `is_suffix` 这个 hack,这样你可以为 `a/b/c.cpp` 输入 `c.cpp`。当然你实际上应该使用一个合理的路径处理库或者其他东西;我很懒。`entry.is_stmt` 是检查行表入口是否被标记为一个语句的开头,这是由编译器在它认为是断点最佳目标的地址上设置的。
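For readers who want to see what such a helper might look like, here is a rough sketch. The signature of `is_suffix` is an assumption (its definition is not shown in this post), and `path_ends_with` is a hypothetical, slightly stricter variant that only matches on path component boundaries, so that `c.cpp` does not accidentally match `a/b/abc.cpp`.

```cpp
#include <string>

// Assumed shape of the helper: true if `s` is a suffix of `of`.
bool is_suffix(const std::string& s, const std::string& of) {
    if (s.size() > of.size()) return false;
    return of.compare(of.size() - s.size(), s.size(), s) == 0;
}

// Hypothetical stricter variant: `tail` must match whole path components
// at the end of `full`, i.e. be preceded by '/' or be the entire path.
bool path_ends_with(const std::string& full, const std::string& tail) {
    if (tail.empty() || !is_suffix(tail, full)) return false;
    if (tail.size() == full.size()) return true;
    return full[full.size() - tail.size() - 1] == '/';
}
```

A plain suffix check is enough for small test programs; the component-aware version only matters once two source files share a name ending.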
* * *
### Symbol lookup
### 符号查找
When we get down to the level of object files, symbols are king. Functions are named with symbols, global variables are named with symbols, you get a symbol, we get a symbol, everyone gets a symbol. In a given object file, some symbols might reference other object files or shared libraries, where the linker will patch things up to create an executable program from the symbol reference spaghetti.
当我们深入到对象文件层时,符号是王者。函数用符号命名,全局变量用符号命名,你得到一个符号,我们得到一个符号,每个人都得到一个符号。在给定的对象文件中,一些符号可能引用其他对象文件或共享库,链接器会把这些像意大利面条一样纠缠的符号引用修补好,从而创建一个可执行程序。
Symbols can be looked up in the aptly-named symbol table, which is stored in ELF sections in the binary. Fortunately, `libelfin` has a fairly nice interface for doing this, so we don't need to deal with all of the ELF nonsense ourselves. To give you an idea of what we're dealing with, here is a dump of the `.symtab` section of a binary, produced with `readelf`:
可以在正确命名的符号表中查找符号,它存储在二进制文件的 ELF 部分中。幸运的是,`libelfin` 有一个不错的接口来做这件事,所以我们不需要自己处理所有的 ELF 的事情。为了让你知道我们在处理什么,下面是一个二进制文件的 `.symtab` 部分的转储,它由 `readelf` 生成:
```
Num: Value Size Type Bind Vis Ndx Name
@ -222,9 +220,9 @@ Num: Value Size Type Bind Vis Ndx Name
```
You can see lots of symbols for sections in the object file, symbols which are used by the implementation for setting up the environment, and at the end you can see the symbol for `main`.
你可以看到对象文件中各个节对应的许多符号、实现用来设置环境的符号,最后还可以看到 `main` 的符号。
We're interested in the type, name, and value (address) of the symbol. We'll have a `symbol_type` enum for the type and use a `std::string` for the name and `std::uintptr_t` for the address:
我们对符号的类型、名称和值(地址)感兴趣。我们有一个 `symbol_type` 类型的枚举,并使用一个 `std::string` 作为名称,`std::uintptr_t` 作为地址:
```
enum class symbol_type {
@ -252,7 +250,7 @@ struct symbol {
};
```
We'll need to map between the symbol type we get from `libelfin` and our enum since we don't want the dependency poisoning this interface. Fortunately I picked the same names for everything, so this is dead easy:
我们需要将从 `libelfin` 获得的符号类型映射到我们的枚举,因为我们不希望这个依赖污染这个接口。幸运的是,我为所有的东西选了同样的名字,所以这样很简单:
```
symbol_type to_symbol_type(elf::stt sym) {
@ -267,7 +265,7 @@ symbol_type to_symbol_type(elf::stt sym) {
};
```
Lastly we want to look up the symbol. For illustrative purposes I loop through the sections of the ELF looking for symbol tables, then collect any symbols I find in them into a `std::vector`. A smarter implementation would build up a map from names to symbols so that you only have to look at all the data once.
最后我们要查找符号。为了说明的目的,我遍历 ELF 的各个节来查找符号表,然后把在其中找到的符号收集到一个 `std::vector` 中。更智能的实现会建立从名称到符号的映射,这样你只需要查看一次所有数据就行了。
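As a sketch of the "smarter implementation" idea, the symbols could be collected once into a name-keyed index and then queried without rescanning the ELF sections. Everything here is an illustrative assumption: the enumerator and field names of `symbol_type` and `symbol` are guesses based on the description above (the post's definitions are not shown in full here), and `build_symbol_index` and the free-function `lookup_symbol` are hypothetical helpers, not part of the debugger or of `libelfin`.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed shapes, following the description in the text.
enum class symbol_type { notype, object, func, section, file };

struct symbol {
    symbol_type type;
    std::string name;
    std::uintptr_t addr;
};

// Several symbols may share one name, hence a multimap.
using symbol_index = std::unordered_multimap<std::string, symbol>;

// Build the index once from every symbol found in the symbol tables.
symbol_index build_symbol_index(const std::vector<symbol>& all_symbols) {
    symbol_index index;
    index.reserve(all_symbols.size());
    for (const auto& sym : all_symbols) {
        index.emplace(sym.name, sym);
    }
    return index;
}

// Return every symbol with the given name; average cost is proportional
// to the number of matches rather than to the size of the symbol table.
std::vector<symbol> lookup_symbol(const symbol_index& index, const std::string& name) {
    std::vector<symbol> matches;
    auto range = index.equal_range(name);
    for (auto it = range.first; it != range.second; ++it) {
        matches.push_back(it->second);
    }
    return matches;
}
```

The index costs one extra pass and some memory up front, which pays off as soon as more than a handful of lookups are made in a session.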
```
std::vector<symbol> debugger::lookup_symbol(const std::string& name) {
@ -291,15 +289,15 @@ std::vector<symbol> debugger::lookup_symbol(const std::string& name) {
* * *
### Adding commands
### 添加命令
As always, we need to add some more commands to expose the functionality to users. For breakpoints I've gone for a GDB-style interface, where the kind of breakpoint is inferred from the argument you pass rather than requiring explicit switches:
一如往常,我们需要添加一些更多的命令来向用户暴露功能。对于断点,我使用 GDB 风格的接口,其中断点类型是通过你传递的参数推断的,而不用要求显式切换:
* `0x<hexadecimal>` -> address breakpoint
* `0x<hexadecimal>` -> 地址断点
* `<line>:<filename>` -> line number breakpoint
* `<line>:<filename>` -> 行号断点
* `<anything else>` -> function name breakpoint
* `<anything else>` -> 函数名断点
```
else if(is_prefix(command, "break")) {
@ -317,7 +315,7 @@ As always, we need to add some more commands to expose the functionality to user
}
```
For symbols we'll look up the symbol and print out any matches we find:
对于符号,我们将查找符号并打印出我们发现的任何匹配项:
```
else if(is_prefix(command, "symbol")) {
@ -330,22 +328,22 @@ else if(is_prefix(command, "symbol")) {
* * *
### Testing it out
### 测试一下
Fire up your debugger on a simple binary and play around with setting source-level breakpoints. Setting a breakpoint on some `foo` and seeing my debugger stop on it was one of the most rewarding moments of this project for me.
在一个简单的二进制文件上启动调试器,并设置源代码级别的断点。在一些 `foo` 上设置一个断点,看到我的调试器停在它上面是我这个项目最有价值的时刻之一。
Symbol lookup can be tested by adding some functions or global variables to your program and looking up the names of them. Note that if you're compiling C++ code you'll need to take [name mangling][10] into account as well.
符号查找可以通过在程序中添加一些函数或全局变量并查找它们的名称来进行测试。请注意,如果你正在编译 C++ 代码,你还需要考虑[名称重整][10]。
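One way to cope with mangled names when testing symbol lookup is to demangle what the symbol table contains before comparing or printing it. The sketch below assumes a GCC or Clang toolchain, which provides `abi::__cxa_demangle` in `<cxxabi.h>`; the `demangle` wrapper itself is a hypothetical helper and not part of the post's debugger.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Return the human-readable form of an Itanium-ABI mangled name,
// or the input unchanged if it cannot be demangled (e.g. a C symbol).
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'd buffer that the caller must free.
    std::unique_ptr<char, decltype(&std::free)> buffer(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    return (status == 0 && buffer) ? std::string(buffer.get()) : mangled;
}
```

With this, a function declared as `void foo()` in C++ shows up in the symbol table as `_Z3foov`, and `demangle("_Z3foov")` gives back `foo()`, while a plain name such as `main` passes through untouched.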
That's all for this post. Next time I'll show how to add stack unwinding support to the debugger.
本文就这些了。下一次我将展示如何向调试器添加堆栈展开支持。
You can find the code for this post [here][11].
你可以在[这里][11]找到这篇文章的代码。
--------------------------------------------------------------------------------
via: https://blog.tartanllama.xyz/c++/2017/06/19/writing-a-linux-debugger-source-break/
作者:[Simon Brand ][a]
译者:[译者ID](https://github.com/译者ID)
译者:[geekpi](https://github.com/geekpi)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出