Profiler adventures: resolving symbol addresses is hard!
============================================================
The other day I posted [How does gdb call functions?][1]. In that post I said:
> Using the symbol table to figure out the address of the function you want to call is pretty straightforward

Unsurprisingly, it turns out that figuring out the address in memory corresponding to a given symbol is not really that straightforward. This is something I've been doing in my profiler, and I think it's interesting, so I thought I'd write about it!
Basically the problem I've been trying to solve is: I have a symbol (like `ruby_api_version`), and I want to figure out which address that symbol is mapped to in my target process's memory (so that I can get the data in it, like the Ruby process's Ruby version). So far I've run into (and fixed!) 3 issues when trying to do this:
1. When binaries are loaded into memory, they're loaded at a random address (so I can't just read the symbol table)
2. The symbol I want isn't necessarily in the “main” binary (`/proc/PID/exe`); sometimes it's in some other dynamically linked library
3. I need to look at the ELF program header to adjust which address I look at for the symbol

I'll start with some background, and then explain these 3 things! (I actually don't know what gdb does)
### what's a symbol?
Most binaries have functions and variables in them. For instance, Perl has a global variable called `PL_bincompat_options` and a function called `Perl_sv_catpv_mg`.
Sometimes binaries need to look up functions from another binary (for example, if the binary is a dynamically linked library, you need to look up its functions by name). Also sometimes you're debugging your code and you want to know what function an address corresponds to.
Symbols are how you look up functions / variables in a binary. They're in a section called the “symbol table”. The symbol table is basically an index for your binary! Sometimes they're missing (“stripped”). There are a lot of binary formats, but this post is just about the usual binary format on Linux: ELF.
### how do you get the symbol table of a binary?
A thing that I learned today (or at least learned and then forgot) is that there are 2 possible sections symbols can live in: `.symtab` and `.dynsym`. `.dynsym` is the “dynamic symbol table”. According to [this page][2], the dynsym is a smaller version of the symtab that only contains global symbols.
There are at least 3 ways to read the symbol table of a binary on Linux: you can use nm, objdump, or readelf.
* **read the .symtab**: `nm $FILE`, `objdump --syms $FILE`, `readelf -a $FILE`
* **read the .dynsym**: `nm -D $FILE`, `objdump --dynamic-syms $FILE`, `readelf -a $FILE`

`readelf -a` is the same in both cases because `readelf -a` just shows you everything in an ELF file. It's my favorite because I don't need to guess where the information I want is, I can just print out everything and then use grep.
Here's an example of some of the symbols in `/usr/bin/perl`. You can see that each symbol has a **name**, a **value**, and a **type**. The value is basically the offset of the code/data corresponding to that symbol in the binary. (Except some symbols have value 0. I think that has something to do with dynamic linking, but I don't understand it so we're not going to get into it.)
```
$ readelf -a /usr/bin/perl
...
Num: Value Size Type Ndx Name
523: 00000000004d6590 49 FUNC 14 Perl_sv_catpv_mg
524: 0000000000543410 7 FUNC 14 Perl_sv_copypv
525: 00000000005a43e0 202 OBJECT 16 PL_bincompat_options
526: 00000000004e6d20 2427 FUNC 14 Perl_pp_ucfirst
527: 000000000044a8c0 1561 FUNC 14 Perl_Gv_AMupdate
...
```
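To make the name / value / type idea concrete, here is a minimal Python sketch that pulls those fields out of one row of `readelf` output. It's only an illustration, not code from the profiler: `Symbol` and `parse_symbol_line` are hypothetical names, and it assumes the full eight-column rows (`Num: Value Size Type Bind Vis Ndx Name`) that `readelf` prints, rather than the abridged rows shown above.

```python
from typing import NamedTuple, Optional


class Symbol(NamedTuple):
    value: int  # the "Value" column: where the code/data lives in the binary
    size: int   # size in bytes
    type: str   # e.g. FUNC or OBJECT
    name: str


def parse_symbol_line(line: str) -> Optional[Symbol]:
    """Parse one symbol-table row of readelf output; return None for other lines."""
    fields = line.split()
    # Assumed layout: Num: Value Size Type Bind Vis Ndx Name
    if len(fields) < 8 or not fields[0].endswith(":"):
        return None
    try:
        value = int(fields[1], 16)  # Value is printed in hex
        size = int(fields[2], 0)    # Size is decimal, or 0x-prefixed hex
    except ValueError:
        return None                 # header row or unrelated output
    return Symbol(value, size, fields[3], fields[7])


row = "365: 00000000001f9180 12 OBJECT GLOBAL DEFAULT 15 ruby_api_version"
sym = parse_symbol_line(row)
print(sym.name, hex(sym.value))  # ruby_api_version 0x1f9180
```

Returning `None` for anything that doesn't look like a symbol row means the function can be run over the whole `readelf -a` dump without filtering it first.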
### the question we want to answer: what address is a symbol mapped to?
That's enough background!
Now suppose I'm a debugger, and I want to know what address the `ruby_api_version` symbol is mapped to. Let's use readelf to look at the relevant Ruby binary!
```
readelf -a ~/.rbenv/versions/2.1.6/bin/ruby | grep ruby_api_version
365: 00000000001f9180 12 OBJECT GLOBAL DEFAULT 15 ruby_api_version
```
Neat! The offset of `ruby_api_version` is `0x1f9180`. We're done, right? Of course not! :)
### Problem 1: ASLR (Address space layout randomization)
Here's the first issue: when Linux loads a binary into memory (like `~/.rbenv/versions/2.1.6/bin/ruby`), it doesn't just load it at the `0` address. Instead, it usually adds a random offset. Wikipedia's article on ASLR explains why:
> Address space layout randomization (ASLR) is a memory-protection process for operating systems (OSes) that guards against buffer-overflow attacks by randomizing the location where system executables are loaded into memory.

We can see this happening in practice: I started `/home/bork/.rbenv/versions/2.1.6/bin/ruby` 3 times and every time the process gets mapped to a different place in memory. (`0x56121c86f000`, `0x55f440b43000`, `0x56163334a000`)
Here we're meeting our good friend `/proc/$PID/maps`: this file contains a list of memory maps for a process. The memory maps tell us every address range in the process's virtual memory (it turns out virtual memory isn't contiguous! Instead, processes get a bunch of possibly-disjoint memory maps!). This file is so useful! You can find the address of the stack, the heap, every dynamically loaded library, anonymous memory maps, and probably more.
```
$ cat /proc/(pgrep -f 2.1.6)/maps | grep 'bin/ruby'
56121c86f000-56121caf0000 r-xp 00000000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
56121ccf0000-56121ccf5000 r--p 00281000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
56121ccf5000-56121ccf7000 rw-p 00286000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
$ cat /proc/(pgrep -f 2.1.6)/maps | grep 'bin/ruby'
55f440b43000-55f440dc4000 r-xp 00000000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
55f440fc4000-55f440fc9000 r--p 00281000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
55f440fc9000-55f440fcb000 rw-p 00286000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
$ cat /proc/(pgrep -f 2.1.6)/maps | grep 'bin/ruby'
56163334a000-5616335cb000 r-xp 00000000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
5616337cb000-5616337d0000 r--p 00281000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
5616337d0000-5616337d2000 rw-p 00286000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
```
Okay, so in the last example we see that our binary is mapped at `0x56163334a000`. If we combine this with the knowledge that `ruby_api_version` is at `0x1f9180`, then that means that we just need to look at the address `0x1f9180 + 0x56163334a000` to find our variable, right?
Yes! In this case, that works. But in other cases it won't! So that brings us to problem 2.
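For the case where it does work, the arithmetic can be sketched in a few lines of Python. This is a rough illustration under assumptions, not the profiler's code: `Mapping`, `parse_maps_line`, and `load_base` are hypothetical helpers, and the maps text is copied from the third run above.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str  # empty string for anonymous mappings


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/PID/maps: address perms offset dev inode [path]."""
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16), path)


def load_base(maps_text: str, binary_path: str) -> int:
    """Lowest start address among the mappings backed by binary_path."""
    starts = [
        parse_maps_line(line).start
        for line in maps_text.splitlines()
        if line.strip() and parse_maps_line(line).path == binary_path
    ]
    if not starts:
        raise LookupError(f"{binary_path} is not mapped")
    return min(starts)


RUBY = "/home/bork/.rbenv/versions/2.1.6/bin/ruby"
MAPS = """\
56163334a000-5616335cb000 r-xp 00000000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
5616337cb000-5616337d0000 r--p 00281000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
5616337d0000-5616337d2000 rw-p 00286000 00:32 323508 /home/bork/.rbenv/versions/2.1.6/bin/ruby
"""

symbol_offset = 0x1F9180  # ruby_api_version, from the readelf output
print(hex(load_base(MAPS, RUBY) + symbol_offset))  # 0x561633543180
```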
### Problem 2: dynamically loaded libraries
Next up, I tried running system Ruby: `/usr/bin/ruby`. This binary has basically no symbols at all! Disaster! In particular it does not have a `ruby_api_version` symbol.
But when I tried to print the `ruby_api_version` variable with gdb, it worked!!! Where was gdb finding my symbol? I found the answer with the help of our good friend: `/proc/PID/maps`.
It turns out that `/usr/bin/ruby` dynamically loads a library called `libruby-2.3`. You can see it in the memory maps here:
```
$ cat /proc/(pgrep -f /usr/bin/ruby)/maps | grep libruby
7f2c5d789000-7f2c5d9f1000 r-xp 00000000 00:14 /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0
7f2c5d9f1000-7f2c5dbf0000 ---p 00268000 00:14 /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0
7f2c5dbf0000-7f2c5dbf6000 r--p 00267000 00:14 /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0
7f2c5dbf6000-7f2c5dbf7000 rw-p 0026d000 00:14 /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0
```
And if we read it with `readelf`, we find the address of that symbol!
```
readelf -a /usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0 | grep ruby_api_version
374: 00000000001c72f0 12 OBJECT GLOBAL DEFAULT 13 ruby_api_version
```
So in this case the address of the symbol we want is `0x7f2c5d789000` (the start of the libruby-2.3 memory map) plus `0x1c72f0`. Nice! But we're still not done. There is (at least) one more mystery!
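The "look in every mapped file, not just the main binary" idea might be sketched like this in Python. The function name and its input shapes are assumptions made for illustration: `mappings` stands in for parsed `/proc/PID/maps` entries, `symbol_tables` for whatever was read out of each file's symbol table, and the `/usr/bin/ruby` start address is a made-up placeholder (the libruby numbers are the ones from above).

```python
from typing import Dict, List, Tuple


def resolve_symbol(
    symbol: str,
    mappings: List[Tuple[int, str]],
    symbol_tables: Dict[str, Dict[str, int]],
) -> Tuple[str, int]:
    """Return (path, address) for the first mapped file that defines `symbol`.

    mappings: (start_address, path) pairs, as parsed from /proc/PID/maps.
    symbol_tables: path -> {symbol name: value} for each mapped file.
    """
    # The base of each file is the lowest address it is mapped at.
    bases: Dict[str, int] = {}
    for start, path in mappings:
        bases[path] = min(start, bases.get(path, start))
    for path, base in bases.items():
        value = symbol_tables.get(path, {}).get(symbol)
        if value is not None:
            return path, base + value
    raise LookupError(symbol)


LIBRUBY = "/usr/lib/x86_64-linux-gnu/libruby-2.3.so.2.3.0"
mappings = [
    (0x55D000000000, "/usr/bin/ruby"),  # hypothetical address, not from the post
    (0x7F2C5D789000, LIBRUBY),
    (0x7F2C5D9F1000, LIBRUBY),
    (0x7F2C5DBF0000, LIBRUBY),
    (0x7F2C5DBF6000, LIBRUBY),
]
symbol_tables = {
    "/usr/bin/ruby": {},  # basically no symbols at all
    LIBRUBY: {"ruby_api_version": 0x1C72F0},
}

path, address = resolve_symbol("ruby_api_version", mappings, symbol_tables)
print(path, hex(address))
```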
### Problem 3: the `vaddr` offset in the ELF program header
This one I just figured out today so it's the one I have the shakiest understanding of. Here's what happened.
I was running system ruby on Ubuntu 14.04: Ruby 1.9.3. And my usual code (find the libruby map, get its address, get the symbol offset, add them up) wasn't working!!! I was confused.
But I'd asked Julian a while back if he knew of any weird stuff I needed to worry about, and he said “well, you should read the code for `dlsym`, you're trying to do basically the same thing”. So I decided to, instead of randomly guessing, go read the code for `dlsym`.
The man page for `dlsym` says “dlsym, dlvsym - obtain address of a symbol in a shared object or executable”. Perfect!!
[Here's the dlsym code from musl I read][3]. (musl is like glibc, but, different. Maybe easier to read? I don't understand it that well.)
The dlsym code says (on line 1468) `return def.dso->base + def.sym->st_value;` That sounds like what I'm doing!! But what's `dso->base`? It looks like `base = map - addr_min;`, and `addr_min = ph->p_vaddr;`. (There's also some stuff that makes sure `addr_min` is aligned with the page size which I should maybe pay attention to.)
So the code I want is something like `map_base - ph->p_vaddr + sym->st_value`.
I looked up this `vaddr` thing in the ELF program header, subtracted it from my calculation, and voilà! It worked!!!
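Here is a hedged Python sketch of that calculation. It is not the profiler's code or musl's: the function names are made up, it only handles little-endian ELF64 files, it assumes 4 KiB pages, and rounding `addr_min` down to a page boundary is my reading of the alignment step mentioned above. `min_load_vaddr` reads the lowest `p_vaddr` among the `PT_LOAD` program headers, and `symbol_address` applies `map_base - p_vaddr + st_value`.

```python
import struct

PT_LOAD = 1       # program header type for loadable segments
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def min_load_vaddr(elf: bytes) -> int:
    """Lowest p_vaddr among the PT_LOAD headers of a little-endian ELF64 image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # ELF64 header: e_phoff at 0x20, then e_phentsize and e_phnum at 0x36.
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    vaddrs = []
    for i in range(e_phnum):
        header = e_phoff + i * e_phentsize
        # ELF64 program header: p_type at +0, p_vaddr at +0x10.
        (p_type,) = struct.unpack_from("<I", elf, header)
        (p_vaddr,) = struct.unpack_from("<Q", elf, header + 0x10)
        if p_type == PT_LOAD:
            vaddrs.append(p_vaddr)
    if not vaddrs:
        raise ValueError("no PT_LOAD program headers")
    return min(vaddrs)


def symbol_address(map_base: int, addr_min: int, st_value: int) -> int:
    """map_base - addr_min + st_value, with addr_min rounded down to a page."""
    addr_min &= -PAGE_SIZE
    return map_base - addr_min + st_value


# Made-up numbers, purely to show the subtraction.
print(hex(symbol_address(0x7F0000000000, 0x400000, 0x401234)))  # 0x7f0000001234
```

For a real file, something like `min_load_vaddr(open(path, "rb").read())` would supply `addr_min`, with `map_base` taken from the file's first mapping in `/proc/PID/maps`. When the lowest `p_vaddr` is 0, the formula reduces to the plain `map_base + st_value` used earlier.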
### there are probably more problems!
I imagine I will discover even more ways that I am calculating the symbol address wrong. It's interesting that such a seemingly simple thing (“what's the address of this symbol?”) is so complicated!
It would be nice to be able to just call `dlsym` and have it do all the right calculations for me, but I think I can't because the symbol is in a different process. Maybe I'm wrong about that though! I would like to be wrong about that. If you know an easier way to do all this I would very much like to know!
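Since the symbol lives in a different process, the computed address can't simply be dereferenced. One option on Linux is to read the bytes through `/proc/PID/mem`, which needs ptrace-level permission on the target. The sketch below is an assumption about how that could look, not a description of what the profiler does; `read_memory` and its `mem_path` override are hypothetical names (the override only exists so the function can be tried on an ordinary file).

```python
from typing import Optional


def read_memory(
    pid: int, address: int, size: int, mem_path: Optional[str] = None
) -> bytes:
    """Read up to `size` bytes at `address` from another process's memory.

    Reads /proc/PID/mem by default; `mem_path` is a hypothetical override
    that points the same seek-and-read at a different file.
    """
    path = mem_path if mem_path is not None else f"/proc/{pid}/mem"
    with open(path, "rb", buffering=0) as mem:
        mem.seek(address)      # virtual address doubles as the file offset
        return mem.read(size)  # may return fewer bytes than requested
```

With the numbers from earlier, `read_memory(pid, 0x561633543180, 12)` would fetch the 12 bytes of `ruby_api_version` from that particular Ruby process.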
--------------------------------------------------------------------------------
via: https://jvns.ca/blog/2018/01/09/resolving-symbol-addresses/
Author: [Julia Evans][a]
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/)
[a]:https://jvns.ca
[1]:https://jvns.ca/blog/2018/01/04/how-does-gdb-call-functions/
[2]:https://blogs.oracle.com/ali/inside-elf-symbol-tables
[3]:https://github.com/esmil/musl/blob/194f9cf93da8ae62491b7386edf481ea8565ae4e/src/ldso/dynlink.c#L1451