15 KiB
Writing a Linux Debugger Part 9: Handling variables
Variables are sneaky. At one moment they’ll be happily sitting in registers, but as soon as you turn your head they’re spilled to the stack. Maybe the compiler completely throws them out of the window for the sake of optimization. Regardless of how often variables move around in memory, we need some way to track and manipulate them in our debugger. This post will teach you more about handling variables in your debugger and demonstrate a simple implementation using libelfin
.
Series index
Before you get started, make sure that the version of libelfin
you are using is the fbreg
branch of my fork. This contains some hacks to support getting the base of the current stack frame and evaluating location lists, neither of which are supported by vanilla libelfin
. You might need to pass -gdwarf-2
to GCC to get it to generate compatible DWARF information. But before we get into the implementation, I’ll give a more detailed description of how locations are encoded in DWARF 5, which is the most recent specification. If you want more information than what I write here, then you can grab the standard from here.
DWARF locations
The location of a variable in memory at a given moment is encoded in the DWARF information using the DW_AT_location
attribute. Location descriptions can be either single location descriptions, composite location descriptions, or location lists.
-
Simple location descriptions describe the location of one contiguous piece (usually all) of an object. A simple location description may describe a location in addressable memory, or in a register, or the lack of a location (with or without a known value).
- Example:
-
DW_OP_fbreg -32
-
A variable which is entirely stored -32 bytes from the stack frame base
-
- Example:
-
Composite location descriptions describe an object in terms of pieces, each of which may be contained in part of a register or stored in a memory location unrelated to other pieces.
- Example:
-
DW_OP_reg3 DW_OP_piece 4 DW_OP_reg10 DW_OP_piece 2
-
A variable whose first four bytes reside in register 3 and whose next two bytes reside in register 10.
-
- Example:
-
Location lists describe objects which have a limited lifetime or change location during their lifetime.
- Example:
-
<loclist with 3 entries follows>
-
[ 0]<lowpc=0x2e00><highpc=0x2e19>DW_OP_reg0
-
[ 1]<lowpc=0x2e19><highpc=0x2e3f>DW_OP_reg3
-
[ 2]<lowpc=0x2ec4><highpc=0x2ec7>DW_OP_reg2
-
-
A variable whose location moves between registers depending on the current value of the program counter
-
- Example:
The DW_AT_location
is encoded in one of three different ways, depending on the kind of location description. exprloc
s encode simple and composite location descriptions. They consist of a byte length followed by a DWARF expression or location description. loclist
s and loclistptr
s encode location lists. They give indexes or offsets into the .debug_loclists
section, which describes the actual location lists.
DWARF Expressions
The actual location of the variables is computed using DWARF expressions. These consist of a series of operations which operate on a stack of values. There are an impressive number of DWARF operations available, so I won’t explain them all in detail. Instead I’ll give a few examples from each class of expression to give you a taste of what is available. Also, don’t get scared off by these; libelfin
will take care off all of this complexity for us.
-
Literal encodings
-
DW_OP_lit0
,DW_OP_lit1
, …,DW_OP_lit31
- Push the literal value on to the stack
-
DW_OP_addr <addr>
- Pushes the address operand on to the stack
-
DW_OP_constu <unsigned>
- Pushes the unsigned value on to the stack
-
-
Register values
-
DW_OP_fbreg <offset>
- Pushes the value found at the base of the stack frame, offset by the given value
-
DW_OP_breg0
,DW_OP_breg1
, …,DW_OP_breg31 <offset>
- Pushes the contents of the given register plus the given offset to the stack
-
-
Stack operations
-
DW_OP_dup
- Duplicate the value at the top of the stack
-
DW_OP_deref
- Treats the top of the stack as a memory address, and replaces it with the contents of that address
-
-
Arithmetic and logical operations
-
DW_OP_and
- Pops the top two values from the stack and pushes back the logical
AND
of them
- Pops the top two values from the stack and pushes back the logical
-
DW_OP_plus
- Same as
DW_OP_and
, but adds the values
- Same as
-
-
Control flow operations
-
DW_OP_le
,DW_OP_eq
,DW_OP_gt
, etc.- Pops the top two values, compares them, and pushes
1
if the condition is true and0
otherwise
- Pops the top two values, compares them, and pushes
-
DW_OP_bra <offset>
- Conditional branch: if the top of the stack is not
0
, skips back or forward in the expression byoffset
- Conditional branch: if the top of the stack is not
-
-
Type conversions
DW_OP_convert <DIE offset>
- Converts value on the top of the stack to a different type, which is described by the DWARF information entry at the given offset
-
Special operations
DW_OP_nop
- Do nothing!
DWARF types
DWARF’s representation of types needs to be strong enough to give debugger users useful variable representations. Users most often want to be able to debug at the level of their application rather than at the level of their machine, and they need a good idea of what their variables are doing to achieve that.
DWARF types are encoded in DIEs along with the majority of the other debug information. They can have attributes to indicate their name, encoding, size, endianness, etc. A myriad of type tags are available to express pointers, arrays, structures, typedefs, anything else you could see in a C or C++ program.
Take this simple structure as an example:
struct test{
int i;
float j;
int k[42];
test* next;
};
The parent DIE for this struct is this:
< 1><0x0000002a> DW_TAG_structure_type
DW_AT_name "test"
DW_AT_byte_size 0x000000b8
DW_AT_decl_file 0x00000001 test.cpp
DW_AT_decl_line 0x00000001
The above says that we have a structure called test
of size 0xb8
, declared at line 1
of test.cpp
. All there are then many children DIEs which describe the members.
< 2><0x00000032> DW_TAG_member
DW_AT_name "i"
DW_AT_type <0x00000063>
DW_AT_decl_file 0x00000001 test.cpp
DW_AT_decl_line 0x00000002
DW_AT_data_member_location 0
< 2><0x0000003e> DW_TAG_member
DW_AT_name "j"
DW_AT_type <0x0000006a>
DW_AT_decl_file 0x00000001 test.cpp
DW_AT_decl_line 0x00000003
DW_AT_data_member_location 4
< 2><0x0000004a> DW_TAG_member
DW_AT_name "k"
DW_AT_type <0x00000071>
DW_AT_decl_file 0x00000001 test.cpp
DW_AT_decl_line 0x00000004
DW_AT_data_member_location 8
< 2><0x00000056> DW_TAG_member
DW_AT_name "next"
DW_AT_type <0x00000084>
DW_AT_decl_file 0x00000001 test.cpp
DW_AT_decl_line 0x00000005
DW_AT_data_member_location 176(as signed = -80)
Each member has a name, a type (which is a DIE offset), a declaration file and line, and a byte offset into the structure where the member is located. The types which are pointed to come next.
< 1><0x00000063> DW_TAG_base_type
DW_AT_name "int"
DW_AT_encoding DW_ATE_signed
DW_AT_byte_size 0x00000004
< 1><0x0000006a> DW_TAG_base_type
DW_AT_name "float"
DW_AT_encoding DW_ATE_float
DW_AT_byte_size 0x00000004
< 1><0x00000071> DW_TAG_array_type
DW_AT_type <0x00000063>
< 2><0x00000076> DW_TAG_subrange_type
DW_AT_type <0x0000007d>
DW_AT_count 0x0000002a
< 1><0x0000007d> DW_TAG_base_type
DW_AT_name "sizetype"
DW_AT_byte_size 0x00000008
DW_AT_encoding DW_ATE_unsigned
< 1><0x00000084> DW_TAG_pointer_type
DW_AT_type <0x0000002a>
As you can see, int
on my laptop is a 4-byte signed integer type, and float
is a 4-byte float. The integer array type is defined by pointing to the int
type as its element type, a sizetype
(think size_t
) as the index type, with 2a
elements. The test*
type is a DW_TAG_pointer_type
which references the test
DIE.
Implementing a simple variable reader
As mentioned, libelfin
will deal with most of the complexity for us. However, it doesn’t implement all of the different methods for representing variable locations, and handling a lot of them in our code would get pretty complex. As such, I’ve chosen to only support exprloc
s for now. Feel free to add support for more types of expression. If you’re really feeling brave, submit some patches to libelfin
to help complete the necessary support!
Handling variables is mostly down to locating the different parts in memory or registers, then reading or writing is the same as you’ve seen before. I’ll only show you how to implement reading for the sake of simplicity.
First we need to tell libelfin
how to read registers from our process. We do this by creating a class which inherits from expr_context
and uses ptrace
to handle everything:
class ptrace_expr_context : public dwarf::expr_context {
public:
ptrace_expr_context (pid_t pid) : m_pid{pid} {}
dwarf::taddr reg (unsigned regnum) override {
return get_register_value_from_dwarf_register(m_pid, regnum);
}
dwarf::taddr pc() override {
struct user_regs_struct regs;
ptrace(PTRACE_GETREGS, m_pid, nullptr, ®s);
return regs.rip;
}
dwarf::taddr deref_size (dwarf::taddr address, unsigned size) override {
//TODO take into account size
return ptrace(PTRACE_PEEKDATA, m_pid, address, nullptr);
}
private:
pid_t m_pid;
};
The reading will be handled by a read_variables
function in our debugger
class:
void debugger::read_variables() {
using namespace dwarf;
auto func = get_function_from_pc(get_pc());
//...
}
The first thing we do above is find the function which we’re currently in. Then we need to loop through the entries in that function, looking for variables:
for (const auto& die : func) {
if (die.tag == DW_TAG::variable) {
//...
}
}
We get the location information by looking up the DW_AT_location
entry in the DIE:
auto loc_val = die[DW_AT::location];
Then we ensure that it’s an exprloc
and ask libelfin
to evaluate the expression for us:
if (loc_val.get_type() == value::type::exprloc) {
ptrace_expr_context context {m_pid};
auto result = loc_val.as_exprloc().evaluate(&context);
Now that we’ve evaluated the expression, we need to read the contents of the variable. It could be in memory or a register, so we’ll handle both cases:
switch (result.location_type) {
case expr_result::type::address:
{
auto value = read_memory(result.value);
std::cout << at_name(die) << " (0x" << std::hex << result.value << ") = "
<< value << std::endl;
break;
}
case expr_result::type::reg:
{
auto value = get_register_value_from_dwarf_register(m_pid, result.value);
std::cout << at_name(die) << " (reg " << result.value << ") = "
<< value << std::endl;
break;
}
default:
throw std::runtime_error{"Unhandled variable location"};
}
As you can see I’ve simply printed out the value without interpreting it based on the type of the variable. Hopefully from this code you can see how you could support writing variables, or searching for variables with a given name.
Finally we can add this to our command parser:
else if(is_prefix(command, "variables")) {
read_variables();
}
Testing it out
Write a few small functions which have some variables, compile it without optimizations and with debug info, then see if you can read the values of your variables. Try writing to the memory address where a variable is stored and see the behaviour of the program change.
Nine posts down, one to go! Next time I’ll be talking about some more advanced concepts which might interest you. For now you can find the code for this post here
via: https://blog.tartanllama.xyz/writing-a-linux-debugger-variables/
作者: Simon Brand 译者:译者ID 校对:校对者ID