From 8a217872589c61d0188feeec981865d312c5d9ae Mon Sep 17 00:00:00 2001
From: Stephen
Date: Tue, 3 Jul 2018 22:29:00 +0800
Subject: [PATCH] translating completed for the first pass

translating completed for the first pass
---
 ...get a core dump for a segfault on Linux.md | 70 +++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/sources/tech/20180428 How to get a core dump for a segfault on Linux.md b/sources/tech/20180428 How to get a core dump for a segfault on Linux.md
index d3006d1d6d..20f224b918 100644
--- a/sources/tech/20180428 How to get a core dump for a segfault on Linux.md
+++ b/sources/tech/20180428 How to get a core dump for a segfault on Linux.md
@@ -152,8 +152,12 @@ It’s important to know that `kernel.core_pattern` is a global settings – i

 ### kernel.core_pattern & Ubuntu

+### kernel.core_pattern and Ubuntu
+
 By default on Ubuntu systems, this is what `kernel.core_pattern` is set to:

+On Ubuntu systems, `kernel.core_pattern` is set to the following value by default
+
 ```
 $ sysctl kernel.core_pattern
 kernel.core_pattern = |/usr/share/apport/apport %p %s %c %d %P
@@ -162,26 +166,48 @@ kernel.core_pattern = |/usr/share/apport/apport %p %s %c %d %P

 This caused me a lot of confusion (what is this apport thing and what is it doing with my core dumps??) so here’s what I learned about this:

+This puzzled me (what is this apport for, and what did it do to my core dumps?). Here is what I found out about it:
+
 * Ubuntu uses a system called “apport” to report crashes in apt packages

+* Ubuntu uses a system called apport to report crash information related to apt packages.
+
 * Setting `kernel.core_pattern=|/usr/share/apport/apport %p %s %c %d %P` means that core dumps will be piped to `apport`

+* Setting `kernel.core_pattern=|/usr/share/apport/apport %p %s %c %d %P` means that core dumps are sent through a pipe to the `apport` program.
+
 * apport has logs in /var/log/apport.log

+* apport keeps its logs in the file /var/log/apport.log.
+
 * apport by default will ignore crashes from binaries that aren’t part of an Ubuntu package

+* By default, apport ignores crashes from binaries that do not belong to an Ubuntu package
+
 I ended up just overriding this Apport business and setting `kernel.core_pattern` with `sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t` because I was on a dev machine, I didn’t care whether Apport was working or not, and I didn’t feel like trying to convince Apport to give me my core dumps.

+In the end I simply overrode apport and reset `kernel.core_pattern` with `sysctl -w kernel.core_pattern=/tmp/core-%e.%p.%h.%t`, because I was on a development machine, did not care whether apport worked, and did not want to try getting apport to leave my core dumps on disk.
+
 ### So you have a core dump. Now what?

+### Now you have a core dump. What comes next?
+
 Okay, now we know about ulimits and `kernel.core_pattern` and you actually have a core dump file on disk in `/tmp`. Amazing! Now what??? We still don’t know why the program segfaulted!

+OK, so now we understand ulimit and `kernel.core_pattern`, and there really is a core dump file in the `/tmp` directory on disk. Great! What comes next? We still have no idea why the program segfaulted!
+
 The next step is to open the core file with `gdb` and get a backtrace.

+The next step is to use `gdb` to open the core dump file and obtain the call stack sequence.
+
 ### Getting a backtrace from gdb

+### Obtaining the call stack sequence from gdb
+
 You can open a core file with gdb like this:

+You can use `gdb` to open a core dump file like so
+
 ```
 $ gdb -c my_core_file

@@ -189,8 +215,12 @@ $ gdb -c my_core_file

 Next, we want to know what the stack was when the program crashed. Running `bt` at the gdb prompt will give you a backtrace. In my case gdb hadn’t loaded symbols for the binary, so it was just like `??????`. Luckily, loading symbols fixed it.

+Next, we want to see what the stack looked like at the moment the program crashed. Running `bt` at the gdb prompt gives you a call sequence. In my case, gdb had not loaded symbol information for the binary, so the function names all showed up as “??????”. Fortunately, (we) fixed that by loading the symbols.
+
 Here’s how to load debugging symbols.

+The following shows how to load debugging symbols.
+
 ```
 symbol-file /path/to/my/binary
 sharedlibrary
@@ -199,12 +229,20 @@ sharedlibrary

 This loads symbols from the binary and from any shared libraries the binary uses. Once I did that, gdb gave me a beautiful stack trace with line numbers when I ran `bt`!!!

+This loads symbols from the binary and from any shared libraries it references. Once I had done that, gdb gave me a lovely stack trace with line numbers when I ran `bt`!
+
 If you want this to work, the binary should be compiled with debugging symbols. Having line numbers in your stack traces is extremely helpful when trying to figure out why a program crashed :)

+For this to work, the binary has to be compiled with debugging symbol information. Line numbers in the stack trace help a great deal when you are trying to find the cause of a crash. :)
+
 ### look at the stack for every thread

+### View the stack of every thread
+
 Here’s how to get the stack for every thread in gdb!

+Use the following to get the call stack of every thread in gdb!
+
 ```
 thread apply all bt full

@@ -212,36 +250,68 @@ thread apply all bt full

 ### gdb + core dumps = amazing

+### gdb + core dumps = a nice surprise (these two paragraphs need more thought)
+
 If you have a core dump & debugging symbols and gdb, you are in an amazing situation!! You can go up and down the call stack, print out variables, and poke around in memory to see what happened. It’s the best.

+If you have a core dump with debugging symbols, plus gdb, that is wonderful! You can move up and down the call stack, print variables, and inspect memory to learn what happened. It is the best.
+
 If you are still working on being a gdb wizard, you can also just print out the stack trace with `bt` and that’s okay :)

+If you are still on your way to becoming a gdb wizard, printing just the stack trace with `bt` is fine too :)
+
+### ASAN
+
 ### ASAN

 Another path to figuring out your segfault is to compile the program with AddressSanitizer (“ASAN”) (`$CC -fsanitize=address`) and run it. I’m not going to discuss that in this post because this is already pretty long and anyway in my case the segfault disappeared with ASAN turned on for some reason, possibly because the ASAN build used a different memory allocator (system malloc instead of tcmalloc).
+Another way to get to the bottom of your segfault is to build the program with the AddressSanitizer option (“ASAN”, i.e. `$CC -fsanitize=address`) and then run it. I do not plan to cover that here, since this article is already fairly long, and in my case the segfault went away once ASAN was enabled, perhaps because ASAN used a different memory allocator (the system allocator rather than tcmalloc).
+
 I might write about ASAN more in the future if I ever get it to work :)

+If I can get ASAN to work in the future, I may write more about it. (Translator's note: "work" here means that the segfault also shows up when running under ASAN)
+
 ### getting a stack trace from a core dump is pretty approachable!

+### Getting a stack trace from a core dump is very approachable!
+
 This blog post sounds like a lot and I was pretty confused when I was doing it but really there aren’t all that many steps to getting a stack trace out of a segfaulting program:

+This blog post sounds like a lot, and I was quite confused while doing all this, but really, getting a call stack sequence out of a segfaulting program does not take that many steps:
+
 1. try valgrind

+1. give valgrind a try
+
 if that doesn’t work, or if you want to have a core dump to investigate:

+if that does not help, or if you would like a core dump to investigate:
+
 1. make sure the binary is compiled with debugging symbols

+1. make sure the binary was compiled with debugging symbol information;
+
 2. set `ulimit` and `kernel.core_pattern` correctly

+2. configure `ulimit` and `kernel.core_pattern` correctly;
+
 3. run the program

+3. run the program;
+
 4. open your core dump with `gdb`, load the symbols, and run `bt`

+4. once you have the core dump open in `gdb`, load the symbols and run `bt`;
+
 5. try to figure out what happened!!

+5. try to work out what happened!
+
 Using gdb, I was able to figure out that there was a C++ vtable entry pointing to some corrupt memory, which was somewhat helpful and helped me feel like I understood C++ a bit better. Maybe we’ll talk more about how to use gdb to figure things out another day!

+With gdb I could work out that a C++ vtable entry pointed to some corrupted memory, which helped a little and made me feel I understood C++ a bit better. Maybe someday we will talk more about using gdb to track down problems!
+
 --------------------------------------------------------------------------------

 via: https://jvns.ca/blog/2018/04/28/debugging-a-segfault-on-linux/
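
The recap steps in the post can be sketched as a small shell helper. This is only an illustration and not something taken from the post: the function names `step`, `enable_core_dumps` and `get_backtrace`, the `RUN` switch, and the example paths are all made up. It assumes `ulimit -c unlimited` is how the core size limit gets lifted, and it uses gdb's `-batch` and `-ex` options to run the commands from the post non-interactively. By default it only prints the commands, so nothing on the machine changes until you opt in.

```shell
# Print a command; also execute it when RUN=1 (RUN is a hypothetical knob).
step() {
    printf '+ %s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Step 2 of the recap: allow core files and send them to /tmp.
# With RUN=1 the sysctl call needs root, and the setting is global.
enable_core_dumps() {
    step ulimit -c unlimited
    step sysctl -w "kernel.core_pattern=/tmp/core-%e.%p.%h.%t"
}

# Step 4 of the recap: open the core, load symbols, print backtraces.
# usage: get_backtrace /path/to/binary /tmp/core-file
get_backtrace() {
    binary=$1
    core=$2
    # -batch makes gdb exit once the -ex commands have run, in order.
    step gdb -batch -c "$core" \
        -ex "symbol-file $binary" \
        -ex "sharedlibrary" \
        -ex "bt" \
        -ex "thread apply all bt full"
}
```

Calling `get_backtrace ./myprog /tmp/core-myprog.1234` prints the gdb command line it would use, and the same call with `RUN=1` set actually runs it. Note that the `ulimit` change only applies to the shell that calls `enable_core_dumps` and to programs started from it.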