mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-25 23:11:02 +08:00
translated.1
This commit is contained in:
parent
ba26825454
commit
81e5190e02
@ -1,347 +0,0 @@
|
|||||||
[#]: collector: (lujun9972)
|
|
||||||
[#]: translator: (hanwckf)
|
|
||||||
[#]: reviewer: ( )
|
|
||||||
[#]: publisher: ( )
|
|
||||||
[#]: url: ( )
|
|
||||||
[#]: subject: (Debugging Software Deployments with strace)
|
|
||||||
[#]: via: (https://theartofmachinery.com/2019/11/14/deployment_debugging_strace.html)
|
|
||||||
[#]: author: (Simon Arneaud https://theartofmachinery.com)
|
|
||||||
|
|
||||||
Debugging Software Deployments with strace
|
|
||||||
======
|
|
||||||
|
|
||||||
Most of my paid work involves deploying software systems, which means I spend a lot of time trying to answer the following questions:
|
|
||||||
|
|
||||||
* This software works on the original developer’s machine, so why doesn’t it work on mine?
|
|
||||||
* This software worked on my machine yesterday, so why doesn’t it work today?
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
That’s a kind of debugging, but it’s a different kind of debugging from normal software debugging. Normal debugging is usually about the logic of the code, but deployment debugging is usually about the interaction between the code and its environment. Even when the root cause is a logic bug, the fact that the software apparently worked on another machine means that the environment is usually involved somehow.
|
|
||||||
|
|
||||||
So, instead of using normal debugging tools like `gdb`, I have another toolset for debugging deployments. My favourite tool for “Why isn’t this software working on this machine?” is `strace`.
|
|
||||||
|
|
||||||
### What is `strace`?
|
|
||||||
|
|
||||||
[`strace`][1] is a tool for “system call tracing”. It’s primarily a Linux tool, but you can do the same kind of debugging tricks with tools for other systems (such as [DTrace][2] and [ktrace][3]).
|
|
||||||
|
|
||||||
The basic usage is very simple. Just run it against a command and it dumps all the system calls (you’ll probably need to install `strace` first):
|
|
||||||
|
|
||||||
```
|
|
||||||
$ strace echo Hello
|
|
||||||
...Snip lots of stuff...
|
|
||||||
write(1, "Hello\n", 6) = 6
|
|
||||||
close(1) = 0
|
|
||||||
close(2) = 0
|
|
||||||
exit_group(0) = ?
|
|
||||||
+++ exited with 0 +++
|
|
||||||
```
|
|
||||||
|
|
||||||
What are these system calls? They’re like the API for the operating system kernel. Once upon a time, software used to have direct access to the hardware it ran on. If it needed to display something on the screen, for example, it could twiddle with ports and/or memory-mapped registers for the video hardware. That got chaotic when multitasking computer systems became popular because different applications would “fight” over hardware, and bugs in one application could crash other applications, or even bring down the whole system. So CPUs started supporting different privilege modes (or “protection rings”). They let an operating system kernel run in the most privileged mode with full hardware access, while spawning less-privileged software applications that must ask the kernel to interact with the hardware for them using system calls.
|
|
||||||
|
|
||||||
At the binary level, making a system call is a bit different from making a simple function call, but most programs use wrappers in a standard library. E.g. the POSIX C standard library contains a `write()` function call that contains all the architecture-dependent code for making the `write` system call.
|
|
||||||
|
|
||||||
![][4]
|
|
||||||
|
|
||||||
In short, an application’s interaction with its environment (the computer system) is all done through system calls. So when software works on one machine but not another, looking at system call traces is a good way to find what’s wrong. More specifically, here are the typical things you can analyse using a system call trace:
|
|
||||||
|
|
||||||
* Console input and output (IO)
|
|
||||||
* Network IO
|
|
||||||
* Filesystem access and file IO
|
|
||||||
* Process/thread lifetime management
|
|
||||||
* Raw memory management
|
|
||||||
* Access to special device drivers
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
### When can `strace` be used?
|
|
||||||
|
|
||||||
In theory, `strace` can be used with any userspace program because all userspace programs have to make system calls. It’s more effective with compiled, lower-level programs, but still works with high-level languages like Python if you can wade through the extra noise from the runtime environment and interpreter.
|
|
||||||
|
|
||||||
`strace` shines with debugging software that works fine on one machine, but on another machine fails with a vague error message about files or permissions or failure to run some command or something. Unfortunately, it’s not so great with higher-level problems, like a certificate verification failure. They usually need a combination of `strace`, sometimes [`ltrace`][5], and higher-level tooling (like the `openssl` command line tool for certificate debugging).
|
|
||||||
|
|
||||||
The examples in this post are based on a standalone server, but system call tracing can often be done on more complicated deployment platforms, too. Just search for appropriate tooling.
|
|
||||||
|
|
||||||
### A simple debugging example
|
|
||||||
|
|
||||||
Let’s say you’re trying to run an awesome server application called foo, but here’s what happens:
|
|
||||||
|
|
||||||
```
|
|
||||||
$ foo
|
|
||||||
Error opening configuration file: No such file or directory
|
|
||||||
```
|
|
||||||
|
|
||||||
Obviously it’s not finding the configuration file that you’ve written. This can happen because package managers sometimes customise the expected locations of files when compiling an application, so following an installation guide for one distro leads to files in the wrong place on another distro. You could fix the problem in a few seconds if only the error message told you where the configuration file is expected to be, but it doesn’t. How can you find out?
|
|
||||||
|
|
||||||
If you have access to the source code, you could read it and work it out. That’s a good fallback plan, but not the fastest solution. You also could use a stepping debugger like `gdb` to see what the program does, but it’s more efficient to use a tool that’s specifically designed to show the interaction with the environment: `strace`.
|
|
||||||
|
|
||||||
The output of `strace` can be a bit overwhelming at first, but the good news is that you can ignore most of it. It often helps to use the `-o` switch to save the trace to a separate file:
|
|
||||||
|
|
||||||
```
|
|
||||||
$ strace -o /tmp/trace foo
|
|
||||||
Error opening configuration file: No such file or directory
|
|
||||||
$ cat /tmp/trace
|
|
||||||
execve("foo", ["foo"], 0x7ffce98dc010 /* 16 vars */) = 0
|
|
||||||
brk(NULL) = 0x56363b3fb000
|
|
||||||
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
|
|
||||||
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
|
|
||||||
fstat(3, {st_mode=S_IFREG|0644, st_size=25186, ...}) = 0
|
|
||||||
mmap(NULL, 25186, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f2f12cf1000
|
|
||||||
close(3) = 0
|
|
||||||
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
|
|
||||||
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260A\2\0\0\0\0\0"..., 832) = 832
|
|
||||||
fstat(3, {st_mode=S_IFREG|0755, st_size=1824496, ...}) = 0
|
|
||||||
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2f12cef000
|
|
||||||
mmap(NULL, 1837056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2f12b2e000
|
|
||||||
mprotect(0x7f2f12b50000, 1658880, PROT_NONE) = 0
|
|
||||||
mmap(0x7f2f12b50000, 1343488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7f2f12b50000
|
|
||||||
mmap(0x7f2f12c98000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16a000) = 0x7f2f12c98000
|
|
||||||
mmap(0x7f2f12ce5000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b6000) = 0x7f2f12ce5000
|
|
||||||
mmap(0x7f2f12ceb000, 14336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f2f12ceb000
|
|
||||||
close(3) = 0
|
|
||||||
arch_prctl(ARCH_SET_FS, 0x7f2f12cf0500) = 0
|
|
||||||
mprotect(0x7f2f12ce5000, 16384, PROT_READ) = 0
|
|
||||||
mprotect(0x56363b08b000, 4096, PROT_READ) = 0
|
|
||||||
mprotect(0x7f2f12d1f000, 4096, PROT_READ) = 0
|
|
||||||
munmap(0x7f2f12cf1000, 25186) = 0
|
|
||||||
openat(AT_FDCWD, "/etc/foo/config.json", O_RDONLY) = -1 ENOENT (No such file or directory)
|
|
||||||
dup(2) = 3
|
|
||||||
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
|
|
||||||
brk(NULL) = 0x56363b3fb000
|
|
||||||
brk(0x56363b41c000) = 0x56363b41c000
|
|
||||||
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x8), ...}) = 0
|
|
||||||
write(3, "Error opening configuration file"..., 60) = 60
|
|
||||||
close(3) = 0
|
|
||||||
exit_group(1) = ?
|
|
||||||
+++ exited with 1 +++
|
|
||||||
```
|
|
||||||
|
|
||||||
The first page or so of `strace` output is typically low-level process startup. (You can see a lot of `mmap`, `mprotect`, `brk` calls for things like allocating raw memory and mapping dynamic libraries.) Actually, when debugging an error, `strace` output is best read from the bottom up. You can see the `write` call that outputs the error message at the end. If you work up, the first failing system call is the `openat` call that fails with `ENOENT` (“No such file or directory”) trying to open `/etc/foo/config.json`. And now we know where the configuration file is supposed to be.
|
|
||||||
|
|
||||||
That’s a simple example, but I’d say at least 90% of the time I use `strace`, I’m not doing anything more complicated. Here’s the complete debugging formula step-by-step:
|
|
||||||
|
|
||||||
1. Get frustrated by a vague system-y error message from a program
|
|
||||||
2. Run the program again with `strace`
|
|
||||||
3. Find the error message in the trace
|
|
||||||
4. Work upwards to find the first failing system call
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
There’s a very good chance the system call in step 4 shows you what went wrong.
|
|
||||||
|
|
||||||
### Some tips
|
|
||||||
|
|
||||||
Before walking through a more complicated example, here are some useful tips for using `strace` effectively:
|
|
||||||
|
|
||||||
#### `man` is your friend
|
|
||||||
|
|
||||||
On many *nix systems, you can get a list of all kernel system calls by running `man syscalls`. You’ll see things like `brk(2)`, which means you can get more information by running `man 2 brk`.
|
|
||||||
|
|
||||||
One little gotcha: `man 2 fork` shows me the man page for the `fork()` wrapper in GNU `libc`, which is actually now implemented using the `clone` system call instead. The semantics of `fork` are the same, but if I write a program using `fork()` and `strace` it, I won’t find any `fork` calls in the trace, just `clone` calls. Gotchas like that are only confusing if you’re comparing source code to `strace` output.
|
|
||||||
|
|
||||||
#### Use `-o` to save output to a file
|
|
||||||
|
|
||||||
`strace` can generate a lot of output so it’s often helpful to store the trace in a separate file (as in the example above). It also avoids mixing up program output with `strace` output in the console.
|
|
||||||
|
|
||||||
#### Use `-s` to see more argument data
|
|
||||||
|
|
||||||
You might have noticed that the second part of the error message doesn’t appear in the example trace above. That’s because `strace` only shows the first 32 bytes of string arguments by default. If you need to capture more, add something like `-s 128` to the `strace` invocation.
|
|
||||||
|
|
||||||
#### `-y` makes it easier to track files/sockets/etc
|
|
||||||
|
|
||||||
“Everything is a file” means *nix systems do all IO using file descriptors, whether it’s to an actual file or over networks or through interprocess pipes. That’s convenient for programming, but makes it harder to follow what’s really going on when you see generic `read` and `write` in the system call trace.
|
|
||||||
|
|
||||||
Adding the `-y` switch makes `strace` annotate every file descriptor in the output with a note about what it points to.
|
|
||||||
|
|
||||||
#### Attach to an already-running process with `-p`
|
|
||||||
|
|
||||||
As we’ll see in the example later, sometimes you want to trace a program that’s already running. If you know it’s running as process 1337 (say, by looking at the output of `ps`), you can trace it like this:
|
|
||||||
|
|
||||||
```
|
|
||||||
$ strace -p 1337
|
|
||||||
...system call trace output...
|
|
||||||
```
|
|
||||||
|
|
||||||
You probably need root.
|
|
||||||
|
|
||||||
#### Use `-f` to follow child processes
|
|
||||||
|
|
||||||
By default, `strace` only traces the one process. If that process spawns a child process, you’ll see the system call for spawning the process (normally `clone` nowadays), but not any of the calls made by the child process.
|
|
||||||
|
|
||||||
If you think the bug is in a child process, you’ll need to use the `-f` switch to enable tracing it. A downside is that the output can be more confusing. When tracing one process and one thread, `strace` can show you a single stream of call events. When tracing multiple processes, you might see the start of a call cut off with `<unfinished ...>`, then a bunch of calls for other threads of execution, before seeing the end of the original call with `<... foocall resumed>`. Alternatively, you can separate all the traces into different files by using the `-ff` switch as well (see [the `strace` manual][6] for details).
|
|
||||||
|
|
||||||
#### You can filter the trace with `-e`
|
|
||||||
|
|
||||||
As you’ve seen, the default trace output is a firehose of all system calls. You can filter which calls get traced using the `-e` flag (see [the `strace` manual][6]). The main advantage is that it’s faster to run the program under a filtered `strace` than to trace everything and `grep` the results later. Honestly, I don’t bother most of the time.
|
|
||||||
|
|
||||||
#### Not all errors are bad
|
|
||||||
|
|
||||||
A simple and common example is a program searching for a file in multiple places, like a shell searching for which `bin/` directory has an executable:
|
|
||||||
|
|
||||||
```
|
|
||||||
$ strace sh -c uname
|
|
||||||
...
|
|
||||||
stat("/home/user/bin/uname", 0x7ffceb817820) = -1 ENOENT (No such file or directory)
|
|
||||||
stat("/usr/local/bin/uname", 0x7ffceb817820) = -1 ENOENT (No such file or directory)
|
|
||||||
stat("/usr/bin/uname", {st_mode=S_IFREG|0755, st_size=39584, ...}) = 0
|
|
||||||
...
|
|
||||||
```
|
|
||||||
|
|
||||||
The “last failed call before the error message” heuristic is pretty good at finding relevent errors. In any case, working from the bottom up makes sense.
|
|
||||||
|
|
||||||
#### C programming guides are good for understanding system calls
|
|
||||||
|
|
||||||
Standard C library calls aren’t system calls, but they’re only thin layers on top. So if you understand (even just roughly) how to do something in C, it’s easier to read a system call trace. For example, if you’re having trouble debugging networking system calls, you could try skimming through [Beej’s classic Guide to Network Programming][7].
|
|
||||||
|
|
||||||
### A more complicated debugging example
|
|
||||||
|
|
||||||
As I said, that simple debugging example is representative of most of my `strace` usage. However, sometimes a little more detective work is required, so here’s a slightly more complicated (and real) example.
|
|
||||||
|
|
||||||
[`bcron`][8] is a job scheduler that’s yet another implementation of the classic *nix `cron` daemon. It’s been installed on a server, but here’s what happens when someone tries to edit a job schedule:
|
|
||||||
|
|
||||||
```
|
|
||||||
# crontab -e -u logs
|
|
||||||
bcrontab: Fatal: Could not create temporary file
|
|
||||||
```
|
|
||||||
|
|
||||||
Okay, so bcron tried to write some file, but it couldn’t, and isn’t telling us why. This is a debugging job for `strace`:
|
|
||||||
|
|
||||||
```
|
|
||||||
# strace -o /tmp/trace crontab -e -u logs
|
|
||||||
bcrontab: Fatal: Could not create temporary file
|
|
||||||
# cat /tmp/trace
|
|
||||||
...
|
|
||||||
openat(AT_FDCWD, "bcrontab.14779.1573691864.847933", O_RDONLY) = 3
|
|
||||||
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f82049b4000
|
|
||||||
read(3, "#Ansible: logsagg\n20 14 * * * lo"..., 8192) = 150
|
|
||||||
read(3, "", 8192) = 0
|
|
||||||
munmap(0x7f82049b4000, 8192) = 0
|
|
||||||
close(3) = 0
|
|
||||||
socket(AF_UNIX, SOCK_STREAM, 0) = 3
|
|
||||||
connect(3, {sa_family=AF_UNIX, sun_path="/var/run/bcron-spool"}, 110) = 0
|
|
||||||
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f82049b4000
|
|
||||||
write(3, "156:Slogs\0#Ansible: logsagg\n20 1"..., 161) = 161
|
|
||||||
read(3, "32:ZCould not create temporary f"..., 8192) = 36
|
|
||||||
munmap(0x7f82049b4000, 8192) = 0
|
|
||||||
close(3) = 0
|
|
||||||
write(2, "bcrontab: Fatal: Could not creat"..., 49) = 49
|
|
||||||
unlink("bcrontab.14779.1573691864.847933") = 0
|
|
||||||
exit_group(111) = ?
|
|
||||||
+++ exited with 111 +++
|
|
||||||
```
|
|
||||||
|
|
||||||
There’s the error message `write` near the end, but a couple of things are different this time. First, there’s no relevant system call error that happens before it. Second, we see that the error message has just been `read` from somewhere else. It looks like the real problem is happening somewhere else, and `bcrontab` is just replaying the message.
|
|
||||||
|
|
||||||
If you look at `man 2 read`, you’ll see that the first argument (the 3) is a file descriptor, which is what *nix uses for all IO handles. How do you know what file descriptor 3 represents? In this specific case, you could run `strace` with the `-y` switch (as explained above) and it would tell you automatically, but it’s useful to know how to read and analyse traces to figure things like this out.
|
|
||||||
|
|
||||||
A file descriptor can come from one of many system calls (depending on whether it’s a descriptor for the console, a network socket, an actual file, or something else), but in any case we can search for calls returning 3 (i.e., search for “= 3” in the trace). There are two in this trace: the `openat` at the top, and the `socket` in the middle. `openat` opens a file, but the `close(3)` afterwards shows that it gets closed again. (Gotcha: file descriptors can be reused as they’re opened and closed.) The `socket` call is the relevant one (it’s the last one before the `read`), which tells us `bcrontab` is talking to something over a network socket. The next line, `connect` shows file descriptor 3 being configured as a Unix domain socket connection to `/var/run/bcron-spool`.
|
|
||||||
|
|
||||||
So now we need to figure out what’s listening on the other side of the Unix socket. There are a couple of neat tricks for that, both useful for debugging server deployments. One is to use `netstat` or the newer `ss` (“socket status”). Both commands describe active network sockets on the system, and take the `-l` switch for describing listening (server) sockets, and the `-p` switch to get information about what program is using the socket. (There are many more useful options, but those two are enough to get this job done.)
|
|
||||||
|
|
||||||
```
|
|
||||||
# ss -pl | grep /var/run/bcron-spool
|
|
||||||
u_str LISTEN 0 128 /var/run/bcron-spool 1466637 * 0 users:(("unixserver",pid=20629,fd=3))
|
|
||||||
```
|
|
||||||
|
|
||||||
That tells us that the listener is a command `unixserver` running as process ID 20629. (It’s a coincidence that it’s also using file descriptor 3 for the socket.)
|
|
||||||
|
|
||||||
The second really useful tool for finding the same information is `lsof`. It can list all open files (or file descriptors) on the system. Alternatively, we can get information about a specific file:
|
|
||||||
|
|
||||||
```
|
|
||||||
# lsof /var/run/bcron-spool
|
|
||||||
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
|
|
||||||
unixserve 20629 cron 3u unix 0x000000005ac4bd83 0t0 1466637 /var/run/bcron-spool type=STREAM
|
|
||||||
```
|
|
||||||
|
|
||||||
Process 20629 is a long-running server, so we can attach `strace` to it using something like `strace -o /tmp/trace -p 20629`. If we then try to edit the cron schedule in another terminal, we can capture a trace while the error is happening. Here’s the result:
|
|
||||||
|
|
||||||
```
|
|
||||||
accept(3, NULL, NULL) = 4
|
|
||||||
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7faa47c44810) = 21181
|
|
||||||
close(4) = 0
|
|
||||||
accept(3, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
|
|
||||||
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=21181, si_uid=998, si_status=0, si_utime=0, si_stime=0} ---
|
|
||||||
wait4(0, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 21181
|
|
||||||
wait4(0, 0x7ffe6bc36764, WNOHANG|WSTOPPED, NULL) = -1 ECHILD (No child processes)
|
|
||||||
rt_sigaction(SIGCHLD, {sa_handler=0x55d244bdb690, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7faa47ab9840}, {sa_handler=0x55d244bdb690, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7faa47ab9840}, 8) = 0
|
|
||||||
rt_sigreturn({mask=[]}) = 43
|
|
||||||
accept(3, NULL, NULL) = 4
|
|
||||||
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7faa47c44810) = 21200
|
|
||||||
close(4) = 0
|
|
||||||
accept(3, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
|
|
||||||
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=21200, si_uid=998, si_status=111, si_utime=0, si_stime=0} ---
|
|
||||||
wait4(0, [{WIFEXITED(s) && WEXITSTATUS(s) == 111}], WNOHANG|WSTOPPED, NULL) = 21200
|
|
||||||
wait4(0, 0x7ffe6bc36764, WNOHANG|WSTOPPED, NULL) = -1 ECHILD (No child processes)
|
|
||||||
rt_sigaction(SIGCHLD, {sa_handler=0x55d244bdb690, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7faa47ab9840}, {sa_handler=0x55d244bdb690, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7faa47ab9840}, 8) = 0
|
|
||||||
rt_sigreturn({mask=[]}) = 43
|
|
||||||
accept(3, NULL, NULL
|
|
||||||
```
|
|
||||||
|
|
||||||
(The last `accept` doesn’t complete during the trace period.) Unfortunately, once again, this trace doesn’t contain the error we’re after. We don’t see any of the messages that we saw `bcrontab` sending to and receiving from the socket. Instead, we see a lot of process management (`clone`, `wait4`, `SIGCHLD`, etc.). This process is spawning a child process, which we can guess is doing the real work. If we want to catch a trace of that, we have to add `-f` to the `strace` invocation. Here’s what we find if we search for the error message after getting a new trace with `strace -f -o /tmp/trace -p 20629`:
|
|
||||||
|
|
||||||
```
|
|
||||||
21470 openat(AT_FDCWD, "tmp/spool.21470.1573692319.854640", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
|
|
||||||
21470 write(1, "32:ZCould not create temporary f"..., 36) = 36
|
|
||||||
21470 write(2, "bcron-spool[21470]: Fatal: logs:"..., 84) = 84
|
|
||||||
21470 unlink("tmp/spool.21470.1573692319.854640") = -1 ENOENT (No such file or directory)
|
|
||||||
21470 exit_group(111) = ?
|
|
||||||
21470 +++ exited with 111 +++
|
|
||||||
```
|
|
||||||
|
|
||||||
Now we’re getting somewhere. Process ID 21470 is getting a permission denied error trying to create a file at the path `tmp/spool.21470.1573692319.854640` (relative to the current working directory). If we just knew the current working directory, we would know the full path and could figure out why the process can’t create create its temporary file there. Unfortunately, the process has already exited, so we can’t just use `lsof -p 21470` to find out the current directory, but we can work backwards looking for PID 21470 system calls that change directory. (If there aren’t any, PID 21470 must have inherited it from its parent, and we can `lsof -p` that.) That system call is `chdir` (which is easy to find out using today’s web search engines). Here’s the result of working backwards through the trace, all the way to the server PID 20629:
|
|
||||||
|
|
||||||
```
|
|
||||||
20629 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7faa47c44810) = 21470
|
|
||||||
...
|
|
||||||
21470 execve("/usr/sbin/bcron-spool", ["bcron-spool"], 0x55d2460807e0 /* 27 vars */) = 0
|
|
||||||
...
|
|
||||||
21470 chdir("/var/spool/cron") = 0
|
|
||||||
...
|
|
||||||
21470 openat(AT_FDCWD, "tmp/spool.21470.1573692319.854640", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
|
|
||||||
21470 write(1, "32:ZCould not create temporary f"..., 36) = 36
|
|
||||||
21470 write(2, "bcron-spool[21470]: Fatal: logs:"..., 84) = 84
|
|
||||||
21470 unlink("tmp/spool.21470.1573692319.854640") = -1 ENOENT (No such file or directory)
|
|
||||||
21470 exit_group(111) = ?
|
|
||||||
21470 +++ exited with 111 +++
|
|
||||||
```
|
|
||||||
|
|
||||||
(If you’re getting lost here, you might want to read [my previous post about *nix process management and shells][9].) Okay, so the server PID 20629 doesn’t have permission to create a file at `/var/spool/cron/tmp/spool.21470.1573692319.854640`. The most likely reason would be classic *nix filesystem permission settings. Let’s check:
|
|
||||||
|
|
||||||
```
|
|
||||||
# ls -ld /var/spool/cron/tmp/
|
|
||||||
drwxr-xr-x 2 root root 4096 Nov 6 05:33 /var/spool/cron/tmp/
|
|
||||||
# ps u -p 20629
|
|
||||||
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
|
|
||||||
cron 20629 0.0 0.0 2276 752 ? Ss Nov14 0:00 unixserver -U /var/run/bcron-spool -- bcron-spool
|
|
||||||
```
|
|
||||||
|
|
||||||
There’s the problem! The server is running as user `cron`, but only `root` has permissions to write to that `/var/spool/cron/tmp/` directory. A simple `chown cron /var/spool/cron/tmp/` makes `bcron` work properly. (If that weren’t the problem, the next most likely suspect would be a kernel security module like SELinux or AppArmor, so I’d check the kernel logs with `dmesg`.)
|
|
||||||
|
|
||||||
### Summary
|
|
||||||
|
|
||||||
System call traces can be overwhelming at first, but I hope I’ve shown that they’re a fast way to debug a whole class of common deployment problems. Imagine trying to debug that multi-process `bcron` problem using a stepping debugger.
|
|
||||||
|
|
||||||
Working back through a chain of system calls takes practice, but as I said, most of the time I use `strace` I just get a trace and look for errors, working from the bottom up. In any case, `strace` has saved me hours and hours of debugging time. I hope it’s useful for you, too.
|
|
||||||
|
|
||||||
--------------------------------------------------------------------------------
|
|
||||||
|
|
||||||
via: https://theartofmachinery.com/2019/11/14/deployment_debugging_strace.html
|
|
||||||
|
|
||||||
作者:[Simon Arneaud][a]
|
|
||||||
选题:[lujun9972][b]
|
|
||||||
译者:[译者ID](https://github.com/译者ID)
|
|
||||||
校对:[校对者ID](https://github.com/校对者ID)
|
|
||||||
|
|
||||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
|
||||||
|
|
||||||
[a]: https://theartofmachinery.com
|
|
||||||
[b]: https://github.com/lujun9972
|
|
||||||
[1]: https://strace.io/
|
|
||||||
[2]: http://dtrace.org/blogs/about/
|
|
||||||
[3]: https://man.openbsd.org/ktrace
|
|
||||||
[4]: https://theartofmachinery.com/images/strace/system_calls.svg
|
|
||||||
[5]: https://linux.die.net/man/1/ltrace
|
|
||||||
[6]: https://linux.die.net/man/1/strace
|
|
||||||
[7]: https://beej.us/guide/bgnet/html/index.html
|
|
||||||
[8]: https://untroubled.org/bcron/
|
|
||||||
[9]: https://theartofmachinery.com/2018/11/07/writing_a_nix_shell.html
|
|
@ -0,0 +1,347 @@
|
|||||||
|
[#]: collector: (lujun9972)
|
||||||
|
[#]: translator: (hanwckf)
|
||||||
|
[#]: reviewer: ( )
|
||||||
|
[#]: publisher: ( )
|
||||||
|
[#]: url: ( )
|
||||||
|
[#]: subject: (Debugging Software Deployments with strace)
|
||||||
|
[#]: via: (https://theartofmachinery.com/2019/11/14/deployment_debugging_strace.html)
|
||||||
|
[#]: author: (Simon Arneaud https://theartofmachinery.com)
|
||||||
|
|
||||||
|
在软件部署中使用 strace 进行调试
|
||||||
|
======
|
||||||
|
|
||||||
|
我的大部分工作都包括部署软件系统,这意味着我需要花费很多时间来解决以下问题:
|
||||||
|
|
||||||
|
* 这个软件可以在原始开发者的机器上工作,但是为什么不能在我这里运行?
|
||||||
|
* 这个软件昨天可以在我的机器上工作,但是为什么今天就不行?
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
这是调试的一种类型,但是与传统的软件调试有所不同。传统的调试通常只关心代码的逻辑,但是在软件部署中的调试关注的是程序的代码和它所在的运行环境之间的相互影响。即便问题的根源是代码的逻辑错误,但软件显然可以在别的机器上运行的事实意味着这类问题与运行环境密切相关。
|
||||||
|
|
||||||
|
所以,在软件部署过程中,我没有使用传统的调试工具(例如 `gdb`),而是选择了其它工具进行调试。我最喜欢的用来解决“为什么这个软件无法在这台机器上运行?”这类问题的工具就是 `strace`。
|
||||||
|
|
||||||
|
### 什么是 `strace`?
|
||||||
|
|
||||||
|
[`strace`][1] 是一个用来“追踪系统调用”的工具。它主要是一个 Linux 工具,但是你也可以在其它系统上使用类似的工具(例如 [DTrace][2] 和 [ktrace][3])。
|
||||||
|
|
||||||
|
它的基本用法非常简单。只需要在 `strace` 后面跟上你需要运行的命令,它就会显示出该命令触发的所有系统调用(你可能需要先安装好 `strace`):
|
||||||
|
|
||||||
|
```
|
||||||
|
$ strace echo Hello
|
||||||
|
...Snip lots of stuff...
|
||||||
|
write(1, "Hello\n", 6) = 6
|
||||||
|
close(1) = 0
|
||||||
|
close(2) = 0
|
||||||
|
exit_group(0) = ?
|
||||||
|
+++ exited with 0 +++
|
||||||
|
```
|
||||||
|
|
||||||
|
这些系统调用都是什么?他们就像是操作系统提供的 API。很久以前,软件拥有直接访问硬件的权限。如果软件需要在屏幕上显示一些东西,它将会与视频硬件的端口和内存映射寄存器纠缠不清。当多任务操作系统变得流行以后,这就导致了混乱的局面,因为不同的应用程序将“争夺”硬件,并且一个应用程序的错误可能致使其它应用程序崩溃,甚至导致整个系统崩溃。所以 CPUs 开始支持多种不同的特权模式 (或者称为“保护环”)。它们让操作系统内核在具有完全硬件访问权限的最高特权模式下运行,于此同时,其它在低特权模式下运行的应用程序必须通过向内核发起系统调用才能够与硬件进行交互。
|
||||||
|
|
||||||
|
在二进制级别上,发起系统调用相比简单的函数调用有一些区别,但是大部分程序都使用标准库提供的封装函数。例如,POSIX C 标准库包含一个 `write()` 函数,该函数包含用于进行 `write` 系统调用的所有与硬件体系结构相关的代码。
|
||||||
|
|
||||||
|
![][4]
|
||||||
|
|
||||||
|
简单来说,一个应用程序与其环境(计算机系统)的相互影响都是通过系统调用来作用的。所以当软件在一台机器上可以工作但是在另一台机器无法工作的时候,追踪系统调用是一个很好的查错方法。具体地说,你可以通过追踪系统调用分析以下典型操作:
|
||||||
|
|
||||||
|
* 控制台输入与输出 (IO)
|
||||||
|
* 网络 IO
|
||||||
|
* 文件系统访问以及文件 IO
|
||||||
|
* 进程/线程 生命周期管理
|
||||||
|
* 原始内存管理
|
||||||
|
* 访问特定的设备驱动
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
### 什么时候可以使用 `strace`?
|
||||||
|
|
||||||
|
理论上,`strace` 适用于任何用户空间程序,因为所有的用户空间程序都需要进行系统调用。`strace` 对于已编译的低级程序最有效果,但如果你可以避免运行时环境和解释器带来的大量额外输出,则仍然可以与 Python 等高级语言程序一起使用。
|
||||||
|
|
||||||
|
当软件在一台机器上正常工作,但在另一台机器上却不能正常工作,同时抛出有关文件、权限或者不能运行某某命令等模糊的错误信息时,`strace` 往往能大显身手。不幸的是,它不能诊断高等级的问题,例如数字证书验证错误等。这些问题通常需要结合 `strace`(有时候是 [`ltrace`][5]),以及其它高级工具(例如使用 `openssl` 命令行工具调试数字证书错误)。
|
||||||
|
|
||||||
|
本文中的示例基于独立的服务器,但是对系统调用的追踪通常也可以在更复杂的部署平台上完成,仅需要找到合适的工具。
|
||||||
|
|
||||||
|
### 一个简单的例子
|
||||||
|
|
||||||
|
假设你正在尝试运行一个叫做 foo 的服务器应用程序,但是发生了以下情况:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ foo
|
||||||
|
Error opening configuration file: No such file or directory
|
||||||
|
```
|
||||||
|
|
||||||
|
显然,它没有找到你已经写好的配置文件。之所以会发生这种情况,是因为包管理工具有时候在编译应用程序时指定了自定义的路径,所以你应当遵循特定发行版提供的安装指南。如果错误信息告诉你正确的配置文件应该在什么地方,你就可以在几秒钟内解决这个问题,但事实并非如此。你该如何找到正确的路径?
|
||||||
|
|
||||||
|
如果你有权访问源代码,则可以通过阅读源代码来解决问题。这是一个好的备用计划,但不是最快的解决方案。你还可以使用类似 `gdb` 的单步调试器来观察程序的行为,但使用专门用于展示程序与系统环境交互作用的工具 `strace` 更加有效。
|
||||||
|
|
||||||
|
一开始, `strace` 产生的大量输出可能会让你不知所措,幸好你可以忽略其中大部分的无用信息。我经常使用 `-o` 参数把输出的追踪结果保存到单独的文件里:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ strace -o /tmp/trace foo
|
||||||
|
Error opening configuration file: No such file or directory
|
||||||
|
$ cat /tmp/trace
|
||||||
|
execve("foo", ["foo"], 0x7ffce98dc010 /* 16 vars */) = 0
|
||||||
|
brk(NULL) = 0x56363b3fb000
|
||||||
|
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
|
||||||
|
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
|
||||||
|
fstat(3, {st_mode=S_IFREG|0644, st_size=25186, ...}) = 0
|
||||||
|
mmap(NULL, 25186, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f2f12cf1000
|
||||||
|
close(3) = 0
|
||||||
|
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
|
||||||
|
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260A\2\0\0\0\0\0"..., 832) = 832
|
||||||
|
fstat(3, {st_mode=S_IFREG|0755, st_size=1824496, ...}) = 0
|
||||||
|
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2f12cef000
|
||||||
|
mmap(NULL, 1837056, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2f12b2e000
|
||||||
|
mprotect(0x7f2f12b50000, 1658880, PROT_NONE) = 0
|
||||||
|
mmap(0x7f2f12b50000, 1343488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7f2f12b50000
|
||||||
|
mmap(0x7f2f12c98000, 311296, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16a000) = 0x7f2f12c98000
|
||||||
|
mmap(0x7f2f12ce5000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b6000) = 0x7f2f12ce5000
|
||||||
|
mmap(0x7f2f12ceb000, 14336, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f2f12ceb000
|
||||||
|
close(3) = 0
|
||||||
|
arch_prctl(ARCH_SET_FS, 0x7f2f12cf0500) = 0
|
||||||
|
mprotect(0x7f2f12ce5000, 16384, PROT_READ) = 0
|
||||||
|
mprotect(0x56363b08b000, 4096, PROT_READ) = 0
|
||||||
|
mprotect(0x7f2f12d1f000, 4096, PROT_READ) = 0
|
||||||
|
munmap(0x7f2f12cf1000, 25186) = 0
|
||||||
|
openat(AT_FDCWD, "/etc/foo/config.json", O_RDONLY) = -1 ENOENT (No such file or directory)
|
||||||
|
dup(2) = 3
|
||||||
|
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
|
||||||
|
brk(NULL) = 0x56363b3fb000
|
||||||
|
brk(0x56363b41c000) = 0x56363b41c000
|
||||||
|
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x8), ...}) = 0
|
||||||
|
write(3, "Error opening configuration file"..., 60) = 60
|
||||||
|
close(3) = 0
|
||||||
|
exit_group(1) = ?
|
||||||
|
+++ exited with 1 +++
|
||||||
|
```
|
||||||
|
|
||||||
|
`strace` 输出的第一页通常是低级的进程启动过程。(你可以看到很多 `mmap`,`mprotect`,`brk` 调用,这是用来分配原始内存和映射动态链接库的。)实际上,在查找错误时,最好从下往上阅读 `strace` 的输出。你可以看到 `write` 调用在最后返回了错误信息。如果你努力了,你将会看到第一个失败的系统调用是 `openat`,它在尝试打开 `/etc/foo/config.json` 时抛出了 `ENOENT` (“No such file or directory”)的错误。现在我们已经知道了配置文件应该放在哪里。
|
||||||
|
|
||||||
|
这是一个简单的例子,但我敢说在 90% 的情况下,使用 `strace` 进行调试不需要更多复杂的工作。以下是完整的调试步骤:
|
||||||
|
|
||||||
|
1. 从程序中获得含糊不清的错误信息
|
||||||
|
2. 使用 `strace` 运行程序
|
||||||
|
3. 在输出中找到错误信息
|
||||||
|
4. 往前追溯并找到第一个失败的系统调用
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
第四步中的系统调用很可能向你显示出问题所在。
|
||||||
|
|
||||||
|
### 小技巧
|
||||||
|
|
||||||
|
在开始更加复杂的调试之前,这里有一些有用的调试技巧帮助你高效使用 `strace`:
|
||||||
|
|
||||||
|
#### `man` 是你的朋友
|
||||||
|
|
||||||
|
在很多 *nix 操作系统中,你可以通过 `man syscalls` 查看系统调用的列表。你将会看到类似于 `brk(2)` 之类的东西,这意味着你可以通过运行 `man 2 brk` 得到与此相关的更多信息。
|
||||||
|
|
||||||
|
一个小问题:`man 2 fork` 会显示出在 GNU `libc` 里封装的 `fork()` 手册页,而 `fork()` 现在实际上是由 `clone` 系统调用实现的。`fork` 的语义与 `clone` 相同,但是如果我写了一个含有 `fork()` 的程序并使用 `strace` 去调试它,我将找不到任何关于 `fork` 调用的信息,只能看到 `clone` 调用。只有在将源代码与 `strace` 的输出进行比较的时候,这种问题才会让人感到困惑。
|
||||||
|
|
||||||
|
#### 使用 `-o` 将输出保存到文件
|
||||||
|
|
||||||
|
`strace` 可以生成很多输出,所以将输出保存到单独的文件是很有帮助的 (就像上面的例子一样)。它还能够在控制台中避免程序自身的输出与 `strace` 的输出发生混淆。
|
||||||
|
|
||||||
|
#### 使用 `-s` 查看更多的参数
|
||||||
|
|
||||||
|
你可能已经注意到,错误信息的第二部分没有出现在上面的例子中。这是因为 `strace` 默认仅显示字符串参数的前 32 个字节。如果你需要捕获更多参数,请向 `strace` 追加类似于 `-s 128` 之类的参数。
|
||||||
|
|
||||||
|
#### `-y` 使得追踪文件或套接字更加容易
|
||||||
|
|
||||||
|
“一切皆文件”意味着 *nix 系统通过文件描述符进行所有 IO 操作,不管是真实的文件还是通过网络或者进程间管道。这对于编程而言是很方便的,但是在追踪系统调用时,你将很难分辨出 `read` 和 `write` 的真实行为。
|
||||||
|
|
||||||
|
`-y` 参数使 `strace` 在注释中注明每个文件描述符的具体指向。
|
||||||
|
|
||||||
|
#### 使用 `-p` 附加到正在运行的进程中
|
||||||
|
|
||||||
|
正如我们将在后面的例子中看到的,有时候你想追踪一个正在运行的程序。如果你知道这个程序的进程号为 1337 (可以通过 `ps` 查询),则可以这样操作:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ strace -p 1337
|
||||||
|
...system call trace output...
|
||||||
|
```
|
||||||
|
|
||||||
|
你可能需要 root 权限才能运行。
|
||||||
|
|
||||||
|
#### 使用 `-f` 追踪子进程
|
||||||
|
|
||||||
|
`strace` 默认只追踪一个进程。如果这个进程产生了一个子进程,你将会看到创建子进程的系统调用(一般是 `clone`),但是你看不到子进程内触发的任何调用。
|
||||||
|
|
||||||
|
如果你认为在子进程中存在 bug,则需要使用 `-f` 参数启用子进程追踪功能。这样做的缺点是输出的内容会让人更加困惑。当追踪一个进程时,`strace` 显示的是单个调用事件流。当追踪多个进程的时候,你将会看到以 `<unfinished ...>` 开始的初始调用,接着是一系列针对其它线程的调用,最后才出现以 `<... foocall resumed>` 结束的初始调用。此外,你可以使用 `-ff` 参数将所有的调用分离到不同的文件中(查看 [the `strace` manual][6] 获取更多信息)。
|
||||||
|
|
||||||
|
#### 使用 `-e` 进行过滤
|
||||||
|
|
||||||
|
正如你所看到的,默认的追踪输出是所有的系统调用。你可以使用 `-e` 参数过滤你需要追踪的调用(查看 [the `strace` manual][6])。这样做的好处是运行过滤后的 `strace` 比起使用 `grep` 进行二次过滤要更快。老实说,我大部分时间都不会被打扰。
|
||||||
|
|
||||||
|
#### 并非所有的错误都是不好的
|
||||||
|
|
||||||
|
一个简单而常用的例子是一个程序在多个位置搜索文件,例如 shell 搜索哪个 `bin/` 目录包含可执行文件:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ strace sh -c uname
|
||||||
|
...
|
||||||
|
stat("/home/user/bin/uname", 0x7ffceb817820) = -1 ENOENT (No such file or directory)
|
||||||
|
stat("/usr/local/bin/uname", 0x7ffceb817820) = -1 ENOENT (No such file or directory)
|
||||||
|
stat("/usr/bin/uname", {st_mode=S_IFREG|0755, st_size=39584, ...}) = 0
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
“错误信息之前的最后一次失败调用”这种启发式方法非常适合于查找错误。无论如何,自下而上地工作是有道理的。
|
||||||
|
|
||||||
|
#### C编程指南非常有助于理解系统调用
|
||||||
|
|
||||||
|
标准 C 库函数调用不属于系统调用,但它们仅是系统调用之上的唯一一个薄层。所以如果你了解(甚至只是略知一二)如何使用 C 语言,那么阅读系统调用追踪信息就非常容易。例如,如果你在调试网络系统调用,你可以尝试略读 [Beej’s classic Guide to Network Programming][7]。
|
||||||
|
|
||||||
|
### 一个更复杂的调试例子
|
||||||
|
|
||||||
|
就像我说的那样,简单的调试例子代表我在大部分情况下如何使用 `strace` 。然而,有时候需要一些更加细致的工作,所以这里有一个稍微复杂(且真实)的例子。
|
||||||
|
|
||||||
|
[`bcron`][8] 是一个任务调度器,它是经典 *nix `cron` 守护程序的一种实现。它已经被安装到一台服务器上,但是当有人尝试编辑作业时间表时,发生了以下情况:
|
||||||
|
|
||||||
|
```
|
||||||
|
# crontab -e -u logs
|
||||||
|
bcrontab: Fatal: Could not create temporary file
|
||||||
|
```
|
||||||
|
|
||||||
|
好的,现在 bcron 尝试写入一些文件,但是它失败了,也没有告诉我们原因。以下是 `strace` 的输出:
|
||||||
|
|
||||||
|
```
|
||||||
|
# strace -o /tmp/trace crontab -e -u logs
|
||||||
|
bcrontab: Fatal: Could not create temporary file
|
||||||
|
# cat /tmp/trace
|
||||||
|
...
|
||||||
|
openat(AT_FDCWD, "bcrontab.14779.1573691864.847933", O_RDONLY) = 3
|
||||||
|
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f82049b4000
|
||||||
|
read(3, "#Ansible: logsagg\n20 14 * * * lo"..., 8192) = 150
|
||||||
|
read(3, "", 8192) = 0
|
||||||
|
munmap(0x7f82049b4000, 8192) = 0
|
||||||
|
close(3) = 0
|
||||||
|
socket(AF_UNIX, SOCK_STREAM, 0) = 3
|
||||||
|
connect(3, {sa_family=AF_UNIX, sun_path="/var/run/bcron-spool"}, 110) = 0
|
||||||
|
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f82049b4000
|
||||||
|
write(3, "156:Slogs\0#Ansible: logsagg\n20 1"..., 161) = 161
|
||||||
|
read(3, "32:ZCould not create temporary f"..., 8192) = 36
|
||||||
|
munmap(0x7f82049b4000, 8192) = 0
|
||||||
|
close(3) = 0
|
||||||
|
write(2, "bcrontab: Fatal: Could not creat"..., 49) = 49
|
||||||
|
unlink("bcrontab.14779.1573691864.847933") = 0
|
||||||
|
exit_group(111) = ?
|
||||||
|
+++ exited with 111 +++
|
||||||
|
```
|
||||||
|
|
||||||
|
在程序结束之前有一个 `write` 的错误信息,但是这次有些不同。首先,在此之前没有任何相关的失败系统调用。其次,我们看到这个错误信息是由 `read` 从别的地方读取而来的。这看起来像是真正的错误发生在别的地方,而 `bcrontab` 只是在转播这些信息。
|
||||||
|
|
||||||
|
如果你查阅了 `man 2 read`,你将会看到 `read` 的第三个参数 (3) 代表文件描述符,这是 *nix 操作系统用于所有 IO 操作的句柄。你该如何知道文件描述符 3 代表什么?在这种情况下,你可以使用 `-y` 参数运行 `strace`(如上文所述),它将会在注释里告诉你文件描述符的具体指向,但是了解如何从上面这种输出中分析追踪结果是很有用的。
|
||||||
|
|
||||||
|
一个文件描述符可以来自于许多系统调用之一(这取决于它是用于控制台、网络套接字还是真实文件等的描述符),但不论如何,我们都可以搜索返回值为 3 的系统调用(例如,在 `strace` 的输出中查找 “=3”)。在这次 `strace` 中可以看到有两个这样的调用:最上面的 `openat` 以及中间的 `socket`。`openat` 打开一个文件,但是紧接着的 `close(3)` 表明其已经被关闭。(注意:文件描述符可以在打开并关闭后重复使用。)所以 `socket` 调用才是与此相关的(它是在 `read` 之前的最后一次),这告诉我们 `brcontab` 正在与一个网络套接字通信。在下一行,`connect` 表明文件描述符 3 是一个连接到 `/var/run/bcron-spool` 的 Unix 域套接字。
|
||||||
|
|
||||||
|
因此,我们需要弄清楚 Unix 套接字的另一侧是哪个进程在监听。有两个巧妙的技巧适用于在服务器部署中调试。一个是使用 `netstat` 或者较新的 `ss`。这两个命令都描述了当前系统中活跃的网络套接字,使用 `-l` 参数可以显示出处于监听状态的套接字,而使用 `-p` 参数可以得到正在使用该套接字的程序信息。(它们还有更多有用的选项,但是这两个已经足够完成工作了。)
|
||||||
|
|
||||||
|
```
|
||||||
|
# ss -pl | grep /var/run/bcron-spool
|
||||||
|
u_str LISTEN 0 128 /var/run/bcron-spool 1466637 * 0 users:(("unixserver",pid=20629,fd=3))
|
||||||
|
```
|
||||||
|
|
||||||
|
这告诉我们 `/var/run/bcron-spool` 套接字的监听程序是 `unixserver` 这个命令,它的进程 ID 为 20629。(巧合的是,这个程序也使用文件描述符 3 去连接这个套接字。)
|
||||||
|
|
||||||
|
第二个常用的工具就是使用 `lsof` 查找相同的信息。它可以列出当前系统中打开的所有文件(或文件描述符)。或者,我们可以得到一个具体文件的信息:
|
||||||
|
|
||||||
|
```
|
||||||
|
# lsof /var/run/bcron-spool
|
||||||
|
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
|
||||||
|
unixserve 20629 cron 3u unix 0x000000005ac4bd83 0t0 1466637 /var/run/bcron-spool type=STREAM
|
||||||
|
```
|
||||||
|
|
||||||
|
进程 20629 是一个常驻进程,所以我们可以使用 `strace -o /tmp/trace -p 20629` 去查看该进程的系统调用。如果我们在另一个终端尝试编辑 cron 的计划任务表,就可以在错误发生时捕获到以下信息:
|
||||||
|
|
||||||
|
```
|
||||||
|
accept(3, NULL, NULL) = 4
|
||||||
|
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7faa47c44810) = 21181
|
||||||
|
close(4) = 0
|
||||||
|
accept(3, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
|
||||||
|
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=21181, si_uid=998, si_status=0, si_utime=0, si_stime=0} ---
|
||||||
|
wait4(0, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|WSTOPPED, NULL) = 21181
|
||||||
|
wait4(0, 0x7ffe6bc36764, WNOHANG|WSTOPPED, NULL) = -1 ECHILD (No child processes)
|
||||||
|
rt_sigaction(SIGCHLD, {sa_handler=0x55d244bdb690, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7faa47ab9840}, {sa_handler=0x55d244bdb690, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7faa47ab9840}, 8) = 0
|
||||||
|
rt_sigreturn({mask=[]}) = 43
|
||||||
|
accept(3, NULL, NULL) = 4
|
||||||
|
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7faa47c44810) = 21200
|
||||||
|
close(4) = 0
|
||||||
|
accept(3, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
|
||||||
|
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=21200, si_uid=998, si_status=111, si_utime=0, si_stime=0} ---
|
||||||
|
wait4(0, [{WIFEXITED(s) && WEXITSTATUS(s) == 111}], WNOHANG|WSTOPPED, NULL) = 21200
|
||||||
|
wait4(0, 0x7ffe6bc36764, WNOHANG|WSTOPPED, NULL) = -1 ECHILD (No child processes)
|
||||||
|
rt_sigaction(SIGCHLD, {sa_handler=0x55d244bdb690, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7faa47ab9840}, {sa_handler=0x55d244bdb690, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7faa47ab9840}, 8) = 0
|
||||||
|
rt_sigreturn({mask=[]}) = 43
|
||||||
|
accept(3, NULL, NULL
|
||||||
|
```
|
||||||
|
|
||||||
|
(最后一个 `accept` 调用没有在追踪周期里完成。)不幸的是,这次追踪没有包含我们想要的错误信息。我们没有观察到 `bcrontan` 往套接字发送或接受的任何信息。然而,我们看到了很多进程管理操作(`clone`,`wait4`,`SIGCHLD`,等等)。这个进程产生了子进程,我们猜测真实的工作是由子进程完成的。如果我们想捕获子进程的追踪信息,就必须往 `strace` 追加 `-f` 参数。以下是我们最终使用 `strace -f -o /tmp/trace -p 20629` 找到的错误信息:
|
||||||
|
|
||||||
|
```
|
||||||
|
21470 openat(AT_FDCWD, "tmp/spool.21470.1573692319.854640", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
|
||||||
|
21470 write(1, "32:ZCould not create temporary f"..., 36) = 36
|
||||||
|
21470 write(2, "bcron-spool[21470]: Fatal: logs:"..., 84) = 84
|
||||||
|
21470 unlink("tmp/spool.21470.1573692319.854640") = -1 ENOENT (No such file or directory)
|
||||||
|
21470 exit_group(111) = ?
|
||||||
|
21470 +++ exited with 111 +++
|
||||||
|
```
|
||||||
|
|
||||||
|
现在我们知道了进程 ID 21470 在尝试创建文件 `tmp/spool.21470.1573692319.854640` (相对于当前的工作目录)时得到了一个没有权限的错误。如果我们知道当前的工作目录,就可以得到完整路径并能指出为什么该进程无法在此处创建临时文件。不幸的是,这个进程已经退出了,所以我们不能使用 `lsof -p 21470` 去找出当前的工作目录,但是我们可以往前追溯,查找进程 ID 21470 使用哪个系统调用改变了它的工作目录。这个系统调用是 `chdir`(可以在搜索引擎很轻松地找到)。以下是一直往前追溯到服务器进程 ID 20629 的结果:
|
||||||
|
|
||||||
|
```
|
||||||
|
20629 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7faa47c44810) = 21470
|
||||||
|
...
|
||||||
|
21470 execve("/usr/sbin/bcron-spool", ["bcron-spool"], 0x55d2460807e0 /* 27 vars */) = 0
|
||||||
|
...
|
||||||
|
21470 chdir("/var/spool/cron") = 0
|
||||||
|
...
|
||||||
|
21470 openat(AT_FDCWD, "tmp/spool.21470.1573692319.854640", O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EACCES (Permission denied)
|
||||||
|
21470 write(1, "32:ZCould not create temporary f"..., 36) = 36
|
||||||
|
21470 write(2, "bcron-spool[21470]: Fatal: logs:"..., 84) = 84
|
||||||
|
21470 unlink("tmp/spool.21470.1573692319.854640") = -1 ENOENT (No such file or directory)
|
||||||
|
21470 exit_group(111) = ?
|
||||||
|
21470 +++ exited with 111 +++
|
||||||
|
```
|
||||||
|
|
||||||
|
(如果你在这里失败了,你可能需要阅读 [我之前有关 *nix 进程管理和 shell 的文章][9])好的,现在 PID 为 20629 的服务器进程没有权限在 `/var/spool/cron/tmp/spool.21470.1573692319.854640` 创建文件。最可能的原因就是典型的 *nix 文件系统权限设置。让我们检查一下:
|
||||||
|
|
||||||
|
```
|
||||||
|
# ls -ld /var/spool/cron/tmp/
|
||||||
|
drwxr-xr-x 2 root root 4096 Nov 6 05:33 /var/spool/cron/tmp/
|
||||||
|
# ps u -p 20629
|
||||||
|
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
|
||||||
|
cron 20629 0.0 0.0 2276 752 ? Ss Nov14 0:00 unixserver -U /var/run/bcron-spool -- bcron-spool
|
||||||
|
```
|
||||||
|
|
||||||
|
这就是问题所在!这个服务进程以 `cron` 用户运行,但是只有 `root` 用户才有向 `/var/spool/cron/tmp/` 目录写入的权限。一个简单 `chown cron /var/spool/cron/tmp/` 命令就能让 `bcron` 正常工作。(如果不是这个问题,那么下一个最有可能的怀疑对象是诸如 SELinux 或者 AppArmor 之类的内核安全模块,因此我将会使用 `dmesg` 检查内核日志。)
|
||||||
|
|
||||||
|
### 总结
|
||||||
|
|
||||||
|
最初,系统调用追踪可能会让人不知所措,但是我希望我已经证明它们是调试一整套常见部署问题的快速方法。你可以设想一下尝试用单步调试器去调试多进程的 `bcron` 问题。
|
||||||
|
|
||||||
|
通过一连串的系统调用解决问题是需要练习的,但正如我说的那样,在大多数情况下,我只需要使用 `strace` 从下往上追踪并查找错误。不管怎样,`strace` 节省了我很多的调试时间。我希望这也对你有所帮助。
|
||||||
|
|
||||||
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
via: https://theartofmachinery.com/2019/11/14/deployment_debugging_strace.html
|
||||||
|
|
||||||
|
作者:[Simon Arneaud][a]
|
||||||
|
选题:[lujun9972][b]
|
||||||
|
译者:[hanwckf](https://github.com/hanwckf)
|
||||||
|
校对:[校对者ID](https://github.com/校对者ID)
|
||||||
|
|
||||||
|
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
|
||||||
|
|
||||||
|
[a]: https://theartofmachinery.com
|
||||||
|
[b]: https://github.com/lujun9972
|
||||||
|
[1]: https://strace.io/
|
||||||
|
[2]: http://dtrace.org/blogs/about/
|
||||||
|
[3]: https://man.openbsd.org/ktrace
|
||||||
|
[4]: https://theartofmachinery.com/images/strace/system_calls.svg
|
||||||
|
[5]: https://linux.die.net/man/1/ltrace
|
||||||
|
[6]: https://linux.die.net/man/1/strace
|
||||||
|
[7]: https://beej.us/guide/bgnet/html/index.html
|
||||||
|
[8]: https://untroubled.org/bcron/
|
||||||
|
[9]: https://theartofmachinery.com/2018/11/07/writing_a_nix_shell.html
|
Loading…
Reference in New Issue
Block a user