TranslateProject/sources/tech/20211117 Debugging a weird -file not found- error.md
DarkSun 7e9de5ef63 选题[tech]: 20211117 Debugging a weird 'file not found' error
sources/tech/20211117 Debugging a weird -file not found- error.md
2021-11-20 05:03:05 +08:00

230 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[#]: subject: "Debugging a weird 'file not found' error"
[#]: via: "https://jvns.ca/blog/2021/11/17/debugging-a-weird--file-not-found--error/"
[#]: author: "Julia Evans https://jvns.ca/"
[#]: collector: "lujun9972"
[#]: translator: " "
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
Debugging a weird 'file not found' error
======
Yesterday I ran into a weird error where I ran a program and got the error “file not found” even though the program I was running existed. Its something Ive run into before, but every time Im very surprised and confused by it (what do you MEAN file not found, the file is RIGHT THERE???!!??)
So lets talk about what happened and why!
### the error
Lets start by showing the error message I got. I had a Go program called [serve.go][1], and I was trying to bundle it into a Docker container with this Dockerfile:
```
FROM golang:1.17 AS go
ADD ./serve.go /app/serve.go
WORKDIR /app
RUN go build serve.go
FROM alpine:3.14
COPY --from=go /app/serve /app/serve
COPY ./static /app/static
WORKDIR /app/static
CMD ["/app/serve"]
```
This Dockerfile
1. Builds the Go program
2. Copies the binary into an Alpine container
Pretty simple. Seems like it should work, right?
But when I try to run `/app/serve`, this happens:
```
$ docker build .
$ docker run -it broken-container:latest /app/serve
standard_init_linux.go:228: exec user process caused: no such file or directory
```
But the file definitely does exist:
```
$ docker run -it broken-container:latest ls -l /app/serve
-rwxr-xr-x 1 root root 6220237 Nov 16 13:27 /app/serve
```
So whats going on?
### idea 1: permissions
At first I thought “hmm, maybe the permissions are wrong?”. But this cant be the problem, because:
* permission problems dont result in a “no such file or directory” error
* in any case when we ran `ls -l`, we saw that the file was executable
(Im including this even though its “obviously” wrong just because I have a lot of wrong thoughts when debugging, its part of the process :) )
### idea 2: strace
Then I decided to use strace, as always. Lets see what stracing `/app/serve/` looks like
```
$ docker run -it broken-container:latest /bin/sh
$ /app/static # apk add strace
(apk output omitted)
$ /app/static # strace /app/serve
execve("/app/serve", ["/app/serve"], 0x7ffdd08edd50 /* 6 vars */) = -1 ENOENT (No such file or directory)
strace: exec: No such file or directory
+++ exited with 1 +++
```
This is not that helpful, it just says “No such file or directory” again. But at least we know that the error is being thrown right away when we run the `evecve` system call, so thats good.
Interestingly though, this is different from what happens when we try to strace a nonexistent binary:
```
$ strace /app/asdf
strace: Can't stat '/app/asdf': No such file or directory
```
### idea 3: google “enoent but file exists execve”
I vaguely remembered that there was some reason you could get an `ENOENT` error when executing a program even if the file did exist, so I googled it. This led me to [this stack overflow answer][2]
which said, very helpfully:
> When execve() returns the error ENOENT, it can mean more than one thing:
> 1\. the program doesnt exist;
> 2\. the program itself exists, but it requires an “interpreter” that doesnt exist.
>
> ELF executables can request to be loaded by another program, in a way very similar to `#!/bin/something` in shell scripts.
That answer says that we can find the interpreter with `readelf -l $PROGRAM | grep interpreter`. So lets do that!
### step 4: use `readelf`
I didnt have `readelf` installed in the container and I wasnt sure how to install it, so I ran `mount` to get the path to the containers filesystem and then ran `readelf` from the host using that overlay directory.
(as an aside: this is kind of a weird way to do this, but as a result of writing a [containers zine][3] Im used to doing weird things with containers and I think doing weird things is fun, so this way just seemed fastest to me at the time. That trick wont work if youre on a Mac though, it only works on Linux)
```
$ mount | grep docker
overlay on /var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/326ILTM2UXMVY64V7JFPCSDSKG:/var/lib/docker/overlay2/l/MGGPR357UOZZWXH3SH2AYHJL3E:/var/lib/docker/overlay2/l/EEEKSBSQ6VHGJ77YF224TBVMNV:/var/lib/docker/overlay2/l/RVKU36SQ3PXEQAGBRKSQRZFDGY,upperdir=/var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/diff,workdir=/var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/work,index=off)
$ # (then I copy and paste the "merged" directory from the output)
$ readelf -l /var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged/app/serve | grep interp
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
01 .interp
03 .text .plt .interp .note.go.buildid
```
Okay, so the interpreter is `/lib64/ld-linux-x86-64.so.2`.
And sure enough, that file doesnt exist inside our Alpine container
```
$ docker run -it broken-container:latest ls /lib64/ld-linux-x86-64.so.2
```
### step 5: victory!
Then I googled a little more and found out that theres a `golang:alpine` container thats meant for doing Go builds targeted to be run in Alpine.
I switched to doing my build in the `golang:alpine` container and that fixed everything.
### question: why is my Go binary dynamically linked?
The problem was with the programs interpreter. But I remembered that only dynamically linked programs have interpreters, which is a bit weird I expected my Go binary to be statically linked! Whats going on with that?
First, I double checked that the Go binary was actually dynamically linked using `file` and `ldd`: (`ldd` lists the dependencies of a dynamically linked executable! its very useful!)
(Im using the docker overlay filesystem to get at the binary inside the container again)
```
$ file /var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged/app/serve
/var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged/app/serve:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, Go
BuildID=vd_DJvcyItRi4Q2RD0WL/z8P4ulttr6F6njfqx8CI/_odQWaUTR2e38bdHlD0-/ikjsOjlMbEOhj2qXv5AE,
not stripped
$ ldd /var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged/app/serve
linux-vdso.so.1 (0x00007ffe095a6000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f565a265000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f565a099000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f565a2b4000)
```
Now that I know its dynamically linked, its not that surprising that it didnt work on a different system than it was compiled on.
Some Googling tells me that I can get Go to produce a statically linked binary by setting `CGO_ENABLED=0`. Lets see if that works.
```
$ # first let's build it without that flag
$ go build serve.go
$ file ./serve
./serve: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, Go BuildID=UGBmnMfFsuwMky4-k2Mt/RaNGsMI79eYC4-dcIiP4/J7v5rNGo3sNiJqdgNR12/eR_7mqqrsil_Lr6vt-rP, not stripped
$ ldd ./serve
linux-vdso.so.1 (0x00007fff679a6000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f659cb61000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f659c995000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f659cbb0000)
$ # and now with the CGO_ENABLED_0 flag
$ env CGO_ENABLED=0 go build serve.go
$ file ./serve
./serve: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=Kq392IB01ShfNVP5TugF/2q5hN74m5eLgfuzTZzR-/EatgRjlx5YYbpcroiE9q/0Fg3zUxJKY3lbsZ9Ufda, not stripped
$ ldd ./serve
not a dynamic executable
```
It works! I checked, and thats an alternative way to fix this bug if I just set the `CGO_ENABLED=0` environment variable in my build container, then I can build a static binary and I dont need to switch to the `golang:alpine` container for my builds. I kind of like that fix better.
And statically linking in this case doesnt even produce a bigger binary (for some reason it seems to produce a slightly _smaller_ binary?? I dont know why that is)
I still dont understand _why_ its using cgo here, I ran `env | grep CGO` and I definitely dont have `CGO_ENABLED=1` set in my environment, but I dont feel like solving that mystery right now.
### that was a fun bug!
I thought this bug was a nice way to see how you can run into problems when compiling a dynamically linked executable on one platform and running it on another one! And to learn about the fact that ELF files have an interpreter!
Ive run into this “file not found” error a couple of times, and it feels kind of mind bending because it initially seems impossible (BUT THE FILE IS THERE!!! I SEE IT!!!). I hope this helps someone be less confused if you run into it!
--------------------------------------------------------------------------------
via: https://jvns.ca/blog/2021/11/17/debugging-a-weird--file-not-found--error/
作者:[Julia Evans][a]
选题:[lujun9972][b]
译者:[译者ID](https://github.com/译者ID)
校对:[校对者ID](https://github.com/校对者ID)
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创编译,[Linux中国](https://linux.cn/) 荣誉推出
[a]: https://jvns.ca/
[b]: https://github.com/lujun9972
[1]: https://gist.github.com/jvns/6147bc21fbb60b0090d543bb5e240134
[2]: https://superuser.com/a/507031
[3]: https://wizardzines.com/zines/containers