sources/tech/20211117 Debugging a weird -file not found- error.md
10 KiB
Debugging a weird 'file not found' error
Yesterday I ran into a weird error where I ran a program and got the error “file not found” even though the program I was running existed. It’s something I’ve run into before, but every time I’m very surprised and confused by it (what do you MEAN file not found, the file is RIGHT THERE???!!??)
So let’s talk about what happened and why!
the error
Let’s start by showing the error message I got. I had a Go program called serve.go, and I was trying to bundle it into a Docker container with this Dockerfile:
FROM golang:1.17 AS go
ADD ./serve.go /app/serve.go
WORKDIR /app
RUN go build serve.go
FROM alpine:3.14
COPY --from=go /app/serve /app/serve
COPY ./static /app/static
WORKDIR /app/static
CMD ["/app/serve"]
This Dockerfile
- Builds the Go program
- Copies the binary into an Alpine container
Pretty simple. Seems like it should work, right?
But when I try to run /app/serve
, this happens:
$ docker build .
$ docker run -it broken-container:latest /app/serve
standard_init_linux.go:228: exec user process caused: no such file or directory
But the file definitely does exist:
$ docker run -it broken-container:latest ls -l /app/serve
-rwxr-xr-x 1 root root 6220237 Nov 16 13:27 /app/serve
So what’s going on?
idea 1: permissions
At first I thought “hmm, maybe the permissions are wrong?”. But this can’t be the problem, because:
- permission problems don’t result in a “no such file or directory” error
- in any case when we ran
ls -l
, we saw that the file was executable
(I’m including this even though it’s “obviously” wrong just because I have a lot of wrong thoughts when debugging, it’s part of the process :) )
idea 2: strace
Then I decided to use strace, as always. Let’s see what stracing /app/serve/
looks like
$ docker run -it broken-container:latest /bin/sh
$ /app/static # apk add strace
(apk output omitted)
$ /app/static # strace /app/serve
execve("/app/serve", ["/app/serve"], 0x7ffdd08edd50 /* 6 vars */) = -1 ENOENT (No such file or directory)
strace: exec: No such file or directory
+++ exited with 1 +++
This is not that helpful, it just says “No such file or directory” again. But at least we know that the error is being thrown right away when we run the evecve
system call, so that’s good.
Interestingly though, this is different from what happens when we try to strace a nonexistent binary:
$ strace /app/asdf
strace: Can't stat '/app/asdf': No such file or directory
idea 3: google “enoent but file exists execve”
I vaguely remembered that there was some reason you could get an ENOENT
error when executing a program even if the file did exist, so I googled it. This led me to this stack overflow answer
which said, very helpfully:
When execve() returns the error ENOENT, it can mean more than one thing: 1. the program doesn’t exist; 2. the program itself exists, but it requires an “interpreter” that doesn’t exist.
ELF executables can request to be loaded by another program, in a way very similar to
#!/bin/something
in shell scripts.
That answer says that we can find the interpreter with readelf -l $PROGRAM | grep interpreter
. So let’s do that!
step 4: use readelf
I didn’t have readelf
installed in the container and I wasn’t sure how to install it, so I ran mount
to get the path to the container’s filesystem and then ran readelf
from the host using that overlay directory.
(as an aside: this is kind of a weird way to do this, but as a result of writing a containers zine I’m used to doing weird things with containers and I think doing weird things is fun, so this way just seemed fastest to me at the time. That trick won’t work if you’re on a Mac though, it only works on Linux)
$ mount | grep docker
overlay on /var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/326ILTM2UXMVY64V7JFPCSDSKG:/var/lib/docker/overlay2/l/MGGPR357UOZZWXH3SH2AYHJL3E:/var/lib/docker/overlay2/l/EEEKSBSQ6VHGJ77YF224TBVMNV:/var/lib/docker/overlay2/l/RVKU36SQ3PXEQAGBRKSQRZFDGY,upperdir=/var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/diff,workdir=/var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/work,index=off)
$ # (then I copy and paste the "merged" directory from the output)
$ readelf -l /var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged/app/serve | grep interp
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
01 .interp
03 .text .plt .interp .note.go.buildid
Okay, so the interpreter is /lib64/ld-linux-x86-64.so.2
.
And sure enough, that file doesn’t exist inside our Alpine container
$ docker run -it broken-container:latest ls /lib64/ld-linux-x86-64.so.2
step 5: victory!
Then I googled a little more and found out that there’s a golang:alpine
container that’s meant for doing Go builds targeted to be run in Alpine.
I switched to doing my build in the golang:alpine
container and that fixed everything.
question: why is my Go binary dynamically linked?
The problem was with the program’s interpreter. But I remembered that only dynamically linked programs have interpreters, which is a bit weird – I expected my Go binary to be statically linked! What’s going on with that?
First, I double checked that the Go binary was actually dynamically linked using file
and ldd
: (ldd
lists the dependencies of a dynamically linked executable! it’s very useful!)
(I’m using the docker overlay filesystem to get at the binary inside the container again)
$ file /var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged/app/serve
/var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged/app/serve:
ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2, Go
BuildID=vd_DJvcyItRi4Q2RD0WL/z8P4ulttr6F6njfqx8CI/_odQWaUTR2e38bdHlD0-/ikjsOjlMbEOhj2qXv5AE,
not stripped
$ ldd /var/lib/docker/overlay2/1ed587b302af7d3182135d02257f261fd491b7acf4648736d4c72f8382ecba0d/merged/app/serve
linux-vdso.so.1 (0x00007ffe095a6000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f565a265000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f565a099000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f565a2b4000)
Now that I know it’s dynamically linked, it’s not that surprising that it didn’t work on a different system than it was compiled on.
Some Googling tells me that I can get Go to produce a statically linked binary by setting CGO_ENABLED=0
. Let’s see if that works.
$ # first let's build it without that flag
$ go build serve.go
$ file ./serve
./serve: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, Go BuildID=UGBmnMfFsuwMky4-k2Mt/RaNGsMI79eYC4-dcIiP4/J7v5rNGo3sNiJqdgNR12/eR_7mqqrsil_Lr6vt-rP, not stripped
$ ldd ./serve
linux-vdso.so.1 (0x00007fff679a6000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f659cb61000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f659c995000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f659cbb0000)
$ # and now with the CGO_ENABLED_0 flag
$ env CGO_ENABLED=0 go build serve.go
$ file ./serve
./serve: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=Kq392IB01ShfNVP5TugF/2q5hN74m5eLgfuzTZzR-/EatgRjlx5YYbpcroiE9q/0Fg3zUxJKY3lbsZ9Ufda, not stripped
$ ldd ./serve
not a dynamic executable
It works! I checked, and that’s an alternative way to fix this bug – if I just set the CGO_ENABLED=0
environment variable in my build container, then I can build a static binary and I don’t need to switch to the golang:alpine
container for my builds. I kind of like that fix better.
And statically linking in this case doesn’t even produce a bigger binary (for some reason it seems to produce a slightly smaller binary?? I don’t know why that is)
I still don’t understand why it’s using cgo here, I ran env | grep CGO
and I definitely don’t have CGO_ENABLED=1
set in my environment, but I don’t feel like solving that mystery right now.
that was a fun bug!
I thought this bug was a nice way to see how you can run into problems when compiling a dynamically linked executable on one platform and running it on another one! And to learn about the fact that ELF files have an interpreter!
I’ve run into this “file not found” error a couple of times, and it feels kind of mind bending because it initially seems impossible (BUT THE FILE IS THERE!!! I SEE IT!!!). I hope this helps someone be less confused if you run into it!
via: https://jvns.ca/blog/2021/11/17/debugging-a-weird--file-not-found--error/
作者:Julia Evans 选题:lujun9972 译者:译者ID 校对:校对者ID