chroot "no such file or directory" prints wrong missing file

I know similar questions have been asked here a billion times, but unlike those, I got my system working. I just get the wrong error message when I break it (and I want to use these messages to debug another issue, so it's critical that they are accurate).

Starting with the working system:

$ tree
.
├── bin
│   └── bash
├── lib
│   ├── libc.so.6
│   ├── libdl.so.2
│   └── libtinfo.so.6
└── lib64
    └── ld-linux-x86-64.so.2
$ sudo chroot . /bin/bash
bash-5.0#

As expected, everything bash needs is there, and bash runs.

Now when I remove anything inside the lib folder, I get an error telling me the library is missing:

$ rm -f lib/libdl.so.2
$ sudo chroot . /bin/bash
/bin/bash: error while loading shared libraries: libdl.so.2: cannot open shared object file: No such file or directory

Also as expected. However, when I remove ld-linux-x86-64.so.2 inside the lib64 folder:

$ rm -f lib64/ld-linux-x86-64.so.2
$ sudo chroot . /bin/bash
chroot: failed to run command ‘/bin/bash’: No such file or directory

It's telling me /bin/bash is missing. The message is identical to when I actually delete it.

$ rm -f bin/bash
$ sudo chroot . /bin/bash
chroot: failed to run command ‘/bin/bash’: No such file or directory

So for some reason it thinks bash is missing when what is actually missing is (I assume) the dynamic linker. I assume this is because it uses this linker to load the ELF file in the first place, but that doesn't make the message any more correct.

I even checked that when I actually prevent bash from finding ld-linux-x86-64.so.2, by running it in qemu on a different system, I get the correct error message:

<some arm system>$ qemu-x86_64 -L /tmp/nowhere bin/bash
/lib64/ld-linux-x86-64.so.2: No such file or directory

Is this a bug? Is there some option to tell chroot to not behave like this and print the actually missing file? Is there some magic in this file? What is going on here?

TLDR: Why does chroot tell me the executable is missing when actually lib64/ld-linux-x86-64.so.2 is?

CodePudding user response：

Why does chroot tell me the executable is missing when actually lib64/ld-linux-x86-64.so.2 is?

Assuming you are using the chroot program from GNU coreutils, we can look at the code to understand what is going on (hopefully the "magic" will go away). The source is in chroot.c, which is available in the GitHub mirror of coreutils.

If we search for the string from the error message, "failed to run command", we immediately find the (only) piece of code that prints it:

  /* Execute the given command.  */
  execvp (argv[0], argv);

  int exit_status = errno == ENOENT ? EXIT_ENOENT : EXIT_CANNOT_INVOKE;
  error (0, errno, _("failed to run command %s"), quote (argv[0]));
  return exit_status;
}

As you can see, it is printed after the call to execvp(). execvp() is one of the exec family of library functions, all of which end up in the execve() system call, and it is what executes a program (/bin/bash in your case). execvp() does not return if the program was executed successfully, because:

The exec() family of functions replaces the current process image with a new process image.

It returns only in case of an error and sets errno accordingly.

The code then inspects errno to decide the exit status:

int exit_status = errno == ENOENT ? EXIT_ENOENT : EXIT_CANNOT_INVOKE;

And finally it prints the error:

error (0, errno, _("failed to run command %s"), quote (argv[0]));

As you can see, argv[0] is always printed after "failed to run command" (/bin/bash in your case, i.e. the program it tried to execute inside the chroot environment).

errno is the error number "returned" by execvp(), and it determines what gets printed after the colon. error() is declared as follows on my system:
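The exit status logic can be tried without root. The sketch below uses env as a stand-in for chroot, on the assumption that your env (GNU coreutils or BusyBox) follows the same convention of 127 for ENOENT and 126 for other exec failures. The temporary file names are made up for the demonstration.

```shell
#!/bin/sh
# env, like chroot, calls exec on the command and maps errno to an exit status.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Case 1: the path does not exist, so exec fails with ENOENT.
status_enoent=0
env "$workdir/missing" 2>/dev/null || status_enoent=$?

# Case 2: the file exists but has no execute bit, so exec fails with EACCES.
: > "$workdir/not-executable"
chmod 644 "$workdir/not-executable"
status_other=0
env "$workdir/not-executable" 2>/dev/null || status_other=$?

echo "ENOENT       -> exit status $status_enoent"
echo "other errors -> exit status $status_other"
```

After a failed chroot you can inspect $? the same way, but note that 127 only tells you that errno was ENOENT, not which file was missing.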

/* Print a message with `fprintf (stderr, FORMAT, ...)';
   if ERRNUM is nonzero, follow it with ": " and strerror (ERRNUM).
   If STATUS is nonzero, terminate the program with `exit (STATUS)'.  */

extern void error (int __status, int __errnum, const char *__format, ...)

No such file or directory is error number ENOENT and is "returned" by execvp() and friends when:

ENOENT The file pathname or a script or ELF interpreter does not exist.

("ELF interpreter" is another name for the dynamic linker, i.e. the program named in the executable's PT_INTERP header.)

chroot cannot know what really went wrong. It can merely report what execvp() put in errno, and I think this is why the error is vague and a bit misleading.
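You can reproduce the ambiguity without a chroot and without root. This is only a sketch: it uses a script with a missing #! interpreter as a stand-in for an ELF file with a missing ELF interpreter (the man page quoted above lists both under the same ENOENT entry), and env as a stand-in for chroot. The path /nonexistent/interpreter is made up and assumed not to exist.

```shell
#!/bin/sh
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# The script itself exists and is executable; only its interpreter is missing.
printf '#!/nonexistent/interpreter\n' > "$workdir/script"
chmod 755 "$workdir/script"

# Run a file whose interpreter is missing.
status_interp=0
msg_interp=$(LC_ALL=C env "$workdir/script" 2>&1) || status_interp=$?

# Run a file that is missing itself.
status_file=0
msg_file=$(LC_ALL=C env "$workdir/no-such-file" 2>&1) || status_file=$?

printf 'missing interpreter: [%s] %s\n' "$status_interp" "$msg_interp"
printf 'missing file:        [%s] %s\n' "$status_file" "$msg_file"
```

Both lines should name the command that was passed to env, not the interpreter, which is exactly the behaviour you see with chroot.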

CodePudding user response：

Why does chroot tell me the executable is missing when actually lib64/ld-linux-x86-64.so.2 is?

Executables are not run by themselves.

It's similar to shell scripts: a script starts with #!/bin/sh and has executable permission, so when you run ./file, what really runs is /bin/sh ./file.

When you run ./file and file is an ELF file with a PT_INTERP header, the program named in PT_INTERP is what gets run. So what really happens is that you are running /lib64/ld-linux-x86-64.so.2 ./file for (almost) every executable on your system. You can try it yourself: /lib64/ld-linux-x86-64.so.2 /bin/bash will run Bash.

So the kernel reads /bin/bash, sees that it is an ELF file, reads PT_INTERP, and tries to load /lib64/ld-linux-x86-64.so.2. Because /lib64/ld-linux-x86-64.so.2 does not exist, the exec call fails with ENOENT.

The chroot program then inspects errno and prints a message of the form "failed to run command <that command>: <error description>". chroot has no way of knowing whether the kernel failed to find the executable itself or its program interpreter; it just prints the error description.
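Since chroot itself cannot tell the two cases apart, you can check before running it. The helper below is a rough sketch, not a chroot option: elf_interp and check_chroot_exec are names I made up, and it assumes readelf from binutils is installed on the host (without it, the interpreter check is silently skipped).

```shell
#!/bin/sh
# Print the PT_INTERP path of an ELF file (prints nothing for static binaries).
elf_interp() {
    LC_ALL=C readelf -l "$1" 2>/dev/null |
        sed -n 's/.*Requesting program interpreter: \(.*\)\]$/\1/p'
}

# check_chroot_exec ROOT COMMAND: report which of the two files is missing.
check_chroot_exec() {
    root=$1
    cmd=$2
    if [ ! -e "$root$cmd" ]; then
        echo "missing executable: $root$cmd"
        return 1
    fi
    interp=$(elf_interp "$root$cmd")
    if [ -n "$interp" ] && [ ! -e "$root$interp" ]; then
        echo "missing ELF interpreter: $root$interp"
        return 1
    fi
    echo "executable and interpreter are present"
}
```

For the tree in the question you would call it as check_chroot_exec . /bin/bash. It does not check shared libraries (the dynamic linker reports those itself, as shown in the question), and absolute symlinks inside the chroot are resolved against the host root, so treat the result as a hint.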