Bash builtins also available as separate executables

I want to understand Bash builtins, hence the following questions:

When I think of the term builtin, I imagine that the bash executable has a function defined in its symbol table that other parts of the executable can access without actually having to fork. Is this what builtin means?
I also see that some builtins have a separate executable. For instance, type [ reports that [ is a shell builtin, but there is also an executable named /usr/bin/[. Is it correct to say that the same code is available through two executables: one through the bash executable and another through /usr/bin/[?

CodePudding user response：

the bash executable has a function defined in its symbol table

Builtins are normally compiled into the Bash executable. Additional builtins can also be loaded at runtime from a separate shared library with enable -f.

can access without actually having to fork

Yes.

Is it correct to say that the same code is available through two executables: one through bash executable and another through /usr/bin/[?

No, the source code is different. One is a builtin implemented inside Bash; the other is a standalone program, typically shipped by a separate package such as GNU coreutils. Their behavior also differs in edge cases:

$ printf "%q\n" '*'
\*
$ /bin/printf "%q\n" '*'
'*'

$ time echo 1
1

real    0m0.000s
user    0m0.000s
sys     0m0.000s
$ /bin/time echo 1
1
0.00user 0.00system 0:00.00elapsed 50%CPU (0avgtext 0avgdata 2392maxresident)k
64inputs 0outputs (1major 134minor)pagefaults 0swaps

$ [ -v a ]
$ /bin/[ -v a ]
/bin/[: ‘-v’: unary operator expected

CodePudding user response：

Loosely speaking, the program version of a built-in is used when the shell interpreter is not available or not needed. Let's look at this in more detail.

When you run a shell script, the interpreter recognizes the built-ins and does not fork/exec but merely calls the function corresponding to the built-in. Even if you call them from a C/C++ executable through system(), the latter launches a shell first and the spawned shell then runs the built-in.
Here is an example program that runs echo message through the system() library function:

#include <stdlib.h>

int main(void)
{
  system("echo message");

  return 0;
}

Compile it and run it:

$ gcc msg.c -o msg
$ ./msg
message

Running the program under strace with the -f option shows the processes involved. First, the main program is executed:

$ strace -f ./msg
execve("./msg", ["./msg"], 0x7ffef5c99838 /* 58 vars */) = 0

Then system() triggers a fork(), which is actually implemented by the clone() system call. The child process (PID 5185) is launched:

clone(child_stack=0x7f7e6d6cbff0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD
strace: Process 5185 attached
 <unfinished ...>

The child process executes /bin/sh -c "echo message". That shell runs its echo built-in, which displays the message on the screen (write() system call):

[pid  5185] execve("/bin/sh", ["sh", "-c", "echo message"], 0x7ffdd0fafe28 /* 58 vars */ <unfinished ...>
[...]
[pid  5185] write(1, "message\n", 8message
)    = 8
[...]
+++ exited with 0 +++

The program version of a built-in is useful when you need it from a C/C++ executable without an intermediate shell, for the sake of performance. For instance, when you call it through the execv() function.
Here is an example program which does the same thing as the preceding example but with execv() instead of system():

#include <unistd.h>

int main(void)
{
  char *av[3];

  av[0] = "/bin/echo";
  av[1] = "message";
  av[2] = NULL;
  execv(av[0], av);

  // Reached only if execv() failed
  return 1;
}

Compile and run it to see that we get the same result:

$ gcc msg1.c -o msg1
$ ./msg1
message

Let's run it under strace to get the details. The output is shorter because no child process is created to run an intermediate shell. The /bin/echo program itself is executed instead:

$ strace -f ./msg1
execve("./msg1", ["./msg1"], 0x7fffd5b22ec8 /* 58 vars */) = 0
[...]
execve("/bin/echo", ["/bin/echo", "message"], 0x7fff6562fbf8 /* 58 vars */) = 0
[...]
write(1, "message\n", 8message
)            = 8
[...]
exit_group(0)                           = ?
+++ exited with 0 +++

Of course, if the program is supposed to do additional things, a simple call to execv() is not sufficient, because execv() replaces the calling program with /bin/echo. A more elaborate program would fork and execute /bin/echo in the child, still without running an intermediate shell:

#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
  if (fork() == 0) {

    char *av[3];

    av[0] = "/bin/echo";
    av[1] = "message";
    av[2] = NULL;
    execv(av[0], av);
    _exit(127); // Reached only if execv() failed
  }

  wait(NULL);

  // Do additional work before ending

  return 0;
}

Compile it and run it under strace to see that the child process executes the /bin/echo program directly, without an intermediate shell:

$ gcc msg2.c -o msg2
$ ./msg2
message
$ strace -f ./msg2
execve("./msg2", ["./msg2"], 0x7ffc11a5e228 /* 58 vars */) = 0
[...]
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 5703 attached
, child_tidptr=0x7f8e0b6e0810) = 5703
[pid  5703] execve("/bin/echo", ["/bin/echo", "message"], 0x7ffe656a9d08 /* 58 vars */ <unfinished ...>
[...]
[pid  5703] write(1, "message\n", 8message
)    = 8
[...]
[pid  5703] +++ exited with 0 +++
<... wait4 resumed>NULL, 0, NULL)       = 5703
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=5703, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
exit_group(0)                           = ?
+++ exited with 0 +++