Waiting works fine with pidfd_open and poll.
The problem I’m facing: after the process quits, the poll() API apparently removes the information about the now-dead process, so waitid with P_PIDFD as the first argument fails immediately with error code 22, “Invalid argument”.
I don’t think I can afford to launch a thread for every child process just to sleep in a blocking waitpid: I have multiple processes, plus other handles that aren’t processes, and I need to poll all of them efficiently.
Any workarounds?
If it matters, I only need to support Linux 5.13.12 and newer running on ARM64 and ARMv7 CPUs.
The approximate sequence of kernel calls is as follows:
fork
- In the child: setresuid, setresgid, execvpe
- In the new child: printf, sleep, _exit
- Meanwhile in the parent: pidfd_open, poll and, once that completes, waitid with P_PIDFD as the first argument.
Expected result: waitid should give me the exit code of the child.
Actual result: it does nothing and sets errno to EINVAL.
CodePudding user response:
There is one crucial bit. From man waitid:
Applications shall specify at least one of the flags WEXITED, WSTOPPED, or WCONTINUED to be OR'ed in with the options argument.
I was passing only WNOHANG, whereas you want to pass WNOHANG | WEXITED ;)
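To make the difference concrete, here is a minimal sketch of the parent side, assuming a kernel that has pidfd_open (5.3 or newer). The helper name run_and_reap is made up for illustration, the child simply calls _exit instead of doing the setresuid/execvpe steps, and pidfd_open is invoked through syscall() because older glibc versions ship no wrapper for it. The fallback definition of P_PIDFD uses the value from the kernel's linux/wait.h.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef P_PIDFD
#define P_PIDFD 3   /* value from linux/wait.h, for libc headers that lack it */
#endif

/* Hypothetical helper: fork a child that exits with `code`, wait for it with
 * pidfd_open + poll, then collect it with waitid(P_PIDFD, ..., options).
 * Returns the child's exit code, or a negative errno value on failure. */
static int run_and_reap(int code, int options)
{
    pid_t pid = fork();
    if (pid < 0)
        return -errno;
    if (pid == 0)
        _exit(code);                      /* child: exit immediately */

    int fd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (fd < 0) {
        int saved = errno;
        waitpid(pid, NULL, 0);            /* do not leave a zombie behind */
        return -saved;
    }

    /* The pidfd becomes readable once the process has terminated. */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, -1);
    } while (rc < 0 && errno == EINTR);

    siginfo_t info;
    memset(&info, 0, sizeof info);
    rc = waitid(P_PIDFD, (id_t)fd, &info, options);
    int saved = errno;
    close(fd);

    if (rc < 0) {
        waitpid(pid, NULL, 0);            /* clean up after the failed waitid */
        return -saved;
    }
    if (info.si_pid != pid || info.si_code != CLD_EXITED)
        return -ECHILD;                   /* not a normal exit of our child */
    return info.si_status;                /* exit code for CLD_EXITED */
}
```

Calling run_and_reap(42, WNOHANG | WEXITED) should return 42, while run_and_reap(42, WNOHANG) should return -EINVAL, which matches the behaviour described in the question.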
CodePudding user response:
You can use a single reaper thread, looping on waitpid(-1, &status, 0). Whenever it reaps a child process, it looks it up in the set of current child processes, handles possible notifications (semaphore or callback), and stores the exit status.
There is one notable situation that needs special consideration: the child process may exit before fork() returns in the parent process. This means it is possible for the reaper to see a child process exiting before the code that did the fork() manages to register the child process ID in any data structure. Thus, both the reaper and the fork() registering functions must be ready to look up or create the record in the data store keeping track of child processes; including calling the callback or posting the semaphore. It is not complicated at all, but unless you are used to thinking in asynchronous terms, it is easy to miss these corner cases.
Because wait(...)/waitpid(-1,...) returns immediately when there are no child processes to wait for (with -1 and errno set to ECHILD), the reaper thread should probably wait on a condition variable when there are no child processes to wait for, with the code that registers the child process ID signaling on that condition variable to minimize resource use in the no-child-processes case. (Also, do remember to minimize the reaper thread stack size, as it is unreasonably large (order of 8 MiB) by default, and wastes resources. I often use 2*PTHREAD_STACK_MIN, myself.)
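The reaper approach could look roughly like the sketch below. Everything in it is illustrative rather than a definitive implementation: the names (start_reaper, spawn_exit, wait_for_child, find_or_create), the fixed-size table, and the use of a single condition variable for notification in place of a semaphore or callback are all my assumptions. It also assumes that nothing else in the process waits on child processes, and that children only call _exit (a real one would exec).

```c
#define _GNU_SOURCE
#include <limits.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64                    /* assumption: small fixed table */

struct child { pid_t pid; int status; int reaped; };

static struct child table[MAX_CHILDREN];   /* pid == 0 marks a free slot */
static int outstanding;                    /* registered minus reaped; may be
                                              briefly negative (see below) */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;

/* Look up the record for pid or create it. Call with `lock` held. */
static struct child *find_or_create(pid_t pid)
{
    struct child *free_slot = NULL;
    for (int i = 0; i < MAX_CHILDREN; i++) {
        if (table[i].pid == pid)
            return &table[i];
        if (table[i].pid == 0 && free_slot == NULL)
            free_slot = &table[i];
    }
    if (free_slot != NULL) {
        free_slot->pid = pid;
        free_slot->status = 0;
        free_slot->reaped = 0;
    }
    return free_slot;                      /* NULL if the table is full */
}

static void *reaper_main(void *arg)
{
    (void)arg;
    for (;;) {
        /* Sleep while there is nothing to reap instead of spinning on ECHILD. */
        pthread_mutex_lock(&lock);
        while (outstanding <= 0)
            pthread_cond_wait(&changed, &lock);
        pthread_mutex_unlock(&lock);

        int status;
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0)
            continue;                      /* e.g. EINTR: just retry */

        /* The child may not be registered yet, so look up or create. */
        pthread_mutex_lock(&lock);
        struct child *c = find_or_create(pid);
        if (c != NULL) {
            c->status = status;
            c->reaped = 1;
        }
        outstanding--;
        pthread_cond_broadcast(&changed);
        pthread_mutex_unlock(&lock);
    }
}

static int start_reaper(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    if (pthread_attr_init(&attr) != 0)
        return -1;
    pthread_attr_setstacksize(&attr, 2 * PTHREAD_STACK_MIN);  /* small stack */
    int rc = pthread_create(&tid, &attr, reaper_main, NULL);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return -1;
    pthread_detach(tid);
    return 0;
}

/* Fork a child that exits with `code` and register it with the reaper. */
static pid_t spawn_exit(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    if (pid < 0)
        return -1;
    pthread_mutex_lock(&lock);
    find_or_create(pid);                   /* the reaper may have created it */
    outstanding++;
    pthread_cond_broadcast(&changed);
    pthread_mutex_unlock(&lock);
    return pid;
}

/* Block until the reaper has stored the status; return exit code or -1. */
static int wait_for_child(pid_t pid)
{
    int result = -1;
    pthread_mutex_lock(&lock);
    struct child *c = find_or_create(pid);
    if (c != NULL) {
        while (!c->reaped)
            pthread_cond_wait(&changed, &lock);
        if (WIFEXITED(c->status))
            result = WEXITSTATUS(c->status);
        c->pid = 0;                        /* release the slot */
        c->reaped = 0;
    }
    pthread_mutex_unlock(&lock);
    return result;
}
```

A caller would run start_reaper() once, then use spawn_exit() and wait_for_child() from any thread. The counter is only incremented after fork() returns, so if the reaper wins the race it drops below zero for a moment; the reaper treats anything at or below zero as "nothing to wait for" and goes back to sleep on the condition variable.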