Is there any way to release a named semaphore automatically when the program crashes or is killed in Linux?

When a program is killed or crashes, no cleanup function can be called. Is there any way to release the named semaphore in this situation on Linux? POSIX semaphores do not seem to solve this problem.

CodePudding user response：

A possible solution is to create a file lock or socket for each process. The other processes check these file locks or sockets to detect whether a consumer has died, and if so, recycle its semaphore. It works fine, but I'm curious about the performance.

CodePudding user response：

Unlike mutexes and rwlocks, named semaphores do not need to be "cleaned up" by the same process, so we can use a dedicated child process to monitor the parent process, and clean up the semaphore when necessary, even if the parent process dies unexpectedly.
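As a small illustration of that property, here is a sketch in which a different process releases a semaphore that it never acquired. The helper name post_from_child is hypothetical; it only assumes the semaphore already exists under the given name.

```c
#include <fcntl.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that opens the named semaphore on its
   own and posts it once. Returns 0 if the child posted successfully,
   -1 otherwise. */
int post_from_child(const char *name)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: open the existing semaphore by name and release it. */
        sem_t *sem = sem_open(name, 0);
        if (sem == SEM_FAILED)
            _exit(1);
        _exit(sem_post(sem) == 0 ? 0 : 1);
    }

    /* Parent: wait for the child and report whether it succeeded. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

If the caller took the semaphore with sem_wait() beforehand, the value is back up by one after post_from_child() returns 0, even though the caller itself never called sem_post().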

For general background, see man 7 sem_overview, especially the section on named semaphores.

Essentially, immediately after creating or opening the named semaphore, the process will create a pipe (see pipe()) and fork() a child process. The child process will reset signal handlers if necessary, redirect standard output and error to /dev/null, the read end of the pipe to standard input, and close all other file descriptors. It will then do a blocking read() from the standard input. If it returns 0, indicating end of input, the child process will clean up the semaphore and exit. If the child process reads even a single byte from the standard input, it will simply immediately exit. (If an error occurs, the correct action depends on your needs; but options range from ignoring to signaling the parent using one of the POSIX realtime signals to exiting with or without named semaphore cleanup.)

The parent process will close the read end of the pipe, but keep the write end handy. If the parent process crashes or dies unexpectedly, the kernel closes the write end, which causes the child end to detect the end-of-input condition, and do the named semaphore cleanup. If the parent process no longer needs the child, it can simply write() a single byte to the pipe, which causes the child process to exit, and reap the child process using waitpid().
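The scheme described above can be sketched roughly as follows. This is only an outline under some assumptions: "cleanup" is taken to mean sem_unlink() on the name (if you instead need to release a held count, the guardian could call sem_post() on the inherited sem_t pointer), a read error makes the guardian exit without cleanup, and the descriptor-closing loop is capped at 65536 for simplicity. The names guardian_main, start_guardian and stop_guardian are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Body of the guardian child. Never returns. */
static void guardian_main(int rfd, const char *name)
{
    /* The read end of the pipe becomes standard input. */
    if (rfd != STDIN_FILENO && dup2(rfd, STDIN_FILENO) == -1)
        _exit(2);

    /* Standard output and error go to /dev/null. */
    int devnull = open("/dev/null", O_WRONLY);
    if (devnull != -1) {
        dup2(devnull, STDOUT_FILENO);
        dup2(devnull, STDERR_FILENO);
    }

    /* Close every other descriptor (cap is an assumption of this sketch). */
    long maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0 || maxfd > 65536)
        maxfd = 65536;
    for (int fd = 3; fd < (int)maxfd; fd++)
        close(fd);

    /* Block until the parent writes a byte or the write end is closed. */
    char byte;
    ssize_t n;
    do {
        n = read(STDIN_FILENO, &byte, 1);
    } while (n == -1 && errno == EINTR);

    if (n == 0)
        sem_unlink(name);   /* end of input: the parent is gone */
    _exit(n >= 0 ? 0 : 1);  /* one byte read: exit without cleanup */
}

/* Fork the guardian. Returns the write end of the pipe (keep it open),
   or -1 on error. The guardian's PID is stored in *guardian. */
int start_guardian(const char *name, pid_t *guardian)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[1]);      /* otherwise the child never sees end of input */
        guardian_main(fds[0], name);
    }

    close(fds[0]);
    /* Keep the write end from leaking into programs the parent exec()s. */
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);
    *guardian = pid;
    return fds[1];
}

/* Tell the guardian it is no longer needed, then reap it.
   Returns 0 on success, -1 on error. */
int stop_guardian(int wfd, pid_t guardian)
{
    char byte = 0;
    ssize_t n = write(wfd, &byte, 1);
    close(wfd);

    int status;
    while (waitpid(guardian, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    return n == 1 ? 0 : -1;
}
```

Typical use would be to call sem_open(), then start_guardian() right away, and stop_guardian() just before a normal exit. Two caveats: any process the parent later creates with fork() inherits the write end and will delay the end-of-input until it also exits or closes it, and a signal sent to the whole process group (Ctrl-C, for example) hits the guardian too unless it ignores or blocks that signal.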

In the case where the parent process dies and the child process does the cleanup and then exits, it will become a zombie for a short time (because its original parent no longer exists to reap it), but that is okay: the orphaned child is adopted by the init process (PID 1), or by a subreaper process if one is set, which will reap it very quickly. This is what init is designed to do, and it is completely normal.

In general, you want to open/create the named semaphore and fork the guardian child process as early as possible in your program, so that the child process can be as simple as possible. For example, if you have created sockets or have open files when forking the child, the child process will inherit copies of those descriptors. Because the child process does not exec a separate binary, but just runs a dedicated function, even the O_CLOEXEC/FD_CLOEXEC file descriptor flags will not help here.