Perl: Child subprocesses are not being killed when child is being killed


This is being done on Windows.

I am getting the error: The process cannot access the file because it is being used by another process. It seems that even after the child exits (exit 0) and the parent waits for the child to complete (waitpid($lkpid, 0)), the child's subprocesses are not killed. So when the next iteration (test case) runs, it finds the process still running, and hence gives the error message.

Code Snippet ($bashexe and $bePath are defined):

my $MSROO = "/home/abc";
if (my $fpid = fork()) {
  for (my $i=1; $i<=1200; $i++) {
     sleep 1;
     if (-e "$MSROO/logs/Complete") {
        last;
     }
  }
}
elsif (defined ($fpid)) {
    &runAndMonitor (\@ForRun, "$MSROO/logs/Test.log");  ### @ForRun has the list of test cases
    system("touch $MSROO/logs/Complete");
    exit 0;
}


sub runAndMonitor {
    my @ForRunPerProduct =  @{$_[0]};
    my $logFile = $_[1];
        foreach my $TestVar (@ForRunPerProduct) {
            my $TestVarDirName = $TestVar;
            $TestVarDirName = dirname ($TestVarDirName);
            my $lkpid;
            my $pid;
            my $filehandle;
            if ( !($pid = open($filehandle, "-|", "$bashexe -c \"echo abc; perl.exe reg_script.pl $TestVarDirName -t wint\" >> $logFile")) ) {
                die( "Failed to start process: $!" );
            }
            else {
                print "$pid is pid of shell running: $TestVar\n";   ### Issue (error message above) is coming here after piped open is launched for a new test
                my $taskInfo=`tasklist | grep "$pid"`;
                chomp ($taskInfo);
                print "$taskInfo is taskInfo\n";
            }
            if ($lkpid = fork()) {
                 sleep 1;
                 chomp ($lkpid);
                 LabelToCheck:
                 my $pidExistingOrNotInParent = kill 0, $pid;
                 if ($pidExistingOrNotInParent) {
                     sleep 10;
                     goto LabelToCheck;
                 }
            }
            elsif (defined ($lkpid)) {
                 sleep 12;   
                 my $pidExistingOrNot = kill 0, $pid;
                 if ($pidExistingOrNot){
                      print "$pid still exists\n";
                      my $taskInfoVar1 =`tasklist | grep "$pid"`;
                      chomp ($taskInfoVar1);
                      my $killPID = kill 15, $pid;
                      print "$killPID is the value of PID\n";  ### Here, I am getting output 1 (value of $killPID). Also, I tried with signal 9, and seeing same behavior
                      my $taskInfoVar2 =`tasklist | grep "$pid"`;
                      sleep 10;
                      exit 0;
                 }
            }
             waitpid($lkpid, 0);
        }
        return;
}

Why is it that even after the "exit 0" in the child and then the "waitpid" in the parent, the child's subprocesses are not killed? What can be done to fully clean up the child process and its subprocesses?

CodePudding user response:

The exit doesn't touch child processes; it's not meant to. It just exits the process. In order to shut down its child processes as well, you'd need to signal them.

However, since this is Windows, where fork is merely emulated, here is what perlfork says

Behavior of other Perl features in forked pseudo-processes
...
kill() "kill('KILL', ...)" can be used to terminate a pseudo-process by passing it the ID returned by fork(). The outcome of kill on a pseudo-process is unpredictable and it should not be used except under dire circumstances, because the operating system may not guarantee integrity of the process resources when a running thread is terminated
...
exit() exit() always exits just the executing pseudo-process, after automatically wait()-ing for any outstanding child pseudo-processes. Note that this means that the process as a whole will not exit unless all running pseudo-processes have exited. See below for some limitations with open filehandles.

So don't use kill here, while exit behaves nearly the opposite of what you need.

But the Windows command TASKKILL can terminate a process and its tree:

system("TASKKILL /F /T /PID $pid");

This should terminate the process with $pid and its child processes. (The command can use a process's name instead, TASKKILL /F /T /IM $name, but using names on a busy modern system, with a lot going on, can be tricky.) See taskkill in the MS docs.
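In the monitor code from the question, that would replace the kill 15, $pid call. A sketch (assuming $pid is the PID returned by the piped open; Windows-only):

```perl
# Sketch: instead of kill 15, $pid, take down the shell started by the
# piped open together with its whole process tree.
my $pidExistingOrNot = kill 0, $pid;    # probe: is the process still there?
if ($pidExistingOrNot) {
    # /F forces termination, /T also terminates the child (sub)processes
    my $rc = system("TASKKILL /F /T /PID $pid");
    warn "TASKKILL failed, exit code " . ($rc >> 8) . "\n" if $rc != 0;
}
```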

A more reliable way about this, altogether, is probably to use dedicated modules for Windows process management.

A few other comments

  • I also notice that you use pipe-open, while perlfork says for that

    Forking pipe open() not yet implemented
    The open(FOO, "|-") and open(BAR, "-|") constructs are not yet implemented.

    So I am confused, does that pipe-open work in your code? But perlfork continues with

    This limitation can be easily worked around in new code by creating a pipe explicitly. The following example shows how to write to a forked child: [full code follows]

  • That C-style loop, for (my $i=1; $i<=1200; $i++), is better written as

    for my $i (1..1200) { ... }
    

    (or foreach; they are synonyms.) A C-style loop is very rarely needed in Perl.
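The explicit-pipe workaround that perlfork refers to looks roughly like this (a sketch of the parent writing to a forked child, adapted from the idea in perlfork):

```perl
# Sketch: emulate open(FH, "|-") with an explicit pipe, since the
# forking pipe-open is not implemented under the Windows fork emulation.
pipe my $reader, my $writer or die "pipe failed: $!";
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid) {                       # parent: write to the child
    close $reader;
    print $writer "a line for the child\n";
    close $writer;                # child's reader now sees EOF
    waitpid($pid, 0);
}
else {                            # child: read what the parent writes
    close $writer;
    while (my $line = <$reader>) {
        print "child got: $line";
    }
    exit 0;
}
```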


A kill with a negated signal (name or number) or a negated process ID generally terminates the whole process group under the signaled process. This is on Linux.

So one way would be to signal that child from its parent when ready, instead of exit-ing from it. (Then the child would have to signal the parent in some way when it's ready.)

Or, the child can send a negated terminate signal to all its direct child processes, then exit.
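On Linux that can be sketched as follows (the worker command is made up for illustration; setpgrp puts the child in its own process group so a negative-PID kill reaches its whole subtree):

```perl
# Sketch (Linux): make the child a process-group leader, then signal
# the entire group by negating the PID.
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    setpgrp(0, 0);                # child becomes its own process-group leader
    exec "some_command";          # hypothetical long-running worker
    die "exec failed: $!";
}

# ... later, when it's time to clean up:
kill 'TERM', -$pid;               # negative PID: signal every process in the group
waitpid($pid, 0);
```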

CodePudding user response:

You didn't say which perl you are using. On Windows with Strawberry Perl (and presumably ActiveState), fork() emulation is very problematic (maybe just "broken"), as @zdim mentioned. If you want a longer explanation, see Proc::Background::Win32 - Perl Fork Limitations.

Meanwhile, if you use Cygwin's Perl, fork works perfectly. This is because Cygwin does a full emulation of Unix fork() semantics, so anything built against cygwin works just like it does on Unix. The downside is that file paths show up weird, like /cygdrive/c/Program Files. This may or may not trip up code you've already written.

But you might also be confused about process trees. Even on Unix, killing a parent process does not kill the child processes. That cleanup usually happens, for various reasons, but it is not enforced. For example, most child processes have a pipe open to the parent; when the parent exits that pipe closes, and then reading/writing the pipe gives SIGPIPE, which kills the child. In other cases, the parent catches SIGTERM and re-broadcasts it to its children before exiting gracefully. In still other cases, monitors like systemd or Docker create a container inherited by all children of the main process, and when the main process exits the monitor kills off everything else in the container.

Since it looks like you're writing your own task monitor, I'll give some advice from one that I wrote for Windows (and which is running along happily years later). I ended up with a design using Proc::Background where the parent starts a task that writes to a file as STDOUT/STDERR. Then it opens that same log file and wakes up every few seconds to try reading more of the log file to see what the task is doing, and checks with the Proc::Background object to see if the task exited. When the task exits, it appends the exit code and timestamp to the log file. The monitor has a timeout setting; if the child exceeds it, the monitor just un-gracefully runs TerminateProcess. (You could improve on that by leaving STDIN open as a pipe between monitor and worker, and then having the worker check STDIN every now and then, but on Windows that read will block, so you have to use PeekNamedPipe, which gets messy.)

Meanwhile, the monitor parses any new lines of the log file to read status information and send updates to the database. The other parts of the system can watch the database to see the status of background tasks, including a web admin interface that can also open and read the log file. If the monitor sees that a child has run for too long, it can use TerminateProcess to stop it. Missing from this design is any way for the monitor to know when it's being asked to exit, and clean up, which is a notable deficiency, and one you're probably looking for. However, there actually isn't any way to intercept a TerminateProcess aimed at the parent! Windows does have some Message Queue API stuff where you can set up to receive notifications about termination, but I never chased down the full details there. If you do, please come back and drop a comment for me :-)
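A minimal sketch of that kind of monitor loop, using Proc::Background (the module and the new/alive/die/wait methods are real; the task command, log path, and timeout value here are made up for illustration):

```perl
use strict;
use warnings;
use Proc::Background;

# Sketch: start a task with its output redirected to a log file,
# poll it periodically, and force-terminate it if it runs too long.
my $logFile = "task.log";          # hypothetical log path
my $timeout = 600;                 # hypothetical limit, in seconds

my $proc = Proc::Background->new(
    "perl.exe reg_script.pl some_dir -t wint >> $logFile 2>&1"
);

my $started = time;
while ($proc->alive) {
    sleep 5;                       # wake up, read new log lines, update status...
    if (time - $started > $timeout) {
        $proc->die;                # on Windows this ends in TerminateProcess
        last;
    }
}
$proc->wait;                       # reap the process and collect its exit status
print "task exited\n";
```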
