How do I run perl scripts in parallel and capture outputs in files?

I need to run my Perl tests in parallel and capture STDOUT and STDERR in a separate file for each test file. I'm having no success even capturing the output in a single file. I've been all over SO and have had no luck. Here is where I started (I'll spare you all the variations). Any help is greatly appreciated. Thanks!

foreach my $file (@files) {
    next unless $file =~ /\.t$/;
    print "\$file = $file\n";

    $file =~ /^(\w+)\.\w+/;

    my $file_pfx = $1;
    my $this_test_file_name = $file_pfx . '.txt';

    system("perl $test_dir\\$file > results\\$this_test_file_name &") && die "cmd failed: $!\n";

}

CodePudding user response:

Here is a simple example using Parallel::ForkManager to spawn separate processes.

In each process the STDOUT and STDERR streams are redirected, in two ways for a demo: STDOUT to a variable, which can then be passed around as desired (here it is dumped into a file), and STDERR directly to a file. Alternatively, use a library; an example with Capture::Tiny follows in a separate code snippet.

The numbers 1..6 represent batches of data that each child will pick from to process. Only three processes are started right away and then as one finishes another one is started in its place. (Here they exit nearly immediately, the "jobs" being trivial.)

use warnings;
use strict;
use feature 'say';

use Carp qw(carp);
use Path::Tiny qw(path); 
use Parallel::ForkManager; 

my $pm = Parallel::ForkManager->new(3); 

foreach my $data (1..6) { 
    my $pid = $pm->start and next;  # start a child process
    proc_in_child($data);           # code that runs in the child process
    $pm->finish;                    # exit it
}
$pm->wait_all_children;             # reap all child processes

say "\nParent $$ done\n";
    
sub proc_in_child {
    my ($data) = @_; 
    say "Process $$ with data $data";  # still shows on terminal

    # Will dump all that was printed to streams to these files
    my ($outfile, $errfile) =
        map { "proc_data-${data}_" . $_ . ".$$.out" } qw(stdout stderr);

    # Redirect streams
    # One way to do it, redirect to a variable (for STDOUT)...  
    open my $fh_stdout, ">", \my $so or carp "Can't open handle to variable: $!";
    my $fh_STDOUT = select $fh_stdout;
    # ...another way to do it, directly to a file (for any stream)
    # (first 'dup' it so it can be restored if needed)
    open my $SAVEERR, ">&STDERR"  or carp "Can't dup STDERR";
    open *STDERR, ">", $errfile or carp "Can't redirect STDERR to $errfile";

    # Prints wind up in a variable (for STDOUT) and a file (for STDERR)
    say  "STDOUT: Child process with pid $$, processing data #$data"; 
    warn "STDERR: Child process with pid $$, processing data #$data"; 

    close $fh_stdout;
    # If needed to restore (not in this example which exits right away)
    select $fh_STDOUT;
    open STDERR, '>&', $SAVEERR  or carp "Can't restore STDERR";

    # Dump collected STDOUT to a file (or pass it around, being in a variable)
    path( $outfile )->spew($so);

    return 1
}

While STDOUT is redirected to a variable here, STDERR cannot be redirected that way, since select only changes the default output handle for print and say; in the example it goes directly to a file instead. See the documentation for open. However, there are ways to capture STDERR in a variable as well.
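
For instance, a minimal sketch of one such way, assuming a Perl recent enough to support in-memory filehandles (the variable names are made up for this example):

use warnings;
use strict;

# Save the original STDERR so it can be restored afterwards
open my $SAVEERR, '>&', \*STDERR or die "Can't dup STDERR: $!";

# Reopen STDERR onto an in-memory scalar; Perl-level writes (warn, print STDERR)
# now land in $err (output from child processes writing to fd 2 is not caught)
close STDERR;
open STDERR, '>', \my $err or die "Can't redirect STDERR to a variable: $!";

warn "this line ends up in the variable\n";

# Restore the real STDERR and use the captured text
open STDERR, '>&', $SAVEERR or die "Can't restore STDERR: $!";
print "captured: $err";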

Then you can use the module's ability to return from child processes to the parent, which can then handle those variables. See for example this post and this post and this post. (There's way more, these are merely the ones I know.) Or indeed just dump them to files, as done here.
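
For a taste of that, a minimal sketch using the module's documented run_on_finish callback and the data argument to finish (the hash key stdout and the placeholder string are made up for this example):

use warnings;
use strict;
use feature 'say';
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(3);

# Runs in the parent each time a child finishes; the last argument is the
# data structure that the child passed to finish()
$pm->run_on_finish( sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    say "Child $pid returned: $data->{stdout}" if $data;
});

foreach my $data (1 .. 6) {
    $pm->start and next;                      # child from here on
    my $captured = "output of job $data";     # stands in for captured STDOUT
    $pm->finish(0, { stdout => $captured });  # reference is passed back to the parent
}
$pm->wait_all_children;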

Another way is to use a module that can run code and capture its output, like Capture::Tiny:

use Capture::Tiny qw(capture);

sub proc_in_child {
    my ($data) = @_; 
    say "Process $$ with data $data";  # on terminal

    # Run code and capture all output
    my ($stdout, $stderr, @results) = capture {
          say  "STDOUT: Child process $$, processing data #$data";
          warn "STDERR: Child process $$, processing data #$data"; 

          # return results perhaps...
          1 .. 4;
    };

    # Do as needed with variables with collected STDOUT and STDERR
    # Return to parent, or dump to file:
    my ($outfile, $errfile) = 
        map { "proc_data-${data}_" . $_ . ".$$.out" } qw(stdout stderr);

    path($outfile)->spew($stdout);
    path($errfile)->spew($stderr);

    return 1
}    

This keeps a fixed number of processes running at all times: as one child finishes, another is started in its place. Alternatively, one can set it up to wait for a whole batch to finish and then start another batch. For some details of operation see this post.
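
A minimal sketch of that batched variant, reusing proc_in_child from above (the batch size of 3 and the 1..12 data are arbitrary for this example):

use warnings;
use strict;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(3);

my @all_data = (1 .. 12);
while ( my @batch = splice @all_data, 0, 3 ) {
    for my $data (@batch) {
        $pm->start and next;      # child process
        proc_in_child($data);     # proc_in_child() as defined above
        $pm->finish;
    }
    $pm->wait_all_children;       # block until this whole batch has finished
}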

CodePudding user response:

I think the easiest way is to use shell redirects in your system command. BTW, spawning uncontrolled background processes from it with '&' makes me frown.

Here is a simple example with shell redirects and fork.

#!/usr/bin/perl
use strict;
use warnings;

for my $i (0..2) {
    my $stdoutName = "stdout$i.txt";
    my $stderrName = "stderr$i.txt";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # shell redirects: 1> (stdout) and 2> (stderr)
        system("perl mytest.pl 1>$stdoutName 2>$stderrName");
        exit $?;
    }
}
1 while wait() != -1;   # reap all children before the parent exits
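
Applied to the loop over test files from the question, that approach might look like the sketch below; the glob pattern, the results directory, and the forward slashes are assumptions for this example (adjust for Windows-style paths as needed):

#!/usr/bin/perl
use strict;
use warnings;

my $test_dir = 't';                      # assumed location of the .t files
my @files    = glob("$test_dir/*.t");

for my $file (@files) {
    my ($pfx) = $file =~ m{([^/\\]+)\.t$} or next;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # each child runs one test, capturing both streams in its own files
        system("perl $file 1>results/$pfx.stdout.txt 2>results/$pfx.stderr.txt");
        exit $?;
    }
}
1 while wait() != -1;   # wait for all test runs to finish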