We have a Perl script that downloads files from different servers, and it is currently taking hours. Where can I find a way to download multiple files simultaneously? Thank you, Sergey.
I am still trying to get the full details from the dev team, but in the meantime I am looking for some tips.
sub getConfData {
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    print "USER NAME :- $Inputs::conf_user\n";
    print "USER PASS :- $Inputs::conf_pass\n";
    $mech->credentials( "$Inputs::conf_user" => "$Inputs::conf_pass" );
    logs( $Inputs::logpath, "Opening the URL of confD : $Inputs::conf_url" );
    print "Opening the URL of confD : $Inputs::conf_url\n";
    $mech->mirror( $Inputs::conf_url, $Inputs::conf_arch );

    my $next      = Archive::Tar->iter( $Inputs::conf_arch, 1, { filter => qr// } );
    my $confdName = $next->()->name;
    logs( $Inputs::logpath, "confD Downloading Filename is : $confdName" );
    print "confD Downloading Filename is : $confdName\n";

    my $tar = Archive::Tar->new();
    $tar->read($Inputs::conf_arch) or die logs( $Inputs::errorpath, " Unable to read the TAR file." );
    $tar->extract();

    my $destination = "$Inputs::download_path" . "$Inputs::confddb";
    rmtree($destination);
    my $cwd = getcwd();
    move_reliable( "$confdName", "$destination" ) or logs( $Inputs::errorpath, " unable to move folder to $destination." );
    logs( $Inputs::logpath, "Moving confD $confdName to $Inputs::download_path" . "$Inputs::confddb" );
    print "Moving confD $confdName to $Inputs::download_path $Inputs::confddb\n";

    if ($@) {
        my $message = "Failed to pull confd data and write it to $Inputs::download_path :: $@";
        send_mail( $Inputs::from, $Inputs::to, $Inputs::error_subject, $message );
        logs( $Inputs::errorpath, " $message" );
        die("$message");
    }
    else {
        unlink $Inputs::conf_arch or die logs( $Inputs::errorpath, "COULD NOT UNLINK $Inputs::conf_arch" );
        unlink $destination;
    }
}
foreach my $server (@productList) {
    my $pid;
    if ( defined( $pid = fork ) ) {
        if ( !$pid ) {
            exec("$main_file $server &");
            die "Error executing command: $!\n";
        }
    }
    else {
        die "Error in fork: $!\n";
    }
}
logs( $Inputs::logpath, "Downloading config started at :" . datetimes( 'dtime', 'db' ) );
&getConfData();
logs( $Inputs::logpath, "Downloading config completed at :" . datetimes( 'dtime', 'db' ) );
print "Downloading config Completed at : "`your text`. datetimes( 'dtime', 'normal' ) . "\n";
logs( $Inputs::logpath, "Downloading config completed at :" . datetimes( 'dtime', 'db' ) );
CodePudding user response:
Use Mojo::UserAgent with the methods that return Promises. See "You Promised to Call" for an extended example.
You can also use some of Perl's other concurrency tools instead, but since Mojolicious has this built in, I go with that.
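Untested, but roughly the shape of it (a sketch only; the URL list and the filename rule are made up, and you would add credentials and error handling to match your setup):

use Mojo::UserAgent;
use Mojo::Promise;

my $ua   = Mojo::UserAgent->new( max_redirects => 3 );
my @urls = ( 'http://server1/conf.tar', 'http://server2/conf.tar' );    # hypothetical list

my @promises = map {
    my $url = $_;
    $ua->get_p($url)->then( sub {
        my ($tx)   = @_;
        my ($file) = $url =~ m{([^/]+)$};    # name the local file after the last URL segment
        # save_to needs a reasonably recent Mojolicious;
        # older versions can use $tx->result->content->asset->move_to($file)
        $tx->result->save_to($file);
        print "saved $file\n";
    } );
} @urls;

# block until every download has either finished or one of them failed
Mojo::Promise->all(@promises)
    ->catch( sub { warn "a download failed: @_\n" } )
    ->wait;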
Or, you can go really fancy and use Minion. Each link gets inserted into a job queue. That way, you can easily mark which ones you need to retry and so on.
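Something like this sketch, for instance (the task name, database file, and destination paths are all made up, it assumes the same @urls list as above, and it needs the separate Minion::Backend::SQLite module, or swap in the Pg backend):

use Minion;
use Mojo::UserAgent;

# requires Minion::Backend::SQLite (or use the Pg backend instead)
my $minion = Minion->new( SQLite => 'sqlite:downloads.db' );

$minion->add_task( download => sub {
    my ( $job, $url, $dest ) = @_;
    Mojo::UserAgent->new->get($url)->result->save_to($dest);
    $job->finish("saved $dest");
} );

# enqueue one job per URL; failed jobs stay in the queue and can be retried
$minion->enqueue( download => [ $_, "/tmp/" . ( $_ =~ m{([^/]+)$} )[0] ] ) for @urls;

# process everything with a temporary worker (handy for a one-off script;
# a real deployment would run dedicated worker processes instead)
$minion->perform_jobs;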
CodePudding user response:
If this is always running on a GNU/Linux-like system, I would first try to delegate this to xargs with -P3 for at most three concurrent parallel sessions and -n 10 to batch up to 10 URLs per session. Using wget gives you the benefit of all its options (see man wget), for instance -N to skip the download when the server file and the local file have the same timestamp.
To see what I mean, try this dry-run example:
perl -E 'open my $FH, "| xargs -P3 -n10 echo wget -N"; print $FH "url$_\n" for 1..100'
Remove echo and adapt the print $FH ... part when it's time to get serious.
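The serious version could look something like this (a sketch; it assumes your real URLs sit in a file called urls.txt, one per line, and reuses the xargs/wget options from the dry run):

# feed real URLs from urls.txt into xargs, which runs wget in parallel batches
open my $URLS, '<', 'urls.txt'                or die "Cannot open urls.txt: $!";
open my $FH,   '|-', 'xargs -P3 -n10 wget -N' or die "Cannot start xargs: $!";
print {$FH} $_ while <$URLS>;
close $URLS;
close $FH or die "xargs/wget exited with status $?";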