Download files simultaneously from a URL


We have a Perl script that downloads files from different servers, and it is currently taking hours. Where can I find a way to download multiple files at the same time? Thank you, Sergey.

I am still trying to get the full details from the Dev team, but I am looking for some tips.

sub getConfData {
    my $mech = WWW::Mechanize->new( autocheck => 1 );

    print "USER NAME :- $Inputs::conf_user\n";
    print "USER PASS :- $Inputs::conf_pass\n";

    $mech->credentials( $Inputs::conf_user => $Inputs::conf_pass );
    logs( $Inputs::logpath, "Opening the URL of confD : $Inputs::conf_url" );
    print "Opening the URL of confD : $Inputs::conf_url\n";

    # Only re-download the archive if the remote copy is newer than the local one.
    $mech->mirror( $Inputs::conf_url, $Inputs::conf_arch );

    my $destination = $Inputs::download_path . $Inputs::confddb;

    # $@ is only set by eval, so wrap the extract/move steps in one block;
    # otherwise the error handling below can never trigger.
    eval {
        # Peek at the first entry to get the name of the extracted directory.
        my $next      = Archive::Tar->iter( $Inputs::conf_arch, 1, { filter => qr// } );
        my $confdName = $next->()->name;
        logs( $Inputs::logpath, "confD Downloading Filename is : $confdName" );
        print "confD Downloading Filename is : $confdName\n";

        my $tar = Archive::Tar->new();
        $tar->read($Inputs::conf_arch) or die "Unable to read the TAR file.\n";
        $tar->extract();

        rmtree($destination);
        my $cwd = getcwd();
        move_reliable( $confdName, $destination )
            or die "Unable to move folder to $destination.\n";
        logs( $Inputs::logpath, "Moving confD $confdName to $destination" );
        print "Moving confD $confdName to $destination\n";
    };

    if ($@) {
        my $message = "Failed to pull confd data and write it to $Inputs::download_path :: $@";
        send_mail( $Inputs::from, $Inputs::to, $Inputs::error_subject, $message );
        logs( $Inputs::errorpath, " $message" );
        die("$message");
    }
    else {
        unlink $Inputs::conf_arch
            or die logs( $Inputs::errorpath, "COULD NOT UNLINK $Inputs::conf_arch" );
        unlink $destination;
    }
}


 
foreach my $server (@productList) {
    my $pid;
    if ( defined( $pid = fork ) ) {
        if ( !$pid ) {
            # Child process: replace it with the per-server download command.
            exec("$main_file $server");
            die "Error executing command: $!\n";
        }
    }
    else {
        die "Error in fork: $!\n";
    }
}
# The parent does not wait for these children before continuing below.
 
logs( $Inputs::logpath, "Downloading config started at :" . datetimes( 'dtime', 'db' ) );
&getConfData();
logs( $Inputs::logpath, "Downloading config completed at :" . datetimes( 'dtime', 'db' ) );
print "Downloading config completed at : " . datetimes( 'dtime', 'normal' ) . "\n";



CodePudding user response:

Use Mojo::UserAgent with the methods that take Promises. See You Promised to Call for an extended example.
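A rough sketch of that approach is below. It is not your script: the URL list on the command line and the filenames derived from the URL path are my assumptions, so adapt them to however you actually name and store the files.

#!/usr/bin/env perl
use Mojo::Base -strict;
use Mojo::UserAgent;
use Mojo::Promise;
use Mojo::URL;

my @urls = @ARGV;
my $ua   = Mojo::UserAgent->new( max_redirects => 5 );

# Fire off all requests at once; each get_p returns a promise.
my @promises = map {
    my $url = $_;
    $ua->get_p($url)->then( sub {
        my ($tx) = @_;
        my $file = Mojo::URL->new($url)->path->parts->[-1] // 'index.html';
        $tx->result->save_to($file);
        say "saved $url -> $file";
    } )->catch( sub { warn "failed $url: @_\n" } );
} @urls;

# Block until every download has either finished or failed.
Mojo::Promise->all(@promises)->wait if @promises;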

You can also go a different route and use one of Perl's other concurrency tools, but since Mojolicious has this built in, I go with that.

Or, you can get really fancy and use Minion. Each link gets inserted into a job queue, and that way you can easily mark which ones you need to retry, and so on.
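A minimal sketch of that job-queue idea, assuming the Minion::Backend::SQLite backend is installed and that the URLs arrive on the command line (both assumptions on my part):

use strict;
use warnings;
use Minion;
use HTTP::Tiny;

my @urls   = @ARGV;
my $minion = Minion->new( SQLite => 'sqlite:download_queue.db' );

# One task per URL; Minion records failures so they are easy to retry.
$minion->add_task( download => sub {
    my ( $job, $url ) = @_;
    my $file = ( split m{/}, $url )[-1] || 'index.html';
    my $res  = HTTP::Tiny->new->mirror( $url, $file );
    die "download of $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
} );

# Enqueue one job per URL, allowing a few automatic retry attempts.
$minion->enqueue( download => [$_], { attempts => 3 } ) for @urls;

# Process the queue; in practice you would start several separate workers.
$minion->worker->run;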

CodePudding user response:

If this is always running on a GNU/Linux-like system, I would first try to delegate this to xargs, with -P3 for at most three concurrent parallel sessions and -n 10 to batch up to 10 URLs per session. Using wget gives you the benefit of all its options (see man wget), for instance -N to skip the download if the local file already has the same timestamp as the one on the server.

To see what I mean, try this example dry run:

perl -E 'open my $FH, "| xargs -P3 -n10 echo wget -N"; print $FH "url$_\n" for 1..100'

Remove echo and adapt the print $FH ... part when it's time to get serious.
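For a non-dry run, a small sketch along those lines might look like this; the urls.txt filename is just an assumption, so feed the handle from wherever your script currently gets its URLs:

use strict;
use warnings;

open my $URLS, '<', 'urls.txt' or die "urls.txt: $!";

# -P3: at most three wget processes at once; -n10: up to ten URLs per process.
open my $XARGS, '|-', 'xargs', '-P3', '-n10', 'wget', '-N'
    or die "cannot start xargs: $!";

# One URL per line on stdin; xargs splits on whitespace/newlines.
print {$XARGS} $_ while <$URLS>;

close $XARGS or warn "xargs/wget reported a failure: $?";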
