I'm new to programming in Perl.
I have an 8 GB zip file that contains data files and a metadata file. The goal is to read the content of the metadata file and compare it with the data filenames in the zip.
The current implementation uses IO::Uncompress::Unzip, and it takes too long (~15 min) to read the ~60 KB metadata file.
I have created a PoC script using Archive::Zip::MemberRead that extracts the information from the same file, and it runs really fast (in seconds).
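The PoC script itself is not shown, so the following is only a minimal sketch of what such a script might look like. It assumes Archive::Zip is installed (it is not a core module), and the archive name poc-test.zip and the member names metadata.txt and a.dat are hypothetical. It first builds a tiny archive so that it can run on its own, then lists the member names and reads the metadata member line by line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use Archive::Zip::MemberRead;

# Hypothetical archive name, used only for this sketch.
my $zipfile = "poc-test.zip";

# Build a tiny archive so the sketch is self-contained.
my $out = Archive::Zip->new();
$out->addString( "abc\n",     'metadata.txt' );
$out->addString( "payload\n", 'a.dat' );
$out->writeToFileNamed($zipfile) == AZ_OK
    or die "Cannot write $zipfile\n";

# Open the archive and list the members it contains.
my $zip = Archive::Zip->new();
$zip->read($zipfile) == AZ_OK
    or die "Cannot read $zipfile\n";
my @names = $zip->memberNames();

# Read the metadata member line by line; getline() returns undef at end of member.
my $fh = Archive::Zip::MemberRead->new( $zip, 'metadata.txt' );
my @lines;
while ( defined( my $line = $fh->getline() ) ) {
    push @lines, $line;
}
$fh->close();

print "Members: @names\n";
print "Metadata line: $_\n" for @lines;
```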
My concern is whether there are any limitations to using Archive::Zip in my scenario.
CodePudding user response:
@MiguelPrz: in my script, one step walks through the zip file to retrieve each member's name and size, and that is quite fast. The next step reads the content of the metadata file by using unzip with the specified file name, and that is very slow. – Le Vu
It isn't necessary to call out to unzip; you can use IO::Uncompress::Unzip to access the metadata file directly.
Here is a quick worked example that checks a zip file for a member called metadata.txt. If it finds it, it reads the contents into memory and prints them.
Start by creating a test zip file that has a member called metadata.txt.
$ echo abc >metadata.txt
$ zip test.zip metadata.txt
adding: metadata.txt (stored 0%)
Now some code that walks through the zip file and checks for the metadata member.
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

my $zipfile = "test.zip";
my $u = IO::Uncompress::Unzip->new( $zipfile )
    or die "Cannot open $zipfile: $UnzipError";

# Walk the members in order; nextStream() returns 0 at the end and < 0 on error.
my $status;
for ($status = 1; $status > 0; $status = $u->nextStream())
{
    my $name = $u->getHeaderInfo()->{Name};
    warn "Processing member $name\n" ;
    if ($name eq 'metadata.txt')
    {
        # Slurp the whole member into memory.
        local $/;
        my $data = <$u>;
        print "METADATA is [$data]\n";
    }
    last if $status < 0;
}

die "Error processing $zipfile: $!\n"
    if $status < 0 ;
When I run that I get this output:
$ perl testzip.pl
Processing member metadata.txt
METADATA is [abc
]
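The same loop can be extended to cover the comparison described in the question. The sketch below is only an illustration under stated assumptions: the metadata format (one data filename per line), the archive name compare-test.zip and the member names a.dat, b.dat and c.dat are all hypothetical. It builds a small test archive with IO::Compress::Zip, collects every member name in a single pass while reading metadata.txt, and then reports the differences in both directions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Zip qw($ZipError);
use IO::Uncompress::Unzip qw($UnzipError);

# Build a small test archive (hypothetical names and metadata format).
my $zipfile = "compare-test.zip";
my $z = IO::Compress::Zip->new( $zipfile, Name => 'metadata.txt' )
    or die "Cannot create $zipfile: $ZipError";
$z->print("a.dat\nb.dat\n");        # metadata lists a.dat and b.dat
$z->newStream( Name => 'a.dat' );
$z->print("first data file\n");
$z->newStream( Name => 'c.dat' );   # present in the zip, not listed
$z->print("unlisted data file\n");
$z->close();

my $u = IO::Uncompress::Unzip->new( $zipfile )
    or die "Cannot open $zipfile: $UnzipError";

# Single pass: record data member names, slurp the metadata member.
my ( %in_zip, @listed );
my $status;
for ( $status = 1; $status > 0; $status = $u->nextStream() )
{
    my $name = $u->getHeaderInfo()->{Name};
    if ( $name eq 'metadata.txt' )
    {
        local $/;
        my $data = <$u> // '';
        @listed = grep { length } split /\r?\n/, $data;
    }
    else
    {
        $in_zip{$name} = 1;
    }
}
die "Error processing $zipfile: $UnzipError\n"
    if $status < 0;

# Compare the two sets of names in both directions.
my %listed   = map { $_ => 1 } @listed;
my @missing  = grep { !$in_zip{$_} } @listed;
my @unlisted = grep { !$listed{$_} } sort keys %in_zip;

print "Listed but missing from zip: @missing\n";
print "In zip but not listed: @unlisted\n";
```

With the test archive built above, b.dat is reported as listed but missing and c.dat as present but not listed.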
[full disclosure: I'm the author of IO::Uncompress::Unzip]