Load zlib compressed data from a file with data offset in Perl

Time:08-29

I want my Perl script to load binary data from a file. The file can either be read directly or, if the header _ISCOMPRESSED_ is present at the beginning of the file, it must be decompressed (zlib) first.

I've been able to successfully load the uncompressed file and recognise the header:

open(my $fh, "<", $fileName) or return 0;
binmode $fh;

my $fileHeader;
sysread $fh, $fileHeader, 14;
if( $fileHeader eq "_ISCOMPRESSED_" ){
  # Here, need to decompress the filestream and update the $fh to point toward uncompressed data 
}
else{
  # Read it from the beginning
  sysseek $fh,0,0;
}

# Read the data using the file handle
sysread $fh,$self->{'sig'},4;
sysread $fh,$self->{'version'},4;

I'd now like to decompress the remaining chunk of data with zlib and update the file handle $fh so that it returns the uncompressed data.

How should I do this, and is it possible to do it without writing the uncompressed data to disk?

Answer:

The decompression modules that come with Perl can read from an existing open file handle. Reading starts at the handle's current offset, which makes it easy to skip the header. The IO::Uncompress::* modules in particular create file handle objects that work with the normal I/O functions, so the switch is transparent: once the handle is created, the rest of your code doesn't care whether the source file is compressed or plain. Something like:

#!/usr/bin/env perl
use warnings;
use strict;
# IO::Uncompress::Inflate reads zlib (RFC 1950) streams. If the data turns out
# to be raw deflate with no zlib header, use IO::Uncompress::RawInflate instead.
use IO::Uncompress::Inflate;

my $self = {};    # stand-in for the object from the question
my $fileName = "data.compressed";

my $fh;
open my $realfh, "<:raw", $fileName
    or die "Unable to open $fileName: $!\n";
read $realfh, my $header, 14;
if ($header eq "_ISCOMPRESSED_") {
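    # Decompression starts at the current offset, i.e. right after the header
    $fh = IO::Uncompress::Inflate->new($realfh, AutoClose => 1)
        or die "Unable to open decompression stream: $IO::Uncompress::Inflate::InflateError\n";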
} else {
    seek $realfh, 0, 0;
    $fh = $realfh;
}

read $fh, $self->{'sig'}, 4;
read $fh, $self->{'version'}, 4;
# etc.
close $fh;
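To check the approach end to end without a real data file, here is a minimal round-trip sketch. It assumes a compressed file is the literal 14-byte _ISCOMPRESSED_ marker followed by a zlib (RFC 1950) stream; the helper name open_maybe_compressed, the sample payload and the temporary files are hypothetical and only for illustration. The payload is compressed in memory with IO::Compress::Deflate, and the decompressed data is never written to disk.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::Deflate qw(deflate $DeflateError);
use IO::Uncompress::Inflate qw($InflateError);

# Hypothetical helper: returns a handle that yields plain data whether or not
# the file starts with the 14-byte "_ISCOMPRESSED_" marker.
sub open_maybe_compressed {
    my ($fileName) = @_;
    open my $realfh, '<:raw', $fileName
        or die "Unable to open $fileName: $!\n";
    my $got = read $realfh, my $header, 14;
    if (defined $got && $got == 14 && $header eq '_ISCOMPRESSED_') {
        # The inflater reads from the handle's current offset (byte 14)
        return IO::Uncompress::Inflate->new($realfh, AutoClose => 1)
            || die "Unable to open decompression stream: $InflateError\n";
    }
    # Not compressed: rewind and hand back the plain handle
    seek $realfh, 0, 0 or die "seek failed: $!\n";
    return $realfh;
}

# Illustrative payload: a 4-byte signature, a 4-byte version, then filler
my $payload = 'SIG1' . 'VER2' . ('x' x 100);

# zlib-compress the payload in memory (deflate writes RFC 1950 data)
deflate(\$payload => \my $compressed)
    or die "deflate failed: $DeflateError\n";

# Fixture 1: marker followed by the compressed stream
my ($cfh, $compressedFile) = tempfile(UNLINK => 1);
binmode $cfh;
print {$cfh} '_ISCOMPRESSED_', $compressed;
close $cfh;

# Fixture 2: the same payload stored uncompressed
my ($pfh, $plainFile) = tempfile(UNLINK => 1);
binmode $pfh;
print {$pfh} $payload;
close $pfh;

# The reading code is identical for both files
for my $file ($compressedFile, $plainFile) {
    my $fh = open_maybe_compressed($file);
    read $fh, my $sig, 4;
    read $fh, my $version, 4;
    close $fh;
    print "$sig $version\n";
}
```

Both iterations print SIG1 VER2; the caller never needs to know which branch open_maybe_compressed took.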

If you're doing a lot of small input operations, as you seem to be, I'd use read rather than sysread to take advantage of Perl's internal buffering. But the important thing is to be consistent: mixing the two on the same file handle leads to seemingly missing data, because read pulls a whole buffer's worth from the file at once, while sysread bypasses that buffer and starts at the underlying file offset.
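The "missing data" effect is easy to reproduce. In this minimal sketch (the scratch file and its 100-byte size are arbitrary), read asks for only 4 bytes, but PerlIO fills its buffer from the file, so the operating-system offset that sysread and sysseek use has already moved past bytes that read has not handed out yet.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file with 100 bytes of data
my ($out, $file) = tempfile(UNLINK => 1);
print {$out} 'A' x 100;
close $out;

open my $fh, '<:raw', $file or die "Unable to open $file: $!\n";

# Buffered read: asks for 4 bytes, but PerlIO reads a whole buffer
# from the OS, which here covers the entire 100-byte file.
read $fh, my $first, 4;

my $perlPos = tell $fh;            # position seen by read/seek/tell: 4
my $osPos   = sysseek $fh, 0, 1;   # real OS offset, already past byte 4

# sysread starts at $osPos, so bytes 4..99 appear to be "missing"
my $got = sysread $fh, my $next, 4;

print "tell=$perlPos sysseek=$osPos sysread returned=$got\n";
```

On a typical build, tell reports 4 while sysseek reports 100, and the sysread finds nothing left to read even though 96 bytes were never consumed.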
