I'm a novice Perl programmer and have inherited an older script that uses hash references that I don't understand. It fails with "Can't use an undefined value as a HASH reference at ./make_quar_dbfile.pl line 65."
63 my $bucket = sprintf('%02x', $i);
64 my $file = sprintf('%s/%02x.db', $qdir, $i);
65 %{$hashes{$bucket}} ? 1 : next;
66 tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || die "Can't open db file: $! \n ";
67 %hash = %{$hashes{$bucket}};
68 untie %hash;
The script reads through a number of gzipped emails to identify the sender, recipient, subject, date and so on, then writes that info to a DB_File hash.
This script used to work with older versions of Perl, but it looks like it is no longer compliant.
I'd really like to understand how this works, but I don't fully understand referencing and dereferencing, why it's even necessary here, or the %{$var} notation. All of the references I've studied talk about hash references in terms of "$hash_ref = \%author;" not "%hash_ref = %{$author}" for example.
Ideas on how to get this to work with hash references would be greatly appreciated.
#!/usr/bin/perl -w
use DB_File;
use File::Basename qw(basename);
use vars qw($verbose);
use strict;
use warnings;
sub DBG($);
$verbose = shift || 1;
my $qdir = '/var/spool/amavisd/qdb';
my $source_dir = '/var/spool/amavisd/quarantine';
my $uid = getpwnam('amavis');
my $gid = getgrnam('amavis');
my %hashes = ( );
my $me = basename($0);
my $version = '1.9';
my $steps = 100;
my $cnt = 0;
DBG("- Creating initial database files...");
for (my $i = 0; $i < 256; $i++) {
my $file = sprintf('%s/%02x.db', $qdir, $i);
unlink $file || DBG("Could not unlink $file to empty db: $! \n");
tie (my %hash, "DB_File", $file, O_CREAT, 0600) || die "Can't open db file: $! \n";
untie %hash;
chown($uid, $gid, $file) || die "Unable to set attributes on file: $! \n";
}
DBG("done\n");
opendir SOURCEDIR, $source_dir || die "Cannot open $source_dir: $! \n";
DBG("- Building hashes... ");
foreach my $f (sort readdir SOURCEDIR) {
next if ($f eq "." || $f eq "..");
if ($f =~ m/^(spam|virus)\-([^\-]+)\-([^\-]+)(\.gz)?/) {
my $type = $1;
my $key = $3;
my $bucket = substr($key, 0, 2);
my $d = $2;
my $subj = '';
my $to = '';
my $from = '';
my $size = '';
my $score = '0.0';
if (($cnt % $steps) == 0) { DBG(sprintf("\e[8D%-8d", $cnt)); } $cnt++;
if ($f =~ /\.gz$/ && open IN, "zcat $source_dir/$f |") {
while(<IN>) {
last if ($_ eq "\n");
$subj = $1 if (/^Subject:\s*(.*)$/);
$to = $1 if (/^To:\s*(.*)$/);
$from = $1 if (/^From:\s*(.*)$/);
$score = $1 if (/score=(\d{1,3}\.\d)/);
}
close IN;
$to =~ s/^.*\<(.*)\>.*$/$1/;
$from =~ s/^.*\<(.*)\>.*$/$1/;
$size = (stat("$source_dir/$f"))[7];
$hashes{$bucket}->{$f} = "$type\t$d\t$size\t$from\t$to\t$subj\t$score";
}
}
}
closedir SOURCEDIR;
DBG("...done\n\n- Populating database files...");
for (my $i = 0; $i < 256; $i++) {
my $bucket = sprintf('%02x', $i);
my $file = sprintf('%s/%02x.db', $qdir, $i);
%{$hashes{$bucket}} ? 1 : next;
tie (my %hash, 'DB_File', $file, O_RDWR, 0600) || die "Can't open db file: $! \n ";
%hash = %{$hashes{$bucket}};
untie %hash;
}
exit(0);
sub DBG($) { my $msg = shift; print $msg if ($verbose); }
CodePudding user response:
You need to understand references first. Here is a short how-to:
#!/usr/bin/perl
use strict; use warnings;
use feature qw/say/;
use Data::Dumper;
my $var = {}; # I create a HASH ref explicitly
say "I created a HASH ref explicitly:";
say ref($var);
say "Now, let's add any type of content:";
say "Adding a ARRAY:";
push @{ $var->{arr} }, (0..5);
say Dumper $var;
say "Now, I add a new HASH";
$var->{new_hash} = {
foo => "value",
bar => "other"
};
say Dumper $var;
say 'To access the data in $var without Data::Dumper, we need to dereference what we want to retrieve';
say "to retrieve a HASH ref, we need to dereference with %:";
while (my ($key, $value) = each %{ $var->{new_hash} }) {
say "key=$key value=$value";
}
say "To retrieve the ARRAY ref:";
say join "\n", @{ $var->{arr} };
Output
I created a HASH ref explicitly:
HASH
Now, let's add any type of content:
Adding a ARRAY:
$VAR1 = {
'arr' => [
0,
1,
2,
3,
4,
5
]
};
Now, I add a new HASH
$VAR1 = {
'new_hash' => {
'foo' => 'value',
'bar' => 'other'
},
'arr' => [
0,
1,
2,
3,
4,
5
]
};
To access the data in $var without Data::Dumper, we need to dereference what we want to retrieve
to retrieve a HASH ref, we need to dereference with %:
key=foo value=value
key=bar value=other
To retrieve the ARRAY ref:
0
1
2
3
4
5
Now, in your code, instead of
%{$hashes{$bucket}} ? 1 : next;
you should test the hash ref first, because Perl says it is undefined. Let's debug a bit:
use Data::Dumper;
# Dump the whole %hashes structure; Dumper wants a reference, hence \%hashes
print Dumper \%hashes;
print "bucket=$bucket\n";
if (defined $hashes{$bucket}) {
print "Defined hash ref\n";
}
else {
print "NOT defined hash ref\n";
}
CodePudding user response:
What is $hash{$key}? It is the value associated with the (value of) $key, and that value must be a scalar. So we get the $value out of my %hash = ( $key => $value ).
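As a minimal sketch of the two notations the question contrasts: `\%author` makes a reference, and `%{ ... }` turns a reference back into a hash. The `%author` data and the bucket key `'0a'` below are made-up illustrations, not values from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# %author is a hypothetical example hash, not part of the script above.
my %author = ( name => 'Larry', lang => 'Perl' );

# Taking a reference: the backslash yields a scalar that points at %author.
my $author_ref = \%author;

# A reference is a scalar, so it can be stored as a hash value.
my %hashes = ( '0a' => $author_ref );

# Dereferencing: %{ ... } turns the reference back into the hash itself.
my %copy = %{ $hashes{'0a'} };

# The arrow form reaches a single element through the reference.
my $name = $hashes{'0a'}->{name};

print ref($hashes{'0a'}), "\n";          # HASH
print "$name\n";                          # Larry
print join(',', sort keys %copy), "\n";   # lang,name
```

So "$hash_ref = \%author" and "%hash = %{$hash_ref}" are the two directions of the same trip: the first stores a pointer to the hash, the second follows it.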
That's a string, or a number. Or a filehandle. Or, an array reference, or a hash reference. (Or an object perhaps, normally a blessed hash reference.) They are all scalars, single-valued things, and as such are a legitimate value in a hash.
The syntax %{ ... } de-references a hash reference†, so judging by their code, %{ $hashes{$bucket} }, they expect there to be a hash reference. So the error says that there is actually nothing in %hashes for that value of a would-be key ($bucket), so it cannot even "de-reference" it. There is either no key with the value of $bucket at that point in the loop, or there is such a key but it has never been assigned anything.
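The situation can be reproduced in a few lines. This is only a sketch with made-up sample data (the file name and value are invented), assuming `use strict` as in the script: a bucket that was assigned through `->` gets its inner hash created automatically, while a bucket that was never touched holds nothing to dereference.

```perl
use strict;
use warnings;

my %hashes;

# Assigning through "->" autovivifies: Perl creates the inner hash for
# bucket '0a' on first use. The file name and value are made-up sample data.
$hashes{'0a'}->{'spam-20240101-0aXYZ.gz'} = "spam\t20240101";

# Bucket 'ff' was never assigned, so $hashes{'ff'} is undef and
# dereferencing it for reading dies under "use strict".
my $ok  = eval { my %copy = %{ $hashes{'ff'} }; 1 };
my $err = $ok ? '' : $@;

print "bucket 0a holds a ", ref($hashes{'0a'}), " ref\n";
print "bucket ff failed: $err" unless $ok;
```

In the script, that means any of the 256 buckets for which no quarantine file was found will trigger the error in the final loop.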
So go debug it. Add print statements through the loops so you can see which keys are there, what their values are, and which ones are not what the code assumes them to be. It is hard to tell what fails without running the program.
Next, the line %{$hashes{$bucket}} ? 1 : next; is a little silly. The condition of the ternary operator is evaluated as a boolean: true (not undefined, not 0, not the empty string '') or false. So it tests whether $hashes{$bucket} holds a hashref with at least one key; if it does, the expression returns 1 and the body of the for loop continues. Otherwise it skips to the next iteration.
Well, then skip to next if there is not a (non-empty) hashref there:
next if not defined $hashes{$bucket} or not %{ $hashes{$bucket} };
Note how we first test whether there is such a key, and only then attempt to dereference it.
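Here is a rough sketch of how that guard could sit in the final loop. It uses an in-memory hash and an `@written` list (both invented for illustration) in place of the tied DB_File hash, and assumes the bucket names are two-digit hex strings as produced by `substr($key, 0, 2)`.

```perl
use strict;
use warnings;

# Sample data standing in for what the "Building hashes" loop collects;
# only two of the 256 possible buckets are populated.
my %hashes = (
    '0a' => { 'spam-1-0a11.gz'  => 'spam'  },
    'ff' => { 'virus-2-ff22.gz' => 'virus' },
);

my @written;
for my $i (0 .. 255) {
    my $bucket = sprintf('%02x', $i);

    # Skip buckets that were never filled, before any dereference happens.
    next if not defined $hashes{$bucket} or not %{ $hashes{$bucket} };

    # The real script would tie a DB_File hash here and copy into it.
    my %hash = %{ $hashes{$bucket} };
    push @written, $bucket;
}

print "buckets written: @written\n";   # buckets written: 0a ff
```

Because the `defined` test comes first and `or` short-circuits, the `%{ ... }` on the right is only reached when there is something to dereference.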
† Whatever expression is inside the curlies must evaluate to a hash reference. (If it is something else, like a number or a string, the code would still exit with an error, but a different one.)
So, in this code, the hash %hashes must have a key that is the value of $bucket at that point, and the value for that key must be a hash reference. Then, the ternary operator tests whether the hash obtained from that hash reference has any keys.
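If stricter checking is wanted, both requirements can be tested with `ref`. The sketch below uses a hypothetical helper, `is_filled_hashref`, and invented sample keys; it also shows that dereferencing a plain string under `use strict` fails with a different message than the undefined case.

```perl
use strict;
use warnings;

# is_filled_hashref() is a hypothetical helper name, not from the script.
# It returns 1 only for a hash reference that has at least one key.
sub is_filled_hashref {
    my ($value) = @_;
    return ref($value) eq 'HASH' && %{$value} ? 1 : 0;
}

my %hashes = (
    good  => { a => 1 },
    empty => {},
    text  => 'not a reference',
);

# 'missing' is deliberately absent from %hashes.
for my $key (qw(good empty text missing)) {
    printf "%-7s %d\n", $key, is_filled_hashref($hashes{$key});
}

# Dereferencing a plain string under strict refs fails with its own message.
my $err = eval { my %h = %{ $hashes{text} }; 1 } ? '' : $@;
print $err;
```

Only the `good` entry passes; the empty hashref, the string, and the missing key are all rejected without ever raising an error.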