I'm trying to learn Perl's grep better.
I want to grep for the keys of a hash that are not in an array:
my %args = ( fake => 1);
my @defined_args = ('color', 'colors', 'data', 'figheight', 'figwidth', 'filename', 'flip', 'grid', 'labelsize', 'logscale', 'minor_gridlines');
my @bad_args = grep { not grep {$_} @defined_args} keys %args;
The list of bad args should end up in @bad_args.
The last line is obviously wrong.
I know that I can do the same thing with a hash, but I want to be able to do this with a multi-order grep, i.e. grep on grep.
How can I do this like the following?
my @bad_args = grep { not grep {$_ eq $_} @defined_args} keys %args;
I'm confused because there would be two $_, which I can't run an equality test on.
CodePudding user response:
First, the direct answer: the block that grep takes can contain any code. That's the point of the block, and an element passes or not based on the truthiness of the last expression evaluated in it.
my @bad_args = grep {
my $key = $_;
@defined_args == grep { $key ne $_ } @defined_args
} keys %args;
Here we count the array elements that the key is not equal to, and the key is bad if that count equals the size of the array, that is, if it is unequal to all of them. Another way is to test whether it is equal to any one element and negate the result,
not grep { $key eq $_ } @defined_args;
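As a minimal runnable sketch of that second form, using the data from the question: the extra color => 'red' entry is an addition for illustration, so that there is one key that passes and one that doesn't.

```perl
use strict;
use warnings;

# Data from the question; the 'color' entry is added here for illustration
my %args = (fake => 1, color => 'red');
my @defined_args = ('color', 'colors', 'data', 'figheight', 'figwidth',
    'filename', 'flip', 'grid', 'labelsize', 'logscale', 'minor_gridlines');

my @bad_args = grep {
    my $key = $_;                           # save the outer $_ (a hash key)
    not grep { $key eq $_ } @defined_args;  # inner $_ is an array element
} keys %args;

print "Bad args: @bad_args\n";  # prints "Bad args: fake"
```

Copying the outer $_ into a lexical is what makes the comparison possible, since inside the inner block $_ refers only to the array element.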
This is all a little convoluted, since it needs to work with negations.
But these are common things to do, and there are libraries for them.
To directly improve on the above:
use List::Util 1.33 qw(none); # before 1.33 it was in List::MoreUtils
my @bad_args = grep {
my $key = $_;
none { $key eq $_ } @defined_args
} keys %args;
Now the needed "negative" is absorbed in the library's function name, making this far easier to look at. Also, none stops as soon as it finds a matching element, while grep always processes all elements, so this is also more efficient.
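The difference in the amount of work can be made visible by counting how many times each block runs. This is only an illustrative sketch: the shortened list and the counter variables are made up for the demonstration, and the key is chosen to match the first element so that the short-circuit is obvious.

```perl
use strict;
use warnings;
use List::Util 1.33 qw(none);

# Shortened list for the demonstration; 'color' matches the first element
my @defined_args = ('color', 'colors', 'data', 'figheight');
my $key = 'color';

# Hypothetical counters, incremented every time a block is evaluated
my ($grep_tests, $none_tests) = (0, 0);

my $bad_by_grep = !grep { $grep_tests++; $key eq $_ } @defined_args;
my $bad_by_none = none { $none_tests++; $key eq $_ } @defined_args;

# Both report that the key is not bad, but grep tested every element
print "grep ran its block $grep_tests times, none ran its block $none_tests time(s)\n";
# prints "grep ran its block 4 times, none ran its block 1 time(s)"
```

For a key that is in no position of the list, both have to look at every element, so the gain applies only to keys that are found.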
These nested-scan approaches (complexity O(NM - M^2/2) or so) aren't terribly efficient in comparison with hash-based ones, but that is completely irrelevant for small arrays. Using hashes, as mentioned in the question, is the standard way to handle existence-related problems; see for example this post, or the source for methods used in all libraries discussed below (simplest example).
Finally, while the question is about (double) filtering, it should be mentioned that we are looking for which elements of a list aren't in another: a "difference" between lists. Then other kinds of libraries come into play. Some examples:
Using Set::Scalar
use Set::Scalar;
...
my $keys = Set::Scalar->new(keys %args);
my $good = Set::Scalar->new(@defined_args);
my $keys_not_in_good = $keys->difference($good);
say $keys_not_in_good;
Also note Set::Object in the same camp.
Then there are tools specifically for array comparison, like List::Compare
use List::Compare;
...
my $lc = List::Compare->new('-u', '-a', \@defined_args, [keys %args]);
my @only_in_second = $lc->get_complement();
say "@only_in_second";
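Options -u (leave results unsorted) and -a (accelerated mode for comparing just two lists) showcase some of the module's capabilities; they are there to speed things up and are not necessary. This module has a lot more; see its docs.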
On the other end is the simple Array::Utils.
There is more out there. See for example this page for plenty of ideas.
CodePudding user response:
When you get into these sorts of tangles, it's sometimes better to find a different way.
There are two things to think about. If you want a nested use of $_, you need to protect the outer one somehow. Since you want to use the outer and inner ones in the same expression, one of them needs a different name:
grep {
my $top = $_;
my $count = grep { $top eq $_ } ...;
...
} keys %args;
But that inner grep is a bit weird. You want to check if something is (or isn't) in a list. That's a job for a hash and exists (here @allowed_args plays the role of the question's @defined_args):
my %allowed_args = map { $_, 1 } @allowed_args;
my @found_bad_args = grep { ! exists $allowed_args{$_} } keys %args;
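A complete, runnable sketch of the hash approach with the question's data might look like the following. Two things here are choices made for the example rather than part of the answer above: the lookup hash is filled with a hash slice instead of map, and the extra color and bogus entries in %args are sample data. The keys are sorted only to make the output order predictable.

```perl
use strict;
use warnings;

# Data from the question; 'color' and 'bogus' are added for illustration
my %args = (fake => 1, color => 'red', bogus => 2);
my @defined_args = ('color', 'colors', 'data', 'figheight', 'figwidth',
    'filename', 'flip', 'grid', 'labelsize', 'logscale', 'minor_gridlines');

# Hash slice: every defined name becomes a key. The values are undef,
# which is fine because only the existence of the key is tested.
my %allowed_args;
@allowed_args{@defined_args} = ();

# One hash lookup per key, instead of a scan of the whole array
my @found_bad_args = grep { !exists $allowed_args{$_} } sort keys %args;

print "Bad args: @found_bad_args\n";  # prints "Bad args: bogus fake"
```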