Idiomatic way to slice a Perl array by value rather than index-CodePudding

I have a piece of code that extracts a slice from one of two large arrays of sorted integers, representing stopping points in the program's workflow. I'll include one of the two here.

The basic idea is that I'm trying to slice a work range out of this large array as a starting point for this program to work on. $min is pulled from the task object, representing the current progress of the task. $limit is an optional user override, defaulting to -1 (which is ignored).

Currently, I'm using the firstidx function from the List::MoreUtils CPAN module to retrieve the indices for the start and finish, and then I'm using them to slice the @steps array in the usual way. Is there a faster and/or more idiomatic way of doing this? Particularly, is there a good way to do it by directly using the $min and $limit values (with another codepath for the $limit == -1 case)?

Here's the code:

my @steps = (
    0, 1, 5, 10,
    20, 30, 40, 50, 60, 70, 80, 90, 100,
    200, 300, 400, 500, 600, 700, 800, 900, 1000,
    1200, 1400, 1600, 1800, 2000,
    2500, 3000, 3500, 4000, 4500, 5000,
    6000, 7000, 8000, 9000, 10000,
    11000, 12000, 13000, 14000, 15000, 17500, 20000,
    25000, 30000, 35000, 40000, 45000, 50000,
    60000, 70000, 80000, 90000, 100000
);
my $min_index = firstidx { $_ > $min } @steps;
my $max_index;
if ($limit == -1) {
    $max_index = @steps - 1;
} else {
    $max_index = firstidx { $_ >= $limit } @steps;
}
my @steps_todo = @steps[ $min_index .. $max_index ];

CodePudding user response：

You can do this with a simple foreach loop. The flow control keywords control when you start or stop consuming the @steps array.

I changed the no-limit case to be a defined check since that's more efficient.

This is not the most efficient possible algorithm but it's a simple idomatic way of solving it.

use diagnostics; # Very verbose warnings
($min, $max, @out) = (1000, 9999);

@steps = (
    0, 1, 5, 10,
    20, 30, 40, 50, 60, 70, 80, 90, 100,
    200, 300, 400, 500, 600, 700, 800, 900, 1000,
    1200, 1400, 1600, 1800, 2000,
    2500, 3000, 3500, 4000, 4500, 5000,
    6000, 7000, 8000, 9000, 10000,
    11000, 12000, 13000, 14000, 15000, 17500, 20000,
    25000, 30000, 35000, 40000, 45000, 50000,
    60000, 70000, 80000, 90000, 100000
);


foreach (@steps) {
    next if $_ < $min;
    last if defined $max and $_ > $max;
    push @out, $_;
}


print "@out";

## output
## 1000 1200 1400 1600 1800 2000 2500 3000 3500 4000 4500 5000 6000 7000 8000 9000

This is also a possible use case for the flip flop operator, ... It looks like the list range operator but in scalar context it's flip flop. Think of it as like the cmp tristate operator but for arbitrary expressions. It returns a value that defines a range from "this" to "that". Call it a flapper. Look at this sample. See the perlop Range Operator section for more info. The operands can be anything that returns a boolean.

use Data::Dump qw/pp/;
($min, $max) = (1000, 9999);
foreach (@steps) {
    my $x = ($_ >= $min .. $_ < $max);
    printf "%-9s %s\n", $_, pp $x;
}

## Output
0         ""
1         ""
5         ""
10        ""
20        ""
30        ""
40        ""
50        ""
60        ""
70        ""
80        ""
90        ""
100       ""
200       ""
300       ""
400       ""
500       ""
600       ""
700       ""
800       ""
900       ""
1000      "1E0"
1200      "1E0"
1400      "1E0"
1600      "1E0"
1800      "1E0"
2000      "1E0"
2500      "1E0"
3000      "1E0"
3500      "1E0"
4000      "1E0"
4500      "1E0"
5000      "1E0"
6000      "1E0"
7000      "1E0"
8000      "1E0"
9000      "1E0"
10000     1
11000     2
12000     3
13000     4
14000     5
15000     6
17500     7
20000     8
25000     9
30000     10
35000     11
40000     12
45000     13
50000     14
60000     15
70000     16
80000     17
90000     18
100000    19

Notice the ones that return "1E0" because those values trigger a reset in the flip flop. In a real loop that transition point is where it would terminate. You can play with the arguments to pick what set of values get "", "1E0", or the sequence number and act accordingly.

This version gives the sequence number to the wanted value and then starts the "E0" reset values beyond.

($_ >= $min .. $_ > $max);

...
800       ""
900       ""
1000      1
1200      2
...
8000      15
9000      16
10000     "17E0"
11000     "1E0"
12000     "1E0"
...

Here's how you would solve it using flip flop. The advantage of this way is that the lower bound condition is no longer evaluated once it's reached. The major downside is that it's super cryptic and probably won't be understandable to non-Perl programmers. But this is as compact and efficient as it can get.

foreach (@steps) {
    ($_ >= $min .. ($_ > $max and last)) or next;
    push @out, $_;
}

print "@out\n";
## 1000 1200 1400 1600 1800 2000 2500 3000 3500 4000 4500 5000 6000 7000 8000 9000

While the left flapper is false, the whole expression is false and we go to the next item. When the left flapper becomes true, the whole expression is now true and we go on and start checking the right flapper. While the right flapper is false (the whole expression is still true), we go on to save the item in @out. When the right flapper becomes true, we terminate the loop. (This is also where it would reset back to false.)

One more edit. :)

If there is no max, here's a good way to do it.


for (my $i = 0; $i <= $#steps; $i  ) {
    next unless $steps[$i] >= $min;
    @out = @steps[$i .. $#steps];
    # Or, if you don't need the original array anymore you can consume it 
    # and be a bit more efficient and save memory
    # @out = splice @steps, $i;
    last;

}

Great question!

CodePudding user response：

An idiomatic way would be to use grep to select the range. That has the disadvantage that it scans the entire array, which might be a performance point if the array is large and the grep is executed often.

Since the list is sorted, a possibility for performance (but certainly no shorter or simpler) is to use the binary search functions from List::MoreUtils to find the bounds of the range.