Home > database >  Is there a built in Perl Function for finding duplicate subarrays(exact order) in an array?
Is there a built in Perl Function for finding duplicate subarrays(exact order) in an array?

Time:06-15

Lets say the array is (1,2,3,4,5,6,7,8,9), Another subarray is (2,3,4) Is there a function to check if the subarray pattern(full exact order) exists within array? In this case, it would return any indicator(index) that shows it exists. Also would need to work for duplicates if there are multiple subarrays existing in the array like (4,2,3,4,2,3,4). If it happens to match multiple times for example: Array = (2,3,2,3,2,2,3,2) Sub Array = (2,3,2) Would just return starting index of matches in order: 0,2,5 Or if it removes, would result in (3,2)

Edit: Elements don't have to be num

CodePudding user response:

There's no built-in method, but it's easy to write:

#!/usr/bin/env perl
use warnings;
use strict;
use feature qw/say/;

# Takes two arrayrefs of numbers.
#
# Returns the first index in the first one where the second list appears, or
# -1 if not found.
sub find_sublist(  ) {
  my ($haystack, $needle) = @_;
  my $nlen = @$needle;
  my $hlen = @$haystack;
  return -1 if $hlen == 0 || $nlen == 0;
 HAYSTACK_POS:
  for (my $n = 0; $n <= $hlen - $nlen; $n  ) {
    for (my $m = 0; $m < $nlen; $m  ) {
      if ($haystack->[$n   $m] != $needle->[$m]) {
        next HAYSTACK_POS;
      }
    }
    return $n;
  }
  return -1;
}


# Takes two arrayrefs of numbers.
#
# Returns a list of the starting indexes of the first list
# of every run of the second list. Returns an empty list if
# there are no matches.
sub find_sublists(  ) {
  my ($haystack, $needle) = @_;
  my $nlen = @$needle;
  my $hlen = @$haystack;
  my @positions;
  return @positions if $hlen == 0 || $nlen == 0;
 HAYSTACK_POS:
  for (my $n = 0; $n <= $hlen - $nlen; $n  ) {
    for (my $m = 0; $m < $nlen; $m  ) {
      if ($haystack->[$n   $m] != $needle->[$m]) {
        next HAYSTACK_POS;
      }
    }
    push @positions, $n;
  }
  return @positions;
}

# Takes two arrayrefs of numbers.
#
# Returns a new list that is the first one with every non-overlapping run of
# the second second list removed.
sub remove_sublists(  ) {
  my @haystack = @{$_[0]};
  my $needle = $_[1];
  while ((my $pos = find_sublist @haystack, $needle) != -1) {
    splice @haystack, $pos, @$needle;
  }
  return @haystack;
}

my @list1 = (1,2,3,4,5,6,7,8,9);
my @list2 = (4,2,3,4,2,3,4);
my @list3 = (2,3,2,3,2,2,3,2);
say find_sublist(@list1, [2, 3, 4]);            # Returns 1
say find_sublist([2,9,3,4], [2,3,4]);           # Returns -1
my @positions = find_sublists(@list2, [2,3,4]); # 1,4
say join(",", @positions);
@positions = find_sublists(@list3, [2,3,2]); # 0,2,5
say join(",", @positions);
say join(",", remove_sublists(@list1, [2,3,4])); # 1,5,6,7,8,9
say join(",", remove_sublists(@list3, [2,3,2])); # 3,2

CodePudding user response:

If the inputs are numbers representable by your perl's integers (as shown), you can use

# Indexes
my $pattern = pack "W*", @pattern;
my $array   = pack "W*", @array;
my @indexes;
push @indexes, $-[0] while $array =~ /\Q$pattern/g;
# Removal
my $pattern = pack "W*", @pattern;
my $array   = pack "W*", @array;
$array =~ s/\Q$pattern//g;
@array = unpack "W*", $array;

How it handles overlaps:

           /---\     /---\   Removed
2,3,2 from 2,3,2,3,2,2,3,2 
               \---/         Not removed

Note that this also works if you can map the inputs to numbers.

my ( %map_f, @map_r );
for ( @array, @pattern ) {
   if ( !exists{ $map{ $_ } } ) {
      $map_f{ $_ } = @map_r;
      push @map_r, $_;
   }
}

my $pattern = pack "W*", @map_f{ @pattern };
my $array   = pack "W*", @map_f{ @array   };
$array =~ s/\Q$pattern//g;
@array = @map_r[ unpack "W*", $array ];

It's not the best algorithm, but it should be very fast by moving the work from Perl to the regex engine.

  • Related