How do I sort the frequency of repetition ($n) from high to low in Perl?

Time:08-25

I have this code, which works well for finding the lines common to multiple files. I just don't have any idea how to sort the output from the highest repetition count to the lowest. Instead of 5,3,2,6,4,5,6, I want the entries sorted as 6,6,5,5,4,3,2.

The Output.txt

For line --> five
This line occurs 5 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt,
e.txt
For line --> three
This line occurs 3 times in the following files: - 
a.txt,
b.txt,
c.txt
For line --> two
This line occurs 2 times in the following files: - 
a.txt,
b.txt
For line --> eight
This line occurs 6 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt,
e.txt,
f.txt
For line --> four 
This line occurs 4 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt
For line --> six
This line occurs 5 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt,
e.txt
For line --> seven
This line occurs 6 times in the following files: - 
a.txt,
b.txt,
c.txt,
d.txt,
e.txt,
f.txt
The total common line between files are 7

The script (Perl)

#!/usr/bin/perl -w
my %hash; 
my $file;
my $fh;
my $count;

for $file (@ARGV) {
    open ($fh, $file) or die "$file: $!\n";
    while(<$fh>) {
        push @{$hash{ $_}}, $file;
    } 
}
for (keys %hash) {
    $n = @{$hash{$_}};
    if(@{$hash{$_}} > 1) {
        $count++;
        print "\n For line --> $_\n";
        print "This line occurs $n times in the following files: - \n", join(",\n", @{$hash{$_}}), "\n\n";
    }
}
print "The total common line between files are $count\n";  
exit 0;

CodePudding user response:

You can use the following:

sort { @{ $hash{$b} } <=> @{ $hash{$a} } } keys %hash
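
As a minimal sketch of how that expression could slot into the question's loop: the keys are sorted once, then iterated in that order. The sample contents of %hash and the names @sorted and $line are made up for illustration; in the real script the hash is built by reading the files in @ARGV.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the hash built from the input
# files: each key is a line, each value is the list of files containing it.
my %hash = (
    "five\n"  => [qw( a.txt b.txt c.txt d.txt e.txt )],
    "three\n" => [qw( a.txt b.txt c.txt )],
    "two\n"   => [qw( a.txt b.txt )],
    "eight\n" => [qw( a.txt b.txt c.txt d.txt e.txt f.txt )],
    "one\n"   => [qw( a.txt )],
);

# An array compared with <=> is evaluated in scalar context, so this
# compares element counts. Putting $b first gives descending order.
my @sorted = sort { @{ $hash{$b} } <=> @{ $hash{$a} } } keys %hash;

my $count = 0;
for my $line (@sorted) {
    my $n = @{ $hash{$line} };
    next if $n < 2;    # skip lines that occur in only one file
    $count++;
    print "For line --> $line";    # $line still carries its newline
    print "This line occurs $n times in the following files: - \n",
          join(",\n", @{ $hash{$line} }), "\n\n";
}
print "The total common line between files are $count\n";
```

With this sample data the entries come out in the order 6, 5, 3, 2, and the single-file line is skipped.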

You can also use the phenomenal Sort-Key distribution.

use Sort::Key qw( rukeysort );

rukeysort { 0+@{ $hash{$_} } } keys %hash

Using a Schwartzian Transform was suggested. I don't think that's a good solution here.

Without testing, it's unclear if a Schwartzian Transform would actually improve performance here, what with all the extra call blocks and memory allocations. It's quite possible that it makes the program both more complex and slower.

In fact, it's unclear if using ST is ever a good solution. If it's worthwhile to use a ST, you're better off using Sort::Key if you can. It's both simpler and faster.
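
One way to settle the performance question raised above is to measure it. The following is a rough sketch using the core Benchmark module. The synthetic %hash (10,000 keys with 1 to 6 file names each) and the labels plain and st are assumptions for illustration, and results will vary with data size and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Synthetic data (an assumption for illustration): 10_000 keys, each
# mapped to between 1 and 6 file names, loosely like the question's hash.
my %hash;
for my $i (1 .. 10_000) {
    $hash{"line $i\n"} = [ map { "f$_.txt" } 1 .. 1 + $i % 6 ];
}

my %tests = (
    # Plain sort: dereference and count inside the comparator.
    plain => sub {
        # Assign to an array so sort runs in list context;
        # sort is not guaranteed to do any work otherwise.
        my @sorted = sort { @{ $hash{$b} } <=> @{ $hash{$a} } } keys %hash;
        return scalar @sorted;
    },
    # Schwartzian Transform: precompute counts, sort pairs, unwrap keys.
    st => sub {
        my @sorted =
            map  { $_->[0] }
            sort { $b->[1] <=> $a->[1] }
            map  { [ $_, scalar @{ $hash{$_} } ] }
            keys %hash;
        return scalar @sorted;
    },
);

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%tests);
```

Sort::Key is left out of this sketch because it is a CPAN module rather than a core one; if it is installed, a third entry can be added in the same way.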

CodePudding user response:

You have to sort the list of keys instead of using the arbitrary order that keys returns. A common way in Perl to do so efficiently is to use a Schwartzian Transform:

for (map  { $_->[0] }
     sort { $b->[1] <=> $a->[1] }
     map  { [ $_, scalar @{$hash{$_}} ] }
     keys %hash) {
    # ...
}
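
To make the three stages of the transform easier to follow, here is the same pipeline unpacked into separate steps. It is only an illustrative sketch: the sample %hash and the intermediate names @pairs, @sorted_pairs and @sorted_keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data: line => list of files containing it.
my %hash = (
    "two\n"   => [qw( a.txt b.txt )],
    "four\n"  => [qw( a.txt b.txt c.txt d.txt )],
    "three\n" => [qw( a.txt b.txt c.txt )],
);

# Stage 1 (the last map in the chain): pair each key with its count,
# so the count is computed once per key, not once per comparison.
my @pairs = map { [ $_, scalar @{ $hash{$_} } ] } keys %hash;

# Stage 2: sort the pairs by the precomputed count, highest first.
my @sorted_pairs = sort { $b->[1] <=> $a->[1] } @pairs;

# Stage 3 (the first map in the chain): drop the counts, keeping the keys.
my @sorted_keys = map { $_->[0] } @sorted_pairs;

# Each key still ends in a newline, so no extra "\n" is needed here.
print "$_->[1] x $_->[0]" for @sorted_pairs;
```

For this data the keys come out as four, three, two.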