my $text ='<span>by <small itemprop="author">J.K. Rowling</small><span>by <small itemprop="author">J.K. Rowling</small><span>by <small itemprop="author">J.K. Rowling</small>';
if ($text =~ m/<span>by <small itemprop="author">(.+?)<\/small>/ig){
$author = $1;
$authorcount{$author} += 1;
}
$authorcounttxt = "authorcount.txt";
open (OUTPUT3, ">$authorcounttxt");
foreach $author (sort { $authorcount{$b} <=> $authorcount{$a} } keys %authorcount){
print OUTPUT3 ("$author\t\t$authorcount{$author}\n");
}
close (OUTPUT3);
The desired output is:
J.K. Rowling 3
However, I am only getting:
J.K. Rowling 1
CodePudding user response:
if ($text =~ m/.../ig){ $author = $1; $authorcount{$author} += 1; }
This is an if statement, which means the inner block will be entered at most once, i.e. only for the first match. You likely meant to use a while loop, which enters the inner block once for each match:
while ($text =~ m/.../ig){ $author = $1; $authorcount{$author} += 1; }
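To make the difference concrete, here is a minimal sketch that runs the same /g match once under if and once under while. The sample string, the simplified pattern, and the sub names count_with_if and count_with_while are made up for illustration and are not from the question.

```perl
use strict;
use warnings;

# Hypothetical sample input, simplified from the question's HTML.
my $text = '<b>Ann</b><b>Bob</b><b>Ann</b>';

# With "if", the scalar-context match runs once, so only the first hit is counted.
sub count_with_if {
    my ($input) = @_;
    my %count;
    if ($input =~ m/<b>(.+?)<\/b>/g) {
        $count{$1} += 1;
    }
    return %count;
}

# With "while", each iteration resumes where the previous match ended,
# and the loop stops when no further match is found.
sub count_with_while {
    my ($input) = @_;
    my %count;
    while ($input =~ m/<b>(.+?)<\/b>/g) {
        $count{$1} += 1;
    }
    return %count;
}

my %once = count_with_if($text);
my %all  = count_with_while($text);
print "if:    Ann=$once{Ann}\n";
print "while: Ann=$all{Ann}, Bob=$all{Bob}\n";
```

The if version only ever sees the first Ann, while the while version counts Ann twice and Bob once.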
CodePudding user response:
Replace your if with a while to iterate through all of the matches of your regex instead of only the first one:
while ($text =~ m/<span>by <small itemprop="author">(.+?)<\/small>/ig){
$author = $1;
$authorcount{$author} += 1;
}
Also obligatory note: parsing HTML with regexen is fraught with peril. Consider using a module that can properly parse HTML, Mojo::DOM for example.
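As a rough sketch of what the Mojo::DOM route could look like, assuming the Mojolicious distribution is installed: the CSS selector picks every small element whose itemprop attribute is "author", and the element text is tallied. The sub name count_authors is hypothetical, and the output goes to STDOUT rather than to a file to keep the example short.

```perl
use strict;
use warnings;
use Mojo::DOM;    # part of Mojolicious, not a core module

# Hypothetical helper: tally the text of every <small itemprop="author"> element.
sub count_authors {
    my ($html) = @_;
    my %count;
    for my $el (Mojo::DOM->new($html)->find('small[itemprop="author"]')->each) {
        $count{ $el->text }++;
    }
    return %count;
}

my $html = '<span>by <small itemprop="author">J.K. Rowling</small>'
         . '<span>by <small itemprop="author">J.K. Rowling</small>'
         . '<span>by <small itemprop="author">J.K. Rowling</small>';

my %count = count_authors($html);

# Most frequent authors first, ties broken alphabetically.
for my $author (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$author\t$count{$author}\n";
}
```

For the sample text from the question this should list J.K. Rowling with a count of 3, without depending on the exact spacing or attribute order that a regex would have to match.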
CodePudding user response:
As the previous posters already indicated, the issue is the if ( $text =~ /.../gi ): the condition is evaluated once, so the block is executed at most once.
You want to process every match, which can be done with a for loop (the match in list context returns all captures) or a while loop (the match in scalar context advances one match per iteration).
The following code snippet demonstrates one of many approaches to the solution.
use strict;
use warnings;
use feature 'say';
my(%authors, $fname, $text, $re);
$fname = 'authorcount.txt';
$text = '<span>by <small itemprop="author">J.K. Rowling</small><span>by <small itemprop="author">J.K. Rowling</small><span>by <small itemprop="author">J.K. Rowling</small>';
$re = qr/<span>by <small itemprop="author">(.*?)<\/small>/i;
# In list context, a /g match returns every captured author name.
$authors{$_}++ for $text =~ /$re/g;
open my $fh, '>', $fname
    or die "Can't open $fname: $!";
say $fh "$_ $authors{$_}" for sort keys %authors;
close $fh;
NOTE: this code works for your example $text = '...'; if you intend to process complex HTML files, then Mojo::DOM is the right tool for the job.