my $line = "The quick brown fox jumps over the lazy dog.";
while ($line){
$line =~ s/[",]//ig; # remove double quotes and commas
#print $line
$line = lc($line); # lc converts to lowercase
while ($line=~m/\b(\w+\s\w+)\b/ig){ # match two adjacent words
my $word =$1;
print "$word\n";
$wordcount{$word} = 1;
}
last;
}
close(INPUT);
close(OUTPUT);
The desired output is: the quick, quick brown, brown fox, fox jumps, ... However, the code above only produces: the quick, brown fox, jumps over, ...
Answer:
To capture both words without consuming the second one, so that consecutive pairs overlap, use a lookahead:
use warnings;
use strict;
use feature 'say';
my $string = shift // 'The quick brown fox jumps over the lazy dog.';
while ( $string =~ /(\w+)\s(?=(\w+))/g ) {
say "$1 $2";
}
This prints every adjacent pair, as desired.
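To see why the question's pattern skips every other pair, the sketch below runs that pattern (with `\w+` restored) on the already-lowercased sentence and reports `pos()`, the offset where the next `//g` search resumes. Because each match consumes both words, the search resumes after the second word, so that word can never start the next pair. The `@pairs` array is just an illustrative name.

```perl
use strict;
use warnings;

my $line = "the quick brown fox jumps over the lazy dog.";
my @pairs;

# Without a lookahead, each match consumes BOTH words, so the next
# search starts after the second word and that word cannot begin a new pair.
while ( $line =~ /\b(\w+\s\w+)\b/g ) {
    push @pairs, $1;
    # pos() is the offset where the next //g search will resume
    printf "%-12s next search starts at offset %d\n", $1, pos($line);
}
```

Each reported offset lies just past the second word of the match, which is exactly the word the lookahead version leaves unconsumed.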
Answer:
You can use the following regex:
(\w+)\s(?=(\w+\b))
Regex explanation:
(          Capturing group
  \w+      Match a word
)          Close group
\s         Match a whitespace character
(?=        Lookahead assertion: assert that the following regex matches
  (        Capturing group
    \w+\b  Match a word
  )        Close group
)          Close lookahead
Perl example:
my $line = "The quick brown fox jumps over the lazy dog.";
while ($line =~ /(\w+)\s(?=(\w+\b))/g) {
print("$1 $2\n");
}
Output:
The quick
quick brown
brown fox
fox jumps
jumps over
over the
the lazy
lazy dog
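The question's code also stores each match in `%wordcount`, but `$wordcount{$word} = 1` only records that a pair was seen. Assuming the goal was to tally how often each pair occurs, a minimal sketch combines the question's normalization (strip quotes and commas, lowercase) with the lookahead from the answers. The `%paircount` hash and the second sentence in the input are hypothetical, chosen only to show a repeated pair.

```perl
use strict;
use warnings;

# Hypothetical input; in the question the text came from a file handle.
my $line = 'The quick brown fox jumps over the lazy dog. The quick cat.';
my %paircount;

# Normalize as in the question: drop double quotes and commas, lowercase.
( my $clean = $line ) =~ s/[",]//g;
$clean = lc $clean;

# The lookahead captures the second word without consuming it,
# so every adjacent pair is visited.
while ( $clean =~ /(\w+)\s(?=(\w+))/g ) {
    $paircount{"$1 $2"}++;
}

for my $pair ( sort keys %paircount ) {
    print "$pair: $paircount{$pair}\n";
}
```

Because `(\w+)\s` needs whitespace right after the first word, "dog." followed by "The" does not produce a "dog the" pair, so pairs never span the period.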