Is there a way to match all adjacent words in a sentence?

Time:06-11

my $line = "The quick brown fox jumps over the lazy dog.";

while ($line){
    $line =~ s/["",]//ig; #[] means to get rid of 
    #print $line
    $line = lc($line); #lc is lowercase
    while ($line=~m/\b(\w+\s\w+)\b/ig){ #[^ ] means any character except spaces and newline #($line=~m/\b(\s\w+\s\w+)\b/ig)
        my $word =$1;
        print "$word\n";
        $wordcount{$word} += 1;
         
    }
last;

}
close(INPUT);
close(OUTPUT);

The desired output is: the quick, quick brown, brown fox, fox jumps, ... However, with the code above I only get: the quick, brown fox, jumps over, ...

CodePudding user response:

To capture both words without consuming the second, so that successive pairs overlap, use a lookahead:

use warnings;
use strict;
use feature 'say';

my $string = shift // 'The quick brown fox jumps over the lazy dog.';
 
while ( $string =~ /(\w+)\s(?=(\w+))/g ) {
   say "$1 $2";
}

Prints as desired.
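
The question's code also tallies each pair in %wordcount. A minimal sketch of how the lookahead could feed that count is below. The helper name adjacent_pairs is hypothetical, and using \s+ (rather than \s) to tolerate runs of whitespace is an assumption, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every pair of adjacent words in $text, lowercased.
sub adjacent_pairs {
    my ($text) = @_;
    my @pairs;
    # (\w+) consumes the first word; the lookahead captures the second word
    # without consuming it, so the next match can start at that word.
    # \s+ is an assumption here, to allow more than one whitespace character.
    while ( $text =~ /(\w+)\s+(?=(\w+))/g ) {
        push @pairs, lc("$1 $2");
    }
    return @pairs;
}

# Count how often each pair occurs, as the question's %wordcount intends.
my %wordcount;
$wordcount{$_}++ for adjacent_pairs('The quick brown fox jumps over the lazy dog.');
print "$_: $wordcount{$_}\n" for sort keys %wordcount;
```

Because \w does not match punctuation, the trailing period is never part of a pair, so the substitution that strips quotes and commas in the question is not needed for this input.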

CodePudding user response:

You can use

(\w+)\s(?=(\w+\b))

Regex Explanation

  • ( Capturing group
    • \w+ Match a word (one or more word characters)
  • ) Close group
  • \s Match a single whitespace character
  • (?= Lookahead assertion - assert that the following regex matches, without consuming it
    • ( Capturing group
      • \w+\b Match a word, up to a word boundary
    • ) Close group
  • ) Close lookahead

Perl Example

my $line = "The quick brown fox jumps over the lazy dog.";

while ($line =~ /(\w+)\s(?=(\w+\b))/g) {
    print("$1 $2\n");
}

Output

The quick
quick brown
brown fox
fox jumps
jumps over
over the
the lazy
lazy dog