I have a little Perl script which includes a substring search as follows.
#!/usr/bin/perl
use strict;
use warnings;
my $line = "this && is || a test if && ||";
my $nb_if = findSymbols($line, "if ");
my $nb_and = findSymbols($line, "&&");
my $nb_or = findSymbols($line, "||");
print "\nThe result for this func is $nb_if=if , $nb_and=and, $nb_or=or\n";
sub findSymbols {
    my $n = () = ($_[0] =~ m/$_[1]/g);
    return $n;
}
It should return:
The result for this func is 1=if , 2=and, 2=or
but instead it returns:
The result for this func is 1=if , 2=and, 30=or
I don't understand what's wrong with my code.
CodePudding user response:
Use quotemeta to escape the characters in || that have a special meaning in a regular expression
(and in any other string which you pass to the function):
sub findSymbols {
    my $pat = quotemeta $_[1];
    my $n = () = ($_[0] =~ m/$pat/g);
    return $n;
}
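To see what quotemeta actually does to the three search strings from the question, here is a small sketch (the loop and variable name are just for illustration). It puts a backslash in front of every ASCII character that is not a letter, digit or underscore, so the space in "if " gets escaped as well, which is harmless:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print each search string from the question next to its escaped form.
for my $symbol ("||", "&&", "if ") {
    print "'$symbol' becomes '", quotemeta($symbol), "'\n";
}
```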
CodePudding user response:
The pipe character (|) has a special meaning in regular expressions. It means "or" (matching either the thing on its left or the thing on its right). So a regex that consists of just two pipes is interpreted as "match an empty string or an empty string or an empty string", and that matches everywhere in your string (30 times!).
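The figure of 30 can be checked with a short sketch. The helper name count_raw is made up here; it does the same unescaped match as the original findSymbols. An empty match is possible before every character and once more at the end of the string, so the count should equal the string length plus one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: counts matches of an UNESCAPED pattern,
# the same way the original findSymbols does.
sub count_raw {
    my ($text, $pattern) = @_;
    my $n = () = ($text =~ m/$pattern/g);
    return $n;
}

my $line = "this && is || a test if && ||";

# "||" is read as three empty alternatives, so it matches the empty
# string at every position in the 29-character string.
printf "matches: %d, length + 1: %d\n",
    count_raw($line, "||"), length($line) + 1;
```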
So you need to stop the pipe being interpreted as a special character and let it just represent an actual pipe character. Here are three ways to do that:
Escape the pipes with backslashes when you're creating the string that you pass to findSymbols().
# Note: I've also changed "..." to '...'
# to avoid having to double-escape
my $nb_or = findSymbols($line, '\|\|');
Use quotemeta() to automatically escape problematic characters in any string passed to findSymbols().
my $escaped_regex = quotemeta($_[1]);
my $n = () = ($_[0] =~ m/$escaped_regex/g);
Use \Q...\E to automatically escape any problematic characters used in your regex.
# Note: In this case, the \E isn't actually needed
# as it's at the end of the regex.
my $n = () = ($_[0] =~ m/\Q$_[1]\E/g);
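For comparison, here are the three approaches side by side as a minimal sketch. The sub names count_regex, count_quotemeta and count_q are invented for this example and are not part of the original script. With the line from the question, each call should print 2:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Way 1: no escaping inside the sub; the caller must pass a valid regex.
sub count_regex {
    my ($text, $regex) = @_;
    my $n = () = ($text =~ m/$regex/g);
    return $n;
}

# Way 2: escape the search string with quotemeta().
sub count_quotemeta {
    my ($text, $literal) = @_;
    my $escaped = quotemeta($literal);
    my $n = () = ($text =~ m/$escaped/g);
    return $n;
}

# Way 3: escape inline with \Q...\E.
sub count_q {
    my ($text, $literal) = @_;
    my $n = () = ($text =~ m/\Q$literal\E/g);
    return $n;
}

my $line = "this && is || a test if && ||";

print count_regex($line, '\|\|'), "\n";    # the caller escapes the pipes
print count_quotemeta($line, '||'), "\n";  # the sub escapes them
print count_q($line, '||'), "\n";          # the sub escapes them
```

Ways 2 and 3 keep the escaping inside the sub, so callers can pass plain strings such as '||' without knowing that a regex is used underneath.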
For more detailed information on using regular expressions in Perl, see perlretut and perlre.
CodePudding user response:
| is the alternation operator in the regular expression used by m//. You need to escape each | with a backslash to match literal |s.
my $nb_or = findSymbols($line, "\\|\\|"); # or '\|\|'
(But using quotemeta as suggested by @toolic is a much better idea, as it frees your caller from having to worry about details that should be part of the abstraction provided by findSymbols.)
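If the two quoting styles look confusing, this quick check (variable names are only for illustration) confirms that the double-quoted and single-quoted forms produce the same four-character string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $double = "\\|\\|";  # double quotes: each backslash must itself be escaped
my $single = '\|\|';    # single quotes: a backslash before | is kept as-is

print "length: ", length($double), "\n";
print "same string\n" if $double eq $single;
```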