Is the use of "||" in a substring search prohibited?

Time:10-13

I have a little Perl script which includes a substring search as follows.

#!/usr/bin/perl
use strict;
use warnings;

my $line = "this && is || a test if && ||";

my $nb_if = findSymbols($line, "if ");
my $nb_and = findSymbols($line, "&&");
my $nb_or = findSymbols($line, "||");

print "\nThe result for this func is $nb_if=if , $nb_and=and, $nb_or=or\n";

sub findSymbols {
    my $n = () = ($_[0] =~ m/$_[1]/g);
    return $n;
}

It should return:

The result for this func is 1=if , 2=and, 2=or

But instead, it returns:

The result for this func is 1=if , 2=and, 30=or

I don't understand what's wrong with my code.

Answer:

Use quotemeta to escape the characters that have a special meaning in regular expressions, such as the | in || (and any other such characters in the string you pass to the function):

sub findSymbols {
    my $pat = quotemeta $_[1];
    my $n = () = ($_[0] =~ m/$pat/g);
    return $n;
}
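For reference, here is a sketch of the question's script with the quotemeta version of findSymbols() dropped in. Unpacking @_ into the named variables $string and $symbol is my own cosmetic change, not part of the fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# findSymbols with the quoting fix: the second argument is escaped
# with quotemeta before it is interpolated into the regex.
sub findSymbols {
    my ($string, $symbol) = @_;
    my $pat = quotemeta $symbol;    # "||" becomes \|\| (two literal pipes)
    my $n = () = ($string =~ m/$pat/g);
    return $n;
}

my $line = "this && is || a test if && ||";

my $nb_if  = findSymbols($line, "if ");
my $nb_and = findSymbols($line, "&&");
my $nb_or  = findSymbols($line, "||");

print "The result for this func is $nb_if=if , $nb_and=and, $nb_or=or\n";
# prints: The result for this func is 1=if , 2=and, 2=or
```

quotemeta backslash-escapes every ASCII character that is not a letter, digit or underscore, so "&&" and "if " are also escaped; that is harmless, because an escaped & or space still matches itself.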

Answer:

The pipe character (|) has a special meaning in regular expressions: it means "or" (match either the thing on its left or the thing on its right). So a regex that consists of just two pipes is interpreted as "match an empty string, or an empty string, or an empty string", and an empty string matches at every position in your string: before each of its 29 characters and once more at the end, which gives 30 matches.
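To see where the 30 comes from, here is a small sketch that runs the question's unescaped pattern in a while loop with //g and records pos() after each match. The string and the pattern are the ones from the question; the variable names are my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line    = "this && is || a test if && ||";    # 29 characters
my $pattern = "||";                                # interpolated unescaped, as in the question

# Each match is zero-length, so pos() is simply the position where the
# empty string was matched; Perl then moves on to the next position.
my @positions;
while ($line =~ /$pattern/g) {
    push @positions, pos($line);
}

print scalar(@positions), " matches at positions @positions\n";
# prints 30 matches, at positions 0 through 29
```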

So you need to stop the pipe being interpreted as a special character and let it just represent an actual pipe character. Here are three ways to do that:

  1. Escape the pipes with backslashes when you're creating the string that you pass to findSymbols().

    # Note: I've also changed "..." to '...'
    # to avoid having to double-escape
    my $nb_or = findSymbols($line, '\|\|');
    
  2. Use quotemeta() to automatically escape problematic characters in any string passed to findSymbols().

    # The pattern is the second argument, so escape $_[1], not $_[0]
    my $escaped_regex = quotemeta($_[1]);
    my $n = () = ($_[0] =~ m/$escaped_regex/g);
    
  3. Use \Q...\E to automatically escape any problematic characters used in your regex.

    # Note: In this case, the \E isn't actually needed
    # as it's at the end of the regex.
    my $n = () = ($_[0] =~ m/\Q$_[1]\E/g);
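The three options can be checked side by side. This sketch applies each technique directly to the question's $line instead of going through findSymbols(), so that all three results are visible at once; $pat1, $pat2 and $symbol are names chosen just for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "this && is || a test if && ||";

# 1. The caller escapes the pipes. Single quotes keep the backslashes;
#    '\|\|' is the same string as "\\|\\|".
my $pat1 = '\|\|';
my $n1 = () = ($line =~ m/$pat1/g);

# 2. quotemeta builds the escaped pattern from the plain symbol.
my $pat2 = quotemeta('||');
my $n2 = () = ($line =~ m/$pat2/g);

# 3. \Q...\E escapes the interpolated string inside the regex itself.
my $symbol = '||';
my $n3 = () = ($line =~ m/\Q$symbol\E/g);

print "$n1 $n2 $n3\n";    # prints 2 2 2
```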
    

For more detailed information on using regular expressions in Perl, see perlretut and perlre.

Answer:

In the regular expression used by m//, | is the alternation operator. To match a literal |, you need to escape each one with a backslash.

my $nb_or = findSymbols($line, "\\|\\|");  # or '\|\|'

(but using quotemeta as suggested by @toolic is a much better idea, as it frees your caller from having to worry about details that should be part of the abstraction provided by findSymbols.)
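The same problem applies to any other regex metacharacter a caller might pass, not just |. In this sketch, findSymbols_raw is a hypothetical name for the question's original, unquoted version, kept only for comparison; the quoted findSymbols counts "." and "(" literally, while the raw version treats "." as "any character" and dies on an unbalanced "(".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical name: the question's original version, symbol used as a regex.
sub findSymbols_raw {
    my $n = () = ($_[0] =~ m/$_[1]/g);
    return $n;
}

# Quoted version: the symbol is always matched literally.
sub findSymbols {
    my $n = () = ($_[0] =~ m/\Q$_[1]\E/g);
    return $n;
}

my $text = "f(x) = g(x) + 1.5";    # 17 characters

print findSymbols_raw($text, "."), "\n";    # 17, because "." matches any character
print findSymbols($text, "."), "\n";        # 1

# An unbalanced "(" is not a valid regex, so the raw version dies at run time.
my $raw_ok = eval { findSymbols_raw($text, "("); 1 };
print $raw_ok ? "raw version ok\n" : "raw version died: $@";
print findSymbols($text, "("), "\n";        # 2
```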
