Perl unite 2 regexps into 1

Time:09-28

A valid string should consist either entirely of Cyrillic letters or entirely of Latin letters.

I created a working solution with 2 regexps. But when I try to unite them into 1, it fails:

#!/usr/bin/perl

use strict;
use warnings;
use utf8;
use v5.14;
use open ':std', ':encoding(UTF-8)';

my @data = (
    # rus - ok
    "абвгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
    # space
    "а бвгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
    # rus - latin
    "аtбвгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
    # digit
    "аб2вгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
    # latin - ok
    "abcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ",
    # space
    "a bcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ",
    # underscore
    "a_bcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ",
    # digit
    "a2bcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ"
);

foreach(@data) {
    if ($_ =~ /^[а-яё]+$/i or $_ =~ /^[a-z]+$/i) {
        print "ok\n";
    }
    else {
        print "bad\n";
    }
}

print "-------\n";
foreach(@data) {
    if ($_ =~ /^(:?[а-яё]+)|(:?[a-z]+)$/i) {
        print "ok\n";
    }
    else {
        print "bad\n";
    }
}

Output:

ok
bad
bad
bad
ok
bad
bad
bad
-------
ok
ok
ok
ok
ok
ok
ok
ok

For some reason the second regexp always succeeds.

CodePudding user response:

In your regex,

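  • :? - matches an optional colon, whereas you wanted a non-capturing group, which is written (?:...)
  • ^(?:a+)|(?:b+)$ - matches either one or more a's at the start of the string OR one or more b's at the end of the string. The | operator has the lowest precedence, so ^ applies only to the first alternative and $ only to the second.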
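
To see the two problems in action, here is a minimal sketch that keeps the pattern from the question unchanged and reports which alternative matched. The helper name which_branch and the sample strings are made up for illustration.

```perl
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

# The pattern from the question, with both problems left in place:
# ":?" is an optional colon, and "|" splits the pattern into
# "^(...)" and "(...)$".
my $broken = qr/^(:?[а-яё]+)|(:?[a-z]+)$/i;

# Hypothetical helper: reports which alternative matched.
sub which_branch {
    my ($s) = @_;
    return 'none' unless $s =~ $broken;
    # Group 1 belongs to the left alternative, group 2 to the right one.
    return defined $1 ? 'left' : 'right';
}

for my $s ("аб2вг", "a_bc", "12ab", ":абв", "123") {
    printf "%-6s %s\n", $s, which_branch($s);
}
```

Strings that merely start with a Cyrillic letter take the left branch, strings that merely end with a Latin letter take the right one, and only a string such as "123" fails. That is why every line of the test data printed ok.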

You should use

/^(?:[а-яё]+|[a-z]+)$/i

Details:

  • ^ - start of string
  • (?: - start of a non-capturing group
    • [а-яё]+ - one or more Russian letters
    • | - or
    • [a-z]+ - one or more ASCII letters
  • ) - end of the non-capturing group
  • $ - end of string.
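
As a quick check, the corrected pattern can be wrapped in a small function. This is only a sketch: the name is_single_script and the sample strings are assumptions, not part of the original question.

```perl
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

# Hypothetical helper: returns 1 if the string is made up entirely of
# Russian letters or entirely of ASCII letters, 0 otherwise.
sub is_single_script {
    my ($s) = @_;
    # Both alternatives sit inside one group, so ^ and $ apply to each.
    return $s =~ /^(?:[а-яё]+|[a-z]+)$/i ? 1 : 0;
}

my @samples = ("привет", "Ёлка", "hello", "приvet", "при вет", "abc1");
for my $s (@samples) {
    printf "%s: %s\n", $s, is_single_script($s) ? "ok" : "bad";
}
```

Note that $ also matches just before a trailing newline. If a string such as "abc\n" must be rejected, use \z instead of $.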

NOTE: Starting with Perl 5.22, you can use the n modifier, which makes plain (...) groups non-capturing: /^([а-яё]+|[a-z]+)$/ni. That removes the risk of mixing up ?: and :?.

In that case, require the minimum version with use v5.22; so the script fails early on an older Perl.
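
A minimal sketch of the n modifier, assuming Perl 5.22 or newer. The helper name capture_after_match is made up for illustration.

```perl
use v5.22;    # also enables strict and say
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

# Hypothetical helper: shows that with /n the parentheses only group
# and do not set $1.
sub capture_after_match {
    my ($s) = @_;
    return 'no match' unless $s =~ /^([а-яё]+|[a-z]+)$/ni;
    return defined $1 ? 'captured' : 'nothing captured';
}

say capture_after_match("hello");
say capture_after_match("привет");
say capture_after_match("приvet");
```

The first two calls match without setting $1, and the mixed string does not match at all.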
