A valid string should consist of either Cyrillic characters only or Latin characters only.
I have a working solution that uses two regexps, but when I try to combine them into one, it fails:
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use v5.14;
use open ':std', ':encoding(UTF-8)';
my @data = (
# rus - ok
"абвгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
# space
"а бвгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
# rus - latin
"аtбвгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
# digit
"аб2вгдеёжзийклмнопрстуфхцчшщьыъэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЬЫЪЭЮЯ",
# latin - ok
"abcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ",
# space
"a bcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ",
# underscore
"a_bcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ",
# digit
"a2bcdefghejklmnopqrstuvwxyzABCDEFGHEJKLMNOPQRSTUVWXYZ"
);
foreach (@data) {
    if ($_ =~ /^[а-яё]+$/i or $_ =~ /^[a-z]+$/i) {
        print "ok\n";
    }
    else {
        print "bad\n";
    }
}
print "-------\n";
foreach (@data) {
    if ($_ =~ /^(:?[а-яё]+)|(:?[a-z]+)$/i) {
        print "ok\n";
    }
    else {
        print "bad\n";
    }
}
Output:
ok
bad
bad
bad
ok
bad
bad
bad
-------
ok
ok
ok
ok
ok
ok
ok
ok
For some reason the second regexp always succeeds.
CodePudding user response:
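In your regex, :? matches an optional colon, while you wanted to define a non-capturing group, (?:...).
In addition, a pattern of the form ^(?:a+)|(?:b+)$ matches either a's at the start of the string OR b's at the end of the string, because ^ belongs only to the first alternative and $ only to the second.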
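To make the point above concrete, here is a small sketch that runs the pattern from the question against a few short inputs. The sample strings and the helper name broken_match are made up for illustration; only the regex itself comes from the question.

```perl
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

# The pattern from the question, unchanged: ^ applies only to the first
# alternative and $ only to the second.
my $broken = qr/^(:?[а-яё]+)|(:?[a-z]+)$/i;

# broken_match is a hypothetical helper name used only in this sketch.
sub broken_match {
    my ($s) = @_;
    return $s =~ $broken ? 1 : 0;
}

# Hypothetical inputs: the first two are invalid strings that still match,
# because they start with a Cyrillic letter or end with a Latin letter.
for my $s ("аб2вг", "a_bc", "2аб", "ab2") {
    printf "%s: %s\n", $s, (broken_match($s) ? "ok" : "bad");
}
```

The first two inputs print ok even though they contain a digit or an underscore, which is the same effect seen in the question's output.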
You should use
/^(?:[а-яё]+|[a-z]+)$/i
Details:
^ - start of string
(?: - start of a non-capturing group
[а-яё]+ - one or more Russian letters
| - or
[a-z]+ - one or more ASCII letters
) - end of the non-capturing group
$ - end of string.
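A minimal sketch of the corrected pattern wrapped in a function, assuming the same utf8 setup as the question's script. The name is_single_script and the short test strings are hypothetical, chosen to mirror the cases in the question's data.

```perl
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

# is_single_script is a hypothetical helper name, not from the original post.
sub is_single_script {
    my ($s) = @_;
    # Both anchors sit outside the group, so they apply to each alternative.
    return $s =~ /^(?:[а-яё]+|[a-z]+)$/i ? 1 : 0;
}

# Hypothetical inputs: all-Cyrillic, all-Latin, mixed, with a space, with a digit.
for my $s ("абв", "abc", "аtб", "a b", "a2b") {
    printf "%s: %s\n", $s, (is_single_script($s) ? "ok" : "bad");
}
```

Only the first two inputs should be reported as ok, which matches what the two separate regexps in the question produce.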
NOTE: Starting from Perl 5.22, you may use the n modifier to make capturing groups behave as non-capturing: /^([а-яё]+|[a-z]+)$/ni. That way there is no risk of mixing up ?: and :?.
Require the core version with use v5.22.0; in this case.
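A short sketch of the n-modifier variant, assuming Perl 5.22 or newer is available. The helper name is_single_script_n is hypothetical; the regex is the one given in the note.

```perl
use v5.22.0;    # the n modifier is not available before Perl 5.22
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

# is_single_script_n is a hypothetical helper name used only in this sketch.
sub is_single_script_n {
    my ($s) = @_;
    # With /n the plain parentheses group without capturing.
    return $s =~ /^([а-яё]+|[a-z]+)$/ni ? 1 : 0;
}

# After a successful match under /n, $1 is not set.
if ("абв" =~ /^([а-яё]+|[a-z]+)$/ni) {
    print defined $1 ? "captured\n" : "no capture\n";
}
```

The trade-off is portability: the script no longer runs on older Perl installations, whereas the explicit (?:...) form works everywhere.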