With single-byte chars optional matching works:
~% perl -e 'print ("l" =~ /l?/u)'
1%
~% perl -e 'print ("l" =~ /l?l?/u)'
1%
With Unicode (multi-byte) characters, optional matching does not work:
~% perl -e 'print ("д" =~ /д?/)'
1%
~% perl -e 'print ("д" =~ /д?д?/u)'
~%
How can I make it work? I've already added /u and I've tried use feature 'unicode_strings', to no avail. I assume perl sees д as multiple bytes and only applies ? to the last one.
CodePudding user response:
You have to tell Perl that the source is in UTF-8:
perl -Mutf8 -e 'print "д" =~ /д?д?/u'
See the documentation for the utf8 pragma (perldoc utf8) for details.
CodePudding user response:
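By default, Perl expects the source code it is given to be ASCII. (String literals are 8-bit clean, meaning non-ASCII bytes are included as-is.) Without the utf8 pragma, д in the source is therefore the two bytes D0 B4, and ? applies only to the last byte, so /д?д?/ requires two D0 bytes where the string contains only one. Adding use utf8; tells Perl to expect UTF-8 instead, so д becomes a single character.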
$ perl -le'print "д" =~ /д?д?/u'
$ perl -le'print "\xD0\xB4" =~ /\xD0\xB4?\xD0\xB4?/u' # Same as previous
$ perl -le'use utf8; print "д" =~ /д?д?/u'
1
$ perl -le'use utf8; print "\x{434}" =~ /\x{434}?\x{434}?/u' # Same as previous
1
$ perl -Mutf8 -le'print "д" =~ /д?д?/u' # -Mutf8 == use utf8;
1