I recently discovered some unexpected behaviour for the end-of-string anchor $ in a Perl regular expression (Perl 5.26.1 x86_64 on OpenSuse 15.2).
Supposedly, $ refers to the end of the string, not the end of a line as it does in grep(1). Hence a \n at the end of a string should have to be matched explicitly. However, the following (complete) program:
    my @strings = (
        "hello world",
        "hello world\n",
        "hello world\t"
    );
    my $i = 0;
    foreach (@strings) {
        $i++;
        print "$i: >>$_<<\n" if /d$/;
    }
produces this output:
1: >>hello world<<
2: >>hello world
<<
That is, /d$/ matches not only the first of the three strings but also the second with its trailing newline. On the other hand, as expected, the regexp /d\n$/ matches the second string only, and /d\s$/ matches the second and third.
What's going on here?
CodePudding user response:
As stated already, the $ metacharacter does match at the end of the string, but it allows for a trailing newline, so it also matches just before a newline that is the last character of the string. Note that with the /m (multi-line) modifier it also matches before internal newlines in a multi-line string.
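A minimal sketch of the difference /m makes, assuming an illustrative two-line string (the variable names are made up for this example):

```perl
use strict;
use warnings;

# Illustrative two-line string with a trailing newline.
my $text = "foo\nbar\n";

# Without /m, $ matches only at the very end or before a final newline.
my $foo_plain = $text =~ /foo$/ ? 1 : 0;    # "foo" sits before an internal newline
my $bar_plain = $text =~ /bar$/ ? 1 : 0;    # "bar" sits before the final newline

# With /m, $ also matches before every internal newline.
my $foo_multi = $text =~ /foo$/m ? 1 : 0;

printf "%-10s %d\n", '/foo$/',  $foo_plain;
printf "%-10s %d\n", '/bar$/',  $bar_plain;
printf "%-10s %d\n", '/foo$/m', $foo_multi;
```

Only the first pattern should fail to match here, since "foo" is followed by a newline that is not the last character of the string.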
There are also ways to fine-tune exactly what is matched, using these assertions:
\z matches only at the end of the string, even with the /m flag, and not before a newline at the end.
\Z matches only at the end of the string, even with the /m flag, and also before a newline at the end of the string. So it is like $, except that it never matches before newlines internal to a multi-line string, not even with /m.
These "zero-width" assertions match a position, not characters.
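The following sketch compares the three assertions on the string from the question and uses $+[0] (the end offset of the last successful match) to show that nothing is consumed. The second string and all variable names are illustrative assumptions:

```perl
use strict;
use warnings;

my $s = "hello world\n";    # 12 characters, the last one is the newline

# $ and \Z accept the position just before the final newline; \z does not.
my $dollar  = $s =~ /d$/  ? 1 : 0;
my $big_z   = $s =~ /d\Z/ ? 1 : 0;
my $small_z = $s =~ /d\z/ ? 1 : 0;

# The assertion is zero-width: the match ends right after "d",
# so the trailing newline is not part of what was matched.
$s =~ /d$/;
my $match_end = $+[0];

# With /m, $ matches before an internal newline, but \Z still does not.
my $ml       = "hello\nworld\n";
my $dollar_m = $ml =~ /hello$/m  ? 1 : 0;
my $big_z_m  = $ml =~ /hello\Z/m ? 1 : 0;

printf "%-12s %d\n", '/d$/',       $dollar;
printf "%-12s %d\n", '/d\Z/',      $big_z;
printf "%-12s %d\n", '/d\z/',      $small_z;
printf "%-12s %d\n", '/hello$/m',  $dollar_m;
printf "%-12s %d\n", '/hello\Z/m', $big_z_m;
printf "match ends at offset %d of %d\n", $match_end, length $s;
```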
CodePudding user response:
perlre states for the $ metacharacter:
Match the end of the string
(or before newline at the end of the string;
This means that a d followed immediately by a \n (newline) at the end of the string will match the regex /d$/.
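If the intent is to accept only strings whose very last character is d, one option is to use \z in place of $. A small sketch, reusing the three strings from the question (the array names @strict and @lenient are made up here):

```perl
use strict;
use warnings;

my @strings = ("hello world", "hello world\n", "hello world\t");

# 1-based indices, as in the question, of strings that end exactly in "d".
my @strict = grep { $strings[$_ - 1] =~ /d\z/ } 1 .. scalar @strings;

# The same test with $, which tolerates one trailing newline.
my @lenient = grep { $strings[$_ - 1] =~ /d$/ } 1 .. scalar @strings;

print "\\z matches: @strict\n";
print "\$ matches:  @lenient\n";
```

With \z only the first string should be reported, while $ reports the first and second, as in the question's output.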