How does Perl regexp anchor $ actually handle a trailing newline?

Time:09-17

I recently discovered some unexpected behaviour of the end-of-string anchor $ in Perl regular expressions (Perl 5.26.1 x86_64 on openSUSE 15.2).

Supposedly, the $ refers to the end of the string, not the end of a line as it does in grep(1). Hence an explicit \n at the end of a string should have to be matched explicitly. However, the following (complete) program:

my @strings = ( 
  "hello world",
  "hello world\n",
  "hello world\t"
);
my $i = 0;
foreach (@strings) {
  $i++;
  print "$i: >>$_<<\n" if /d$/;
}

produces this output:

1: >>hello world<<
2: >>hello world
<<

i.e., the /d$/ matches not only the first of the three strings but also the second with its trailing newline. On the other hand, as expected, the regexp /d\n$/ matches the second string only, and /d\s$/ matches the second and third.

What's going on here?

CodePudding user response:

As stated already, the $ metacharacter does match at the end of the string, but it also matches just before a newline that is the last character of the string. With the /m modifier it additionally matches before every internal newline in a multi-line string.

There are also ways to fine-tune exactly what is matched, using these assertions:

  • \z matches only at the very end of the string, even with the /m modifier, and never before a newline at the end

  • \Z matches only at the end of the string, even with the /m modifier, and also before a newline at the end of the string. So it behaves like $, except that it never matches before newlines internal to a multi-line string, not even with /m

These "zero-width" assertions match a position, not characters.
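The differences between these anchors can be checked with a minimal sketch like the one below. The helper name anchors_matching and the third test string are illustrative assumptions, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: list which end anchors match right after a "d".
sub anchors_matching {
    my ($s) = @_;
    my @hits;
    push @hits, '$'    if $s =~ /d$/;     # end, or before a final newline
    push @hits, '\Z'   if $s =~ /d\Z/;    # same as $ without /m
    push @hits, '\z'   if $s =~ /d\z/;    # very end of the string only
    push @hits, '$/m'  if $s =~ /d$/m;    # also before internal newlines
    push @hits, '\Z/m' if $s =~ /d\Z/m;   # /m does not change \Z
    return join ' ', @hits;
}

for my $s ("hello world", "hello world\n", "hello world\nbye") {
    (my $shown = $s) =~ s/\n/\\n/g;       # make newlines visible in the output
    printf "%-22s %s\n", qq("$shown"), anchors_matching($s);
}
```

Only the first string should list \z, and only $ with /m should match the third string, where the newline is internal rather than final.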

CodePudding user response:

perlre states for the $ metacharacter:

Match the end of the string
(or before newline at the end of the string;

This means that a d followed immediately by a \n (newline) at the end of the string still matches /d$/.
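A small sketch shows that the newline is tolerated but not consumed by $. The helper match_end is a hypothetical name introduced here; it relies on Perl's built-in @+ array, whose first element holds the offset at which the last successful match ended.

```perl
use strict;
use warnings;

# Hypothetical helper: offset where the match ends, or -1 if there is no match.
sub match_end {
    my ($s, $re) = @_;
    return $s =~ $re ? $+[0] : -1;
}

my $s = "hello world\n";                # 12 characters, the last one is "\n"
print match_end($s, qr/d$/),   "\n";    # 11: the match ends before the newline
print match_end($s, qr/d\n$/), "\n";    # 12: the newline is matched explicitly
print match_end($s, qr/d\z/),  "\n";    # -1: "d" is not the last character
```

If the string must truly end in d, either chomp the input first or anchor with \z instead of $.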
