What is the expected behaviour of str.replace(/^/gm, "\t")

Time:05-05

I would expect str.replace(/^/gm, "\t") to insert a \t at the start of every line. That is what happens with LF line endings, but with CRLF endings something odd happens: it also inserts a \t between the \r and the \n.

Consider this example code:

var str1 = "First line\nnext Line";
var str2 = "First line\r\nnext Line";

function escape(str) {
  return str.replace(/[\r\n\t]/g, match => {
    return {
      '\r': '\\r',
      '\n': '\\n',
      '\t': '\\t',
    }[match]
  })
}

console.log("First string:")
console.log(escape(str1))


console.log("Second string:")
console.log(escape(str2))

function operation(str) {
  return str.replace(/^/gm, "\t");
}

console.log("First string:")
str1 = operation(str1)
console.log(escape(str1))


console.log("Second string:")
str2 = operation(str2)
console.log(escape(str2))

After the transformation, the first string is \tFirst line\n\tnext Line, as expected. The second one, however, becomes \tFirst line\r\t\n\tnext Line, which caused unexpected behaviour in a VS Code extension I was using: VS Code interpreted the \r and the \n separately and replaced each with \n\r, which resulted in undesired formatting. The console doesn't seem to care and displays the string as expected, hence the escape function, which makes those carriage returns and line feeds visible.

Is this expected behaviour, or a bug in the JavaScript standard library?

CodePudding user response:

This is exactly what the ECMAScript specification prescribes, in 22.2.2.4 Runtime Semantics: CompileAssertion:

Assertion :: ^

  1. Return a new Matcher with parameters (x, c) that captures nothing and performs
     the following steps when called:
      a. Assert: x is a State.
      b. Assert: c is a Continuation.
      c. Let e be x's endIndex.
      d. If e = 0, or if Multiline is true and the character Input[e - 1] is one of
         LineTerminator, then
          i. Return c(x).
      e. Return failure.

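Elsewhere, in Section 12.3, LineTerminator is defined as one of <LF> <CR> <LS> <PS>, so \r and \n each count as a terminator on their own. With the m flag, ^ therefore matches twice around a CRLF: once between the \r and the \n, and once after the \n.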
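
To see where ^ actually matches under the m flag, here is a small sketch that lists the match positions for the LF and CRLF strings from the question. The helper name lineStartPositions is made up for this illustration; it relies only on the standard String.prototype.matchAll.

```javascript
// Hypothetical helper: returns every index at which /^/gm matches.
function lineStartPositions(str) {
  return [...str.matchAll(/^/gm)].map(match => match.index);
}

const lfPositions = lineStartPositions("First line\nnext Line");
const crlfPositions = lineStartPositions("First line\r\nnext Line");

console.log(lfPositions);   // [ 0, 11 ]
console.log(crlfPositions); // [ 0, 11, 12 ]
```

In the CRLF case, index 11 is the position between the \r and the \n, which is where the extra \t ends up.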

Probably the easiest fix is to drop the m flag and match all types of line endings yourself (including a lone \r, which was used on classic Mac OS):

>>> "foo\rbar\r\nbaz\nqux".replace(/(^|\r\n|\r|\n)/g, "$1\t")
"\tfoo\r\tbar\r\n\tbaz\n\tqux"

Note that \r\n must come before \r so that it gets matched as one unit.
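
One case the single regex above does not seem to cover is a string that itself begins with a line ending: the empty ^ match at index 0 makes the search resume at index 1, so that first terminator is skipped. As an alternative, here is a minimal sketch that splits on the line endings (keeping them via a capturing group) and prefixes each line. The function name indentLines and its indent parameter are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical helper: prefixes every line with `indent`,
// treating \r\n as a single line ending.
function indentLines(str, indent = "\t") {
  // The capturing group keeps the separators in the split result:
  // even indices hold line contents, odd indices hold line endings.
  return str
    .split(/(\r\n|\r|\n)/)
    .map((part, i) => (i % 2 === 0 ? indent + part : part))
    .join("");
}

console.log(JSON.stringify(indentLines("foo\rbar\r\nbaz\nqux")));
// "\tfoo\r\tbar\r\n\tbaz\n\tqux"
console.log(JSON.stringify(indentLines("\r\nfoo")));
// "\t\r\n\tfoo"
```

Like /^/gm, this sketch also indents empty lines, including the empty "line" after a trailing line ending, and it does not treat <LS> or <PS> as line endings.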
