Regex: Match all newlines except first ones

Time:12-29

Let's say I have this block of text:

* hello

  
world



hello again

Using a regex in JavaScript, how do I replace every newline between paragraphs except the first one with \, in a way that works regardless of the platform's line endings?

The expected result is:

* hello

\
world

\
\
hello again

Answer:

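You could optionally capture a non-newline character (what . matches by default) plus an optional newline in front of each newline, and use a replacement callback to check whether the first group was set. If it was, the newline belongs to a paragraph's own line break or the first blank line after it, so the full match is returned unchanged; otherwise the lone newline is replaced with a backslash followed by a newline.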

const s = `* hello


world



hello again`;

const res = s.replace(/(.\n?)?\n/g, (m0, m1) => m1 ? m0 : '\\\n');

console.log(res);
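To see why the callback test works, it can help to list what the pattern actually matches on this answer's example text. A minimal sketch, assuming String.prototype.matchAll is available (modern browsers, Node.js 12+): a match that starts at a text character carries that character in group 1, while every further newline matches on its own with group 1 undefined.

```javascript
const s = '* hello\n\n\nworld\n\n\n\nhello again';

// Collect each match of the pattern together with its first capture group.
// Group 1 is set only when the match starts at a text character, so the
// paragraph's own line break and the first blank line after it are kept.
const matches = [...s.matchAll(/(.\n?)?\n/g)].map(m => [m[0], m[1]]);
console.log(matches);

// Same replacement as in the answer: keep matches that have group 1,
// otherwise turn the lone newline into a backslash followed by a newline.
const res = s.replace(/(.\n?)?\n/g, (m0, m1) => (m1 ? m0 : '\\\n'));
console.log(JSON.stringify(res));
```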

  • To avoid matching a newline at the very start of the string, add a negative lookahead: (.\n?)?(?!^)\n
  • To also cover CRLF line breaks and lines that contain only horizontal whitespace:
    s.replace(/(\S\r?\n?)?[\t ]*(?!^)(\r?\n)/g, (m0, m1, m2) => m1 ? m0 : '\\' + m2);
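The CRLF-aware variant can be checked against the question's text (including its whitespace-only line) with both LF and CRLF line endings. A minimal sketch; the helper name replaceExtraNewlines is hypothetical and simply wraps the variant above:

```javascript
// Hypothetical helper wrapping the CRLF-aware variant.
// Group 1: a non-space character, optionally followed by its own line break;
// when it is present the match is kept. Group 2: the line break to preserve.
function replaceExtraNewlines(s) {
  return s.replace(/(\S\r?\n?)?[\t ]*(?!^)(\r?\n)/g,
    (m0, m1, m2) => (m1 ? m0 : '\\' + m2));
}

const lf = '* hello\n\n  \nworld\n\n\n\nhello again';
const crlf = lf.replace(/\n/g, '\r\n');

console.log(JSON.stringify(replaceExtraNewlines(lf)));
console.log(JSON.stringify(replaceExtraNewlines(crlf)));

// The (?!^) lookahead stops a newline at the very start of the string
// from being turned into a backslash.
console.log(JSON.stringify(replaceExtraNewlines('\nhello')));
```

Note that the whitespace-only line is consumed by [\t ]* and replaced together with its line break.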

Answer:

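Try using .split() to break the text at every newline that directly follows another newline, so that each extra blank line becomes an empty string; then use .map() to turn those empty strings into \ characters, and finally join the pieces back together with newlines: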

let s = `* hello


world



hello again`;

s = s.split(/.*?(?<=\n)\n/g).map(x => x === '' ? '\\' : x).join('\n');
console.log(s)
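Because . never matches a newline and the lookbehind requires the character just before the match position to be \n, the lazy .*? in the split pattern can only match the empty string; the pattern therefore splits on every \n that directly follows another \n. A minimal sketch checking that equivalence on the answer's input, and showing that a whitespace-only line like the one in the question is left unchanged by this approach (the sample strings are assumptions for illustration):

```javascript
const s = '* hello\n\n\nworld\n\n\n\nhello again';

// The answer's split pattern and a simplified equivalent.
const original = s.split(/.*?(?<=\n)\n/g);
const simplified = s.split(/(?<=\n)\n/g);
console.log(original, simplified);

// Empty pieces correspond to extra blank lines and become backslashes.
const res = original.map(x => (x === '' ? '\\' : x)).join('\n');
console.log(JSON.stringify(res));

// A line containing only spaces is not an empty piece, so it is not converted.
const withSpaces = '* hello\n\n  \nworld';
const res2 = withSpaces.split(/.*?(?<=\n)\n/g).map(x => (x === '' ? '\\' : x)).join('\n');
console.log(JSON.stringify(res2));
```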

I benchmarked this answer against bobble bubble's answer in JSBench and found mine to be slightly faster.
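The JSBench numbers are not reproduced here. For a rough local comparison, a timing sketch like the following could be used; it assumes a runtime with a global performance object (modern browsers, Node.js 16+), the iteration count is arbitrary, and single-run micro-benchmarks are noisy, so the numbers are only indicative.

```javascript
const s = '* hello\n\n\nworld\n\n\n\nhello again';

// The two approaches being compared, wrapped as plain functions.
const viaCallback = t => t.replace(/(.\n?)?\n/g, (m0, m1) => (m1 ? m0 : '\\\n'));
const viaSplit = t => t.split(/.*?(?<=\n)\n/g).map(x => (x === '' ? '\\' : x)).join('\n');

// Run fn on s `iterations` times and return the elapsed time in milliseconds.
// Result lengths are summed into `sink` so the work is actually used.
let sink = 0;
function time(fn, iterations = 100000) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) sink += fn(s).length;
  return performance.now() - start;
}

console.log('callback:', time(viaCallback).toFixed(1), 'ms');
console.log('split:   ', time(viaSplit).toFixed(1), 'ms');
```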

Answer:

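You could use a lookbehind assertion to account for the preceding empty line, which should remain untouched. This needs no callback and does not hard-code a particular newline character; it relies only on the ^ and $ anchors, which match at line boundaries when the m flag is set. JavaScript treats \r and \n each as a line terminator for these anchors, though, so text with \r\n line endings is seen as having an extra empty "line" inside every line break and may need to be normalized to \n first. For \n-terminated text: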

const s = `* hello

  
world



hello again
`;

const res = s.replace(/(?<=^ *$\s ?^) *$/gm, "\\");
console.log(res);
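One way to apply the lookbehind pattern to text with \r\n line endings is to normalize them to \n first and restore them afterwards. A minimal sketch, assuming the text uses one line-ending style consistently; the function name markExtraBlankLines is hypothetical and the regex is copied from the answer:

```javascript
// Hypothetical wrapper around the lookbehind regex from the answer.
function markExtraBlankLines(text) {
  const usesCrlf = text.includes('\r\n');
  // Collapse \r\n to \n so each line break is a single line terminator.
  const lf = usesCrlf ? text.replace(/\r\n/g, '\n') : text;
  // Replace every spaces-only line whose previous line is also spaces-only.
  const out = lf.replace(/(?<=^ *$\s ?^) *$/gm, '\\');
  // Restore the original line endings.
  return usesCrlf ? out.replace(/\n/g, '\r\n') : out;
}

const lfText = '* hello\n\n  \nworld\n\n\n\nhello again\n';
const crlfText = lfText.replace(/\n/g, '\r\n');
console.log(JSON.stringify(markExtraBlankLines(lfText)));
console.log(JSON.stringify(markExtraBlankLines(crlfText)));
```

Applying the regex directly to crlfText, without normalizing, inserts backslashes inside the \r\n pairs instead.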

See execution time comparison on JSBench:
