Regex that captures only some text between two words

Time:01-03

I'm looking for a single regex, written for Node.js, that captures only the text on lines that start with PASS! or FAIL! and appear between two specific words. Example:

INFO! this line shouldn't be captured because it's before section121
[section120] section title1
Some noise
PASS! this line shouldn't captured either because it's before section121
[section121] section title2
more noise
FAIL! match1
a warning we wish to skip
more warnings
PASS! match2
FAIL! match3
[section122] section title3
noise
PASS! this shouldn't be captured because it appears after section122

The expected captures for this input are:

match1
match2
match3

Can this be achieved using a single regex? If not, an explanation why would also be accepted as an answer.

I tried writing several different regexes, but always ended up capturing only the last line (match3):

section121\][\s\S]*(?:PASS!|FAIL!)([\s\S]*)\[section122

CodePudding user response:

With JavaScript's support for lookbehind assertions, you can use:

(?<=^\[section121].*(?:\n(?!\[section\d+]).*)*\n(?:PASS|FAIL)!).*

Explanation

  • (?<= Positive lookbehind
    • ^ Start of a line (the pattern is used with the m flag)
    • \[section121].* Match [section121] and the rest of the line
    • (?:\n(?!\[section\d+]).*)* Match a newline, and repeat matching all lines that do not start with [section, 1+ digits and ]
    • \n(?:PASS|FAIL)! Match a newline and either PASS! or FAIL!
  • ) Close the lookbehind
  • .* Match the rest of the line (optionally match any character except newlines)


const regex = /(?<=^\[section121].*(?:\n(?!\[section\d+]).*)*\n(?:PASS|FAIL)!).*/gm;

const s = `INFO! this line shouldn't be captured because it's before section121
[section120] section title1
Some noise
PASS! this line shouldn't captured either because it's before section121
[section121] section title2
more noise
FAIL! match1
a warning we wish to skip
more warnings
PASS! match2
FAIL! match3
[section122] section title3
noise
PASS! this shouldn't be captured because it appears after section122`;

console.log(s.match(regex));
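
Note that each match keeps the space that follows PASS! or FAIL!, so the results are " match1" rather than "match1". A minimal sketch of one way to get the bare text, assuming a shortened sample input and that trimming whitespace on both sides is acceptable (the names sample and captures are only for illustration):

```javascript
// Same lookbehind pattern as above; \d+ matches the digits of any section number.
const regex = /(?<=^\[section121].*(?:\n(?!\[section\d+]).*)*\n(?:PASS|FAIL)!).*/gm;

// Shortened, hypothetical sample in the same shape as the input above.
const sample = [
  "[section120] title1",
  "PASS! before",
  "[section121] title2",
  "noise",
  "FAIL! match1",
  "a warning",
  "PASS! match2",
  "[section122] title3",
  "PASS! after",
].join("\n");

// match() returns null when nothing matches, so fall back to an empty array.
const captures = (sample.match(regex) || []).map(m => m.trim());

console.log(captures); // [ 'match1', 'match2' ]
```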

An alternative in 2 steps, for environments without lookbehind support:

const regex = /\[section121].*(?:\n(?!\[section\d+]|(?:PASS|FAIL)!).*)*\n(?:PASS|FAIL)!.*(?:\n(?!\[section\d+]).*)*/;
const s = `INFO! this line shouldn't be captured because it's before section121
[section120] section title1
Some noise
PASS! this line shouldn't captured either because it's before section121
[section121] section title2
more noise
FAIL! match1
a warning we wish to skip
more warnings
PASS! match2
FAIL! match3
[section122] section title3
noise
PASS! this shouldn't be captured because it appears after section122`;

const res = s.match(regex);
if (res) {
  console.log(Array.from(res[0].matchAll(/^(?:PASS|FAIL)!(.*)/mg), m => m[1]))
}
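
If the section number varies, the two steps could be wrapped in a small function. This is only a sketch: resultsInSection is a hypothetical name, and step 1 is simplified compared to the regex above (it takes the whole block up to the next [sectionN] header, even if that block has no PASS!/FAIL! line). It assumes section headers always look like [section followed by digits and ].

```javascript
// Hypothetical helper, not part of the answer above.
function resultsInSection(text, sectionNumber) {
  // Step 1: match the requested header line plus every following line
  // that does not start a new [sectionN] header.
  const block = new RegExp(
    String.raw`^\[section${Number(sectionNumber)}].*(?:\n(?!\[section\d+]).*)*`,
    "m"
  );
  const res = text.match(block);
  if (!res) return [];

  // Step 2: inside that block, capture the text after PASS! or FAIL! on each line.
  return Array.from(
    res[0].matchAll(/^(?:PASS|FAIL)!(.*)/gm),
    m => m[1].trim()
  );
}

const sample = `INFO! this line shouldn't be captured because it's before section121
[section120] section title1
Some noise
PASS! this line shouldn't captured either because it's before section121
[section121] section title2
more noise
FAIL! match1
a warning we wish to skip
more warnings
PASS! match2
FAIL! match3
[section122] section title3
noise
PASS! this shouldn't be captured because it appears after section122`;

console.log(resultsInSection(sample, 121)); // [ 'match1', 'match2', 'match3' ]
```

Building the pattern with new RegExp and String.raw keeps the backslashes intact, and Number() ensures only a numeric value is interpolated into the pattern.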
