How could I update this regex so that it only matches names (captured in the group "CustomerName") that appear after the starting point marked by "START"? (Using the Global and Multiline settings.)
Current regex:
(Mr)\s*(?<CustomerName>.*)\b
Text to Scan
qwer asdf asdf
Mr Bill Smith
qwer asdf asdf
Mr Bob Smith
START
qwer asdf asdf
Mr Correct1 Smith
qwer asdf asdf
Mr Correct2 Smith
asldfj asldf
So instead of matching four names, it should only match the last two names =>
- Group CustomerName: Correct1 Smith
- Group CustomerName: Correct2 Smith
CodePudding user response:
The following example uses a lookbehind containing an unbounded quantifier, which, perhaps surprisingly, ECMAScript supports while PCRE and Python's built-in re module do not.
/(?<=START[\s\S]+)\n  (Mr.+ .+\b.*\n)/g
Segment | Description |
---|---|
(?<=START[\s\S]+) | Lookbehind: a match must be preceded by the literal "START" followed by one or more characters of any kind (whitespace or non-whitespace, so newlines are included) |
\n␣␣ | A newline, then two spaces¹ |
(Mr.+␣.+ | Capture group $1: literal "Mr", then one or more of any char, then a space, then one or more of any char... |
\b.*\n) | ...then a word boundary (a word char on one side and a non-word char on the other), then zero or more of any char, and finally a newline |

¹ A space character is represented by ␣.
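// Each line of the sample is indented by two spaces, which the pattern expects.
const str = `
  qwer asdf asdf
  Mr Bill Smith
  qwer asdf asdf
  Mr Bob Smith
  START
  qwer asdf asdf
  Mr Correct1 Smith
  qwer asdf asdf
  Mr Correct2 Smith
  asldfj asldf
`;
const rgx = /(?<=START[\s\S]+)\n  (Mr.+ .+\b.*\n)/g;
// With the g flag, match() returns the full matches (or null if there are none).
const res = str.match(rgx) ?? [];
// trim() drops the leading newline and indentation and the trailing newline.
console.log(res.map(m => m.trim()));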
CodePudding user response:
Using a lookbehind assertion:
(?<=^\s*START\s*$[\s\S]*\n\s*)Mr\s+(?<CustomerName>.*\S)
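As a minimal sketch of how the lookbehind pattern above could be used in JavaScript (an engine that allows variable-length lookbehind), the snippet below runs it with the g and m flags and reads the named group from each match. The variable names `text`, `afterStart`, and `names` are illustrative assumptions, not part of the answer.

```javascript
// Sample input from the question, joined with newlines (no indentation assumed).
const text = [
  'qwer asdf asdf',
  'Mr Bill Smith',
  'qwer asdf asdf',
  'Mr Bob Smith',
  'START',
  'qwer asdf asdf',
  'Mr Correct1 Smith',
  'qwer asdf asdf',
  'Mr Correct2 Smith',
  'asldfj asldf',
].join('\n');

// g: find every match; m: let ^ and $ match at line boundaries.
const afterStart = /(?<=^\s*START\s*$[\s\S]*\n\s*)Mr\s+(?<CustomerName>.*\S)/gm;

// matchAll() yields one match object per hit; the named group holds the name.
const names = Array.from(text.matchAll(afterStart), (m) => m.groups.CustomerName);

console.log(names);
```

Because the lookbehind has to find a START line somewhere before each "Mr", names that appear above START are skipped.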
Or using the \G anchor:
(?:^\s*START\s*|\G(?!^))(?:\n(?!\s*Mr).*)*\n\s*Mr\s+(?<CustomerName>.*\S).*
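JavaScript regular expressions have no \G anchor, so the pattern above cannot be used there as written. A different technique that gives the same result in JavaScript is to locate the START line first and then begin scanning from that position via lastIndex. The sketch below assumes this two-step approach; the helper name `namesAfterStart` is hypothetical and not part of either answer.

```javascript
// Hypothetical helper: a two-step alternative, not a translation of the \G pattern.
function namesAfterStart(input) {
  // Step 1: find a line that consists only of START (optionally padded with whitespace).
  const marker = /^\s*START\s*$/m.exec(input);
  if (marker === null) return []; // no START marker, so nothing qualifies

  // Step 2: scan for "Mr <name>" lines, starting right after the marker.
  const nameRgx = /^\s*Mr\s+(?<CustomerName>.*\S)/gm;
  nameRgx.lastIndex = marker.index + marker[0].length;

  const names = [];
  let m;
  while ((m = nameRgx.exec(input)) !== null) {
    names.push(m.groups.CustomerName);
  }
  return names;
}

const sample = [
  'Mr Bill Smith',
  'START',
  'qwer asdf asdf',
  'Mr Correct1 Smith',
  'Mr Correct2 Smith',
].join('\n');

console.log(namesAfterStart(sample));
```

This keeps each regex simple and avoids lookbehind entirely, at the cost of needing a few lines of code around the pattern.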