how to get regex matching occurring from the given string? (example attached)

Time:10-14

How could I update this regex so that it only matches (and provides the "CustomerName" group for) names that come after the starting point marked by "START"? (Using the global and multiline settings.)

Current regex:

(Mr)\s*(?<CustomerName>.*)\b

Text to Scan

qwer asdf asdf
Mr Bill Smith
qwer asdf asdf
  Mr Bob Smith
  START  
qwer asdf asdf
  Mr Correct1 Smith  
qwer asdf asdf
  Mr Correct2 Smith  
asldfj asldf 

So instead of matching all four names, it should match only the last two:

  • Group CustomerName: Correct1 Smith
  • Group CustomerName: Correct2 Smith

CodePudding user response:

The following example uses a lookbehind that contains a quantifier, which, surprisingly, ECMAScript supports while PCRE and Python's built-in re module do not.

/(?<=START[\s\S]+)\n  (Mr.+ .+\b.*\n)/g

Regex101

Segment | Description
(?<=START[\s\S]+) | Lookbehind: a match must be preceded by the literal "START" followed by one or more characters of any kind (whitespace or non-whitespace, so newlines count)
\n␣␣ | A newline, then two spaces¹
(Mr.+␣.+ | Capture group $1: literal "Mr", then one or more of any char, then a space, then one or more of any char...
\b.*\n) | ...then a word boundary (here a word char on the left and a non-word char on the right), then zero or more of any char, and finally a newline

¹ A space character is represented by ␣

const str = `
qwer asdf asdf
Mr Bill Smith
qwer asdf asdf
  Mr Bob Smith
  START  
qwer asdf asdf
  Mr Correct1 Smith  
qwer asdf asdf
  Mr Correct2 Smith  
asldfj asldf 
`;
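// Variable-length lookbehind: only match text that comes somewhere after "START".
const rgx = /(?<=START[\s\S]+)\n  (Mr.+ .+\b.*\n)/g;

// With the g flag, match() returns the full matches, not the capture groups.
const res = str.match(rgx);

// Trim the surrounding newlines and spaces from each match.
console.log(res.map(m => m.trim()));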

CodePudding user response:

Using a lookbehind assertion:

(?<=^\s*START\s*$[\s\S]*\n\s*)Mr\s+(?<CustomerName>.*\S)

Regex demo
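
As a rough illustration, the lookbehind pattern above can be tried in JavaScript with the g and m flags. This is only a sketch: it assumes an engine with variable-length lookbehind (such as V8), and the variable names `text`, `pattern` and `names` are made up for the example.

```javascript
// Sample text from the question, built line by line so that the
// trailing spaces on some lines are preserved.
const text = [
  "qwer asdf asdf",
  "Mr Bill Smith",
  "qwer asdf asdf",
  "  Mr Bob Smith",
  "  START  ",
  "qwer asdf asdf",
  "  Mr Correct1 Smith  ",
  "qwer asdf asdf",
  "  Mr Correct2 Smith  ",
  "asldfj asldf ",
].join("\n");

// g = find every match, m = let ^ and $ anchor at line boundaries.
const pattern = /(?<=^\s*START\s*$[\s\S]*\n\s*)Mr\s+(?<CustomerName>.*\S)/gm;

// matchAll() exposes named groups for every match,
// unlike match() with the g flag, which returns only the full matches.
const names = [...text.matchAll(pattern)].map(m => m.groups.CustomerName);

console.log(names); // [ 'Correct1 Smith', 'Correct2 Smith' ]
```

Because `.*\S` must end on a non-whitespace character, the trailing spaces after each name are left out of the group.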

Or using the \G anchor

(?:^\s*START\s*|\G(?!^))(?:\n(?!\s*Mr).*)*\n\s*Mr\s+(?<CustomerName>.*\S).*

Regex demo
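
Neither \G nor variable-length lookbehind is available in every engine (JavaScript, for instance, has no \G anchor). Where that is a problem, one possible workaround, which is not taken from the answers above, is to split the job in two: locate the START line first, then run a plain "Mr" pattern over the remainder only. A minimal sketch, with `namesAfterStart` and `sample` as hypothetical names:

```javascript
// Hypothetical helper: collect CustomerName values that appear after the START line.
function namesAfterStart(text) {
  // Step 1: find a line that contains only START, optionally padded with spaces or tabs.
  const marker = /^[ \t]*START[ \t]*$/m.exec(text);
  if (marker === null) return []; // no START line, so nothing qualifies

  // Step 2: scan only the text that follows the marker line.
  const rest = text.slice(marker.index + marker[0].length);
  const pattern = /Mr\s+(?<CustomerName>.*\S)/g;
  return [...rest.matchAll(pattern)].map(m => m.groups.CustomerName);
}

// Shortened version of the sample text from the question.
const sample = [
  "qwer asdf asdf",
  "Mr Bill Smith",
  "  Mr Bob Smith",
  "  START  ",
  "qwer asdf asdf",
  "  Mr Correct1 Smith  ",
  "asldfj asldf ",
  "  Mr Correct2 Smith  ",
].join("\n");

console.log(namesAfterStart(sample)); // [ 'Correct1 Smith', 'Correct2 Smith' ]
```

This trades a single regex for two simpler ones, which may be easier to port between engines.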
