Understanding regex lookaround to get desired result

Time:12-06

I am trying to isolate street address fields that begin with a digit, contain an underscore and end with a comma:

001 ALLAN Witham Ross 13 Every_Street, Welltown Greenkeeper
002 ALLARDYCE Margaret Isabel 49 Bell_Road, Musicville Housewife
003 ALLARDYCE Mervyn George 49 Bell_Road, Musicville Company Mngr

e.g.

13 Every_Street, Welltown
49 Bell_Road, Musicville
49 Bell_Road, Musicville

My regex is

(?ms)([0-9]+\s[A-Z][a-z].+(?=,))

But this matches from 13 through to the last 'd' of the final Bell_Road, which is almost everything. See regex101 example

So the match spans two commas but stops before the third? I want it to match only up to the next comma, and to do that three times :)
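
To see why the match runs that far, here is a minimal sketch in Python (the re module is an assumption, since the question does not name a regex flavor, and the sample rows are assumed to be one record per line). With the s flag, the dot also matches newlines, so the greedy .+ first runs to the end of the text and then backtracks only as far as the last comma. Swapping .+ for [^,]+ is one illustrative way to make each match stop at the next comma instead.

```python
import re

# Sample rows from the question, assumed to be one record per line.
text = (
    "001 ALLAN Witham Ross 13 Every_Street, Welltown Greenkeeper\n"
    "002 ALLARDYCE Margaret Isabel 49 Bell_Road, Musicville Housewife\n"
    "003 ALLARDYCE Mervyn George 49 Bell_Road, Musicville Company Mngr"
)

# Pattern from the question: greedy .+ followed by a lookahead for a comma.
greedy = re.findall(r"(?ms)([0-9]+\s[A-Z][a-z].+(?=,))", text)

# Illustrative variant: [^,]+ cannot cross a comma, so each match
# ends just before the next comma.
bounded = re.findall(r"([0-9]+\s[A-Z][a-z][^,]+(?=,))", text)

print(len(greedy))  # 1 (a single match ending at the last "Bell_Road")
print(bounded)      # ['13 Every_Street', '49 Bell_Road', '49 Bell_Road']
```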

CodePudding user response:

You don't have to assert the comma to the right if you also want to match it.

If you want to match an underscore before the comma, and the address part itself can not contain a comma:

\b\d+\s+[A-Z][a-z][^_,]*_[^,]+,\s+\S+

Explanation

  • \b A word boundary
  • \d+ Match 1+ digits
  • \s+ Match 1+ whitespace chars
  • [A-Z][a-z] Match an uppercase char A-Z followed by a lowercase char a-z
  • [^_,]*_ Optionally match any chars except _ or , and then match _
  • [^,]+, Match 1+ chars other than , and then match ,
  • \s+\S+ Match 1+ whitespace chars followed by 1+ non-whitespace chars

Regex demo
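
As a quick check of the pattern above, here is a minimal sketch using Python's re module (Python is an assumption; the answer does not name a flavor). The quantifiers are written out in full as \d+, \s+, [^,]+ and \S+, and the sample rows are assumed to be one record per line.

```python
import re

# Sample rows from the question, assumed to be one record per line.
text = (
    "001 ALLAN Witham Ross 13 Every_Street, Welltown Greenkeeper\n"
    "002 ALLARDYCE Margaret Isabel 49 Bell_Road, Musicville Housewife\n"
    "003 ALLARDYCE Mervyn George 49 Bell_Road, Musicville Company Mngr"
)

# Digits, whitespace, a capitalised word containing an underscore,
# the comma itself, then the next whitespace-delimited word.
pattern = re.compile(r"\b\d+\s+[A-Z][a-z][^_,]*_[^,]+,\s+\S+")

matches = pattern.findall(text)
for match in matches:
    print(match)
# 13 Every_Street, Welltown
# 49 Bell_Road, Musicville
# 49 Bell_Road, Musicville
```

The leading record numbers (001, 002, 003) are skipped because the word after them is all uppercase, so [A-Z][a-z] fails there.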

CodePudding user response:

This produces your desired matches:
\d+[^,\d]*_[^,]+, \S+
demo

They don't end with a comma, though.
For that you could just remove the \S+ at the end.
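
Both variants can be compared side by side in a minimal Python sketch (the re module is an assumption, and the sample rows are assumed to be one record per line). In the second pattern the space before \S+ is dropped as well, which is a choice made here so that the matches end exactly at the comma.

```python
import re

# Sample rows from the question, assumed to be one record per line.
text = (
    "001 ALLAN Witham Ross 13 Every_Street, Welltown Greenkeeper\n"
    "002 ALLARDYCE Margaret Isabel 49 Bell_Road, Musicville Housewife\n"
    "003 ALLARDYCE Mervyn George 49 Bell_Road, Musicville Company Mngr"
)

# Full pattern: address plus the word that follows the comma.
with_town = re.findall(r"\d+[^,\d]*_[^,]+, \S+", text)

# Trimmed pattern: stop right at the comma.
ends_with_comma = re.findall(r"\d+[^,\d]*_[^,]+,", text)

print(with_town)
# ['13 Every_Street, Welltown', '49 Bell_Road, Musicville', '49 Bell_Road, Musicville']
print(ends_with_comma)
# ['13 Every_Street,', '49 Bell_Road,', '49 Bell_Road,']
```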
