I am trying to isolate street address fields that begin with a digit, contain an underscore and end with a comma:
001 ALLAN Witham Ross 13 Every_Street, Welltown Greenkeeper
002 ALLARDYCE Margaret Isabel 49 Bell_Road, Musicville Housewife
003 ALLARDYCE Mervyn George 49 Bell_Road, Musicville Company Mngr
e.g.
13 Every_Street, Welltown
49 Bell_Road, Musicville
49 Bell_Road, Musicville
My regex is
(?ms)([0-9]+\s[A-Z][a-z].+(?=,))
But this matches from 13 through to the last 'd' of the final Bell_Road, which is almost everything. See regex101 example
The match runs past the first two commas and stops only at the third. I want it to match up to the next comma, but do it three times :)
CodePudding user response:
You don't have to assert the comma to the right if you also want to match it.
If you want to match an underscore before the comma, and the address part itself cannot contain a comma:
\b\d+\s+[A-Z][a-z][^_,]*_[^,]+,\s+\S+
Explanation
\b : A word boundary
\d+ : Match 1+ digits
\s+ : Match 1+ whitespace chars
[A-Z][a-z] : Match an uppercase char A-Z and a lowercase char a-z
[^_,]*_ : Optionally match any char except _ or , and then match _
[^,]+, : Match 1+ chars except , and then match ,
\s+\S+ : Match 1+ whitespace chars followed by 1+ non-whitespace chars
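To check the pattern outside regex101, here is a minimal Python sketch. It assumes the quantifiers in the pattern are + (as in \d+ and \s+) and that the sample records sit on separate lines; the variable names are only for illustration.

```python
import re

# Sample data from the question; putting one record per line is an assumption.
text = (
    "001 ALLAN Witham Ross 13 Every_Street, Welltown Greenkeeper\n"
    "002 ALLARDYCE Margaret Isabel 49 Bell_Road, Musicville Housewife\n"
    "003 ALLARDYCE Mervyn George 49 Bell_Road, Musicville Company Mngr\n"
)

# The comma is matched directly instead of being asserted with a lookahead,
# and [^,] keeps each match from running past its own comma.
pattern = re.compile(r"\b\d+\s+[A-Z][a-z][^_,]*_[^,]+,\s+\S+")

matches = pattern.findall(text)
for m in matches:
    print(m)
```

The leading record numbers (001, 002, 003) are skipped because they are followed by an all-uppercase surname, which fails the [A-Z][a-z] part.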
CodePudding user response:
This produces your desired matches:
\d+[^,\d]*_[^,]+, \S+
demo
The matches don't end with a comma, though. For that you could just remove \S+ at the end.
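A small Python sketch comparing the two variants, assuming the stripped quantifiers are + and using the sample text as a single line. In the second pattern the space before \S+ is dropped as well, which is my assumption, so that each match ends exactly on the comma.

```python
import re

# Sample data from the question, kept here as one line.
text = (
    "001 ALLAN Witham Ross 13 Every_Street, Welltown Greenkeeper "
    "002 ALLARDYCE Margaret Isabel 49 Bell_Road, Musicville Housewife "
    "003 ALLARDYCE Mervyn George 49 Bell_Road, Musicville Company Mngr"
)

# Variant 1: street plus the word after the comma.
# [^,\d]* cannot cross another digit, so the record numbers never start a match.
with_town = re.findall(r"\d+[^,\d]*_[^,]+, \S+", text)

# Variant 2: stop at the comma (trailing " \S+" removed).
street_only = re.findall(r"\d+[^,\d]*_[^,]+,", text)

print(with_town)
print(street_only)
```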