regex101: How can I capture the street address on each line in a more economical way? I am new to Re

Time:04-07

I am trying to capture the street address on each line of text on regex101.com, but the addresses on line 2 ("4241 Jerry Dove Drive") and line 6 ("946 Douglas Dairy Road") are giving me problems because they are the only ones with three-word street names. There must be a better way of writing this!

expression:

\s(\d{2,4}\s[A-Z]\w*\s[A-Z]\w*\sD?r?i?v?e?R?o?a?d?)

This is the txt file:

Jazmine Holcomb 3212 Adams Avenue Washington MD [email protected]
Sofie Hagan 4241 Jerry Dove Drive Erie PA [email protected]
Cairo Tyson 3768 Clifford Street San_Jose CA [email protected]
Tasmin Kearney 2956 Adams Drive El_Campo CA [email protected]
Aydin Moran 3727 Sarah Drive Lake_Charles LA [email protected]
Samirah Pollard 946 Douglas Dairy Road Prosperity SC [email protected]
Jaskaran Wheeler 1521 Richards Avenue Torrance CA [email protected]
Gerrard Browning 4690 Felosa Drive Los_Angeles CA [email protected]
Haleema Craft 73 Pinchalone Street Norfolk VA [email protected]
Brett Neal 4079 Johnson Street Garner NC [email protected]

    

CodePudding user response:

This regex could suit you:

\s(\d{2,4}(?:\s[A-Z]\w*){1,3}\s(?:Street|Drive|Avenue|Road))

It uses a non-capturing group, (?:...), for the words before the final word and allows 1 to 3 of them with {1,3}. The final word must then be one of:

  • Street
  • Drive
  • Avenue
  • Road

The last word must be enclosed in a non-capturing group as well. Otherwise the alternation would apply to the whole pattern: the first alternative would be the number and 1 to 3 words followed by "Street", and the other alternatives would be just "Drive", "Avenue" or "Road" on their own, without the number and words before them.
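To see why the grouping matters, here is a small Python sketch that compares the alternation with and without the non-capturing group. The sample line comes from the question (e-mail column left out), and the variable names are only illustrative.

```python
import re

line = "Sofie Hagan 4241 Jerry Dove Drive Erie PA"

# Without a group, "|" splits the WHOLE pattern into four alternatives:
# "<number> <words> Street", or just "Drive", or "Avenue", or "Road".
ungrouped = re.compile(r"\d{2,4}(?:\s[A-Z]\w*){1,3}\sStreet|Drive|Avenue|Road")

# With (?:...), the alternation only applies to the final word.
grouped = re.compile(r"\d{2,4}(?:\s[A-Z]\w*){1,3}\s(?:Street|Drive|Avenue|Road)")

ungrouped_match = ungrouped.search(line).group(0)
grouped_match = grouped.search(line).group(0)

print(ungrouped_match)  # Drive
print(grouped_match)    # 4241 Jerry Dove Drive
```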

If you like, you can raise the maximum number of words before the final word, or remove the limit altogether by using + (one or more) instead of {1,3} (between 1 and 3).
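As a rough check, the sketch below runs the pattern from this answer in Python, once with {1,3} and once with +, on a few lines from the question (e-mail column left out). The names bounded and unbounded are made up for this example.

```python
import re

text = """Jazmine Holcomb 3212 Adams Avenue Washington MD
Sofie Hagan 4241 Jerry Dove Drive Erie PA
Samirah Pollard 946 Douglas Dairy Road Prosperity SC
Haleema Craft 73 Pinchalone Street Norfolk VA"""

# 1 to 3 capitalised words before the street type.
bounded = re.compile(r"\s(\d{2,4}(?:\s[A-Z]\w*){1,3}\s(?:Street|Drive|Avenue|Road))")

# One or more capitalised words before the street type.
unbounded = re.compile(r"\s(\d{2,4}(?:\s[A-Z]\w*)+\s(?:Street|Drive|Avenue|Road))")

# findall returns the content of the single capturing group for each match.
bounded_hits = bounded.findall(text)
unbounded_hits = unbounded.findall(text)

for street in bounded_hits:
    print(street)
```

On this sample both variants return the same four addresses, because the match always ends at the first street type that follows the number. Note that the alternation only knows the four street types that occur in the sample file; other types would have to be added by hand.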

CodePudding user response:

Try something like this:

\d+ [\w ]*?(?= \w+ \b[A-Z]{2}\b)

It says

  • Some numbers
  • Followed by a space
  • Followed by letters and spaces until
  • A lookahead (checked, but not included in the match) consisting of a space, a word, a space and exactly two capital letters
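The steps above can be tried out with a short Python sketch. It assumes, as in the sample file, that the city is a single token (multi-word cities are joined with an underscore) and that the state is exactly two capital letters; the variable names are only illustrative.

```python
import re

text = """Cairo Tyson 3768 Clifford Street San_Jose CA
Samirah Pollard 946 Douglas Dairy Road Prosperity SC
Brett Neal 4079 Johnson Street Garner NC"""

# \d+        the house number
# [\w ]*?    as few word characters and spaces as possible
# (?=...)    lookahead: " <city> <two-letter state>" must follow,
#            but is not included in the match
pattern = re.compile(r"\d+ [\w ]*?(?= \w+ \b[A-Z]{2}\b)")

streets = pattern.findall(text)
print(streets)
```

Unlike the alternation approach, this pattern does not need a list of street types, but it depends on the city and state columns keeping the shape described above.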

CodePudding user response:

Possible (quick) solution is the following:

import re

string = """Jazmine Holcomb 3212 Adams Avenue Washington MD [email protected]
Sofie Hagan 4241 Jerry Dove Drive Erie PA [email protected]
Cairo Tyson 3768 Clifford Street San_Jose CA [email protected]
Tasmin Kearney 2956 Adams Drive El_Campo CA [email protected]
Aydin Moran 3727 Sarah Drive Lake_Charles LA [email protected]
Samirah Pollard 946 Douglas Dairy Road Prosperity SC [email protected]
Jaskaran Wheeler 1521 Richards Avenue Torrance CA [email protected]
Gerrard Browning 4690 Felosa Drive Los_Angeles CA [email protected]
Haleema Craft 73 Pinchalone Street Norfolk VA [email protected]
Brett Neal 4079 Johnson Street Garner NC [email protected]"""


# Number, a space, letters and spaces, then one of the known street types (case-insensitive).
re_pattern = re.compile(r"(\d+ [a-z ]+(?:Avenue|Drive|Street|Road))", re.I)

found = re_pattern.findall(string)

print(found)

Prints

['3212 Adams Avenue',
 '4241 Jerry Dove Drive',
 '3768 Clifford Street',
 '2956 Adams Drive',
 '3727 Sarah Drive',
 '946 Douglas Dairy Road',
 '1521 Richards Avenue',
 '4690 Felosa Drive',
 '73 Pinchalone Street',
 '4079 Johnson Street']
