EDIT 2: Solved. I ended up just requiring a comma as a delimiter between the street and the city name. It's good enough for my needs. The final regex I went with is:
^(\d+) (\S+.+),[ ]?(.+),[ ]?([A-Za-z]{2})[ ]?(\d{5})$
for those curious.
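For anyone who wants to try it, here is a minimal sketch of that pattern in Python. It assumes the quantifiers in the pattern are `+`, and the name `parse_address` and the comma-delimited sample input are made up for illustration.

```python
import re

# The comma-delimited pattern from EDIT 2 (quantifiers assumed to be `+`).
ADDRESS_RE = re.compile(
    r"^(\d+) (\S+.+),[ ]?(.+),[ ]?([A-Za-z]{2})[ ]?(\d{5})$"
)

def parse_address(line):
    """Return (building, street, city, state, zip), or None if no match."""
    m = ADDRESS_RE.match(line)
    return m.groups() if m else None

# A comma after the street lets a two-word city be captured whole.
print(parse_address("1234 Street Name Unit #225, Harpers Ferry, VA 12345"))

# Without the extra comma the pattern does not match at all.
print(parse_address("5547 Street Name City Name, WY 12345"))
```

The trade-off is that every input line now has to carry two commas, one after the street and one after the city.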
Howdy, I'm trying to parse addresses entered on a single line in the following format:
1234 Street Name Unit #225 Harpers Ferry, VA 12345
5547 Street Name City Name, WY 12345
9958 Street Name Apt 25 New York, NY 12345
EDIT: Changed the second example to be more representative of the data sets I'm working with and added a third example of a possible input.
I'm having trouble dealing with two-word city names, as seen in these examples. My naive implementation is
(?<Building>\b\d+)\s(?<Street>.+)(?<City>\b.+),\s(?<State>.{2})\s(?<Zip>\d{5}\b)
which appears to work provided the city contains only one word. However, using the first example, it returns the following results:
Building: 1234
Street: Street Name Unit #225 Harpers
City: Ferry
State: VA
Zip: 12345
Is there any way to cover this case without an additional delimiter at the end of the street and/or unit name?
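The behaviour described above can be reproduced with a rough Python sketch. Python spells named groups as `(?P<Name>...)`, the quantifiers are assumed to be `+`, and the helper name `parse_naive` is hypothetical. The greedy Street group takes as much text as it can, so City is left with only the last word before the comma.

```python
import re

# The naive pattern from the question, ported to Python's named-group syntax.
NAIVE_RE = re.compile(
    r"(?P<Building>\b\d+)\s(?P<Street>.+)(?P<City>\b.+),"
    r"\s(?P<State>.{2})\s(?P<Zip>\d{5}\b)"
)

def parse_naive(line):
    """Return the named groups as a dict (values stripped), or None."""
    m = NAIVE_RE.search(line)
    if m is None:
        return None
    # Street keeps a trailing space because it is greedy, so strip values.
    return {name: value.strip() for name, value in m.groupdict().items()}

result = parse_naive("1234 Street Name Unit #225 Harpers Ferry, VA 12345")
for name, value in result.items():
    print(f"{name}: {value}")
```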
CodePudding user response:
What about this?
(?<Building>\b\d+)\s(?<Street>.*(?:#\d+|No Unit Number))\s(?<City>\b.+),\s(?<State>.{2})\s(?<Zip>\d{5}\b)
See demo
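A quick way to check this pattern against the sample inputs is a small Python sketch like the one below. It uses Python's `(?P<Name>...)` group syntax, assumes the quantifiers are `+`, and the helper name `parse_with_unit_marker` is hypothetical. The pattern relies on the street ending in `#` followed by digits, or in the literal text "No Unit Number", so inputs without either marker do not match.

```python
import re

# The suggested pattern, ported to Python's named-group syntax.
UNIT_RE = re.compile(
    r"(?P<Building>\b\d+)\s(?P<Street>.*(?:#\d+|No Unit Number))"
    r"\s(?P<City>\b.+),\s(?P<State>.{2})\s(?P<Zip>\d{5}\b)"
)

def parse_with_unit_marker(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = UNIT_RE.search(line)
    return m.groupdict() if m else None

samples = [
    "1234 Street Name Unit #225 Harpers Ferry, VA 12345",
    "5547 Street Name City Name, WY 12345",
    "9958 Street Name Apt 25 New York, NY 12345",
]
for line in samples:
    print(line, "->", parse_with_unit_marker(line))
```

Only the first sample matches here, since the other two have no `#` unit number for the Street group to end on.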