Grouping single-line addresses with multi-word cities

EDIT 2: Solved. I ended up just requiring a comma delimiter between the street and the city name. It's good enough for my needs. For those curious, the final regex I went with is:

^(\d+) (\S+.+),[ ]?(.+),[ ]?([A-Za-z]{2})[ ]?(\d{5})$
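A minimal Python sketch to check the comma-delimited pattern, assuming the `+` quantifiers after `\d`, `\S` and `.` (written as `(\d+)`, `(\S+.+)` and `(.+)`). The sample strings are the question's examples with a comma inserted after the street/unit part, which is what this pattern requires. Because group 2 is greedy, it backs off to the comma that still leaves `City, ST 12345` for the rest of the pattern.

```python
import re

# Final pattern with '+' quantifiers: groups are building, street (+ unit),
# city, two-letter state, and five-digit ZIP.
FINAL = re.compile(r"^(\d+) (\S+.+),[ ]?(.+),[ ]?([A-Za-z]{2})[ ]?(\d{5})$")

# The question's examples with the required comma added after the street part.
samples = [
    "1234 Street Name Unit #225, Harpers Ferry, VA 12345",
    "5547 Street Name, City Name, WY 12345",
    "9958 Street Name Apt 25, New York, NY 12345",
]

# Collect the captured groups, or None if a line does not match.
results = []
for line in samples:
    m = FINAL.match(line)
    results.append(m.groups() if m else None)

for groups in results:
    print(groups)
```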

Howdy, I'm trying to parse addresses entered on a single line in the following format:

1234 Street Name Unit #225 Harpers Ferry, VA 12345

5547 Street Name City Name, WY 12345

9958 Street Name Apt 25 New York, NY 12345

EDIT: Changed the second example to be more representative of the data sets I'm working with and added a third example of a possible input.

I'm having trouble handling two-word cities like the ones in these examples. My naive implementation is

(?<Building>\b\d+)\s(?<Street>.+)(?<City>\b.+),\s(?<State>.{2})\s(?<Zip>\d{5}\b)

which appears to work provided the city is only one word. However, with the first example it returns the following:

Building: 1234
Street: Street Name Unit #225 Harpers
City: Ferry
State: VA
Zip: 12345
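The split can be reproduced with a short Python sketch. It assumes the naive pattern with `+` after `\d` and both `.` groups, and rewrites the .NET-style named groups `(?<Name>...)` as Python's `(?P<Name>...)`, since Python's `re` module does not accept the former. `Street` is greedy, so it gives characters back only until `City` can start at a word boundary and still be followed by `, VA 12345`; the first such position from the right is just before `Ferry`.

```python
import re

# Naive pattern, with .NET-style (?<Name>...) groups written as (?P<Name>...).
NAIVE = re.compile(
    r"(?P<Building>\b\d+)\s(?P<Street>.+)(?P<City>\b.+),\s"
    r"(?P<State>.{2})\s(?P<Zip>\d{5}\b)"
)

m = NAIVE.search("1234 Street Name Unit #225 Harpers Ferry, VA 12345")

# Street keeps the trailing space; City gets only the last word.
print(repr(m.group("Street")))  # 'Street Name Unit #225 Harpers '
print(repr(m.group("City")))    # 'Ferry'
```

Making `Street` lazy (`.+?`) only moves the error the other way: `Street` becomes `Street` and `City` swallows everything from ` Name` onward, because without some marker the pattern has no way to tell where a multi-word city begins.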

Is there any way to cover this case without an additional delimiter at the end of the street and/or unit name?

Answer:

What about this?

(?<Building>\b\d+)\s(?<Street>.*(?:#\d+|No Unit Number))\s(?<City>\b.+),\s(?<State>.{2})\s(?<Zip>\d{5}\b)
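A minimal Python sketch of this approach, assuming `+` after `\d` in `Building` and `#\d`, and after `.` in `City`, with the named groups rewritten in Python's `(?P<Name>...)` syntax. It anchors the end of `Street` on either `#` followed by digits or the literal text `No Unit Number`, so it only matches addresses that contain one of those markers. The last sample is a hypothetical input added to show the `No Unit Number` branch; the question's second and third examples, which have no unit or use `Apt 25`, do not match.

```python
import re

# Answer's pattern: Street must end in "#<digits>" or "No Unit Number".
ANSWER = re.compile(
    r"(?P<Building>\b\d+)\s(?P<Street>.*(?:#\d+|No Unit Number))\s"
    r"(?P<City>\b.+),\s(?P<State>.{2})\s(?P<Zip>\d{5}\b)"
)

samples = [
    "1234 Street Name Unit #225 Harpers Ferry, VA 12345",
    "5547 Street Name City Name, WY 12345",                 # no unit marker
    "9958 Street Name Apt 25 New York, NY 12345",           # "Apt 25", no '#'
    "5547 Street Name No Unit Number City Name, WY 12345",  # hypothetical input
]

# Each entry is the dict of named groups, or None when the pattern fails.
results = []
for line in samples:
    m = ANSWER.search(line)
    results.append(m.groupdict() if m else None)

for r in results:
    print(r)
```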
