I am trying to parse addresses into groups, and I have this regular expression:
```
(^.*?(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),(?:)? ?(.*?),? ?([A-Z]{2,3}),? ?(\d{,4})$
```
It captures and groups these addresses:
```
139 McKinnon Road, PINELANDS, NT, 829
108 East Point Road, Fannie Bay, NT, 820
3-11 Hamilton Street, Townsville City, QLD, 4810
40 17 Geranium Street, THE GARDENS, NT, 820
Lot 9 Island Point Road, ST GEORGES BASIN, NSW, 2540
316 Sturt Street and 511 Flinders Street, Townsville City, QLD, 4810
```
but it does not capture addresses in this format:
```
1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155
31 Stephen Street SOUTH TOOWOOMBA QLD 4350
```
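To reproduce the problem, here is a minimal sketch, assuming Python 3 and the standard `re` module, that runs the question's pattern against both sets of addresses. The names `STREET_TYPES`, `QUESTION_PATTERN`, `COMMA_SEPARATED` and `SPACE_SEPARATED` are illustrative only. The pattern requires a literal comma immediately after the street-type keyword, which neither of the space-separated addresses has, so both fail to match.

```python
import re

# Street-type keywords copied from the question's pattern
# (the duplicated "Street" entry is dropped; it changes nothing).
STREET_TYPES = "|".join([
    "Lane", "Street", "Boulevard", "Crescent", "Place", "Road", "Highway",
    "Avenue", "Drive", "Circuit", "Parade", "Telopea", "Nicklin Way",
    "Terrace", "Square", "Court", "Close", "Endeavour Way", "Esplanade",
    "East", "The Centreway", "Mall", "Quay", "Gateway", "Low Way", "Point",
    "Rd", "Morinda", "Way", "Ave", "St", "South Steyne", "Broadway", "HQ",
    "Expressway", "Castlereagh", "Meadow Way", "Track", "Kulkyne Way",
    "Narabang Way", "Bank",
])

# The question's pattern. Its empty "(?:)?" group can only match the empty
# string, so it is omitted here. Note the mandatory "," after the street group.
QUESTION_PATTERN = re.compile(
    r"(^.*?(?:" + STREET_TYPES + r")), ?(.*?),? ?([A-Z]{2,3}),? ?(\d{,4})$"
)

COMMA_SEPARATED = [
    "139 McKinnon Road, PINELANDS, NT, 829",
    "108 East Point Road, Fannie Bay, NT, 820",
    "3-11 Hamilton Street, Townsville City, QLD, 4810",
    "40 17 Geranium Street, THE GARDENS, NT, 820",
    "Lot 9 Island Point Road, ST GEORGES BASIN, NSW, 2540",
    "316 Sturt Street and 511 Flinders Street, Townsville City, QLD, 4810",
]
SPACE_SEPARATED = [
    "1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155",
    "31 Stephen Street SOUTH TOOWOOMBA QLD 4350",
]

for address in COMMA_SEPARATED + SPACE_SEPARATED:
    match = QUESTION_PATTERN.search(address)
    # Print the captured groups, or None when the pattern does not match.
    print(match.groups() if match else None)
```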
I would like to split these addresses into separate groups, for example:
```
street_address = 1, 3, 5 Demeter Street & 12 Hermes Avenue
subrub = ROUSE HILL
state = NSW
postcode = 2155
```
How can I capture both address formats with the expression above? Here is my regex code.
Answer:
You can match each of your four groups with its own regex:

- Group 1, containing the street address, named `street_address`: `.*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)`
- Group 2, containing the suburb, named `subrub`: `[A-Za-z ]+`
- Group 3, containing the state, named `state`: `[A-Z]+`
- Group 4, containing the postcode, named `postcode`: `\d+`
Your final regex is simply the concatenation of these regexes, joined by an optional comma followed by a mandatory space (`,? `):
```
(?P<street_address>.*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),? (?P<subrub>[A-Za-z ]+),? (?P<state>[A-Z]+),? (?P<postcode>\d+)
```
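As a sketch of the same construction in Python (assuming Python 3 and the standard `re` module; `STREET_TYPES`, `PARTS`, `ADDRESS_RE` and `ADDRESSES` are illustrative names), the four sub-patterns can be joined with `",? "` and searched against every address from the question. The group name `subrub` is kept exactly as it appears in the regex above.

```python
import re

# Street-type keywords from the Group 1 pattern (duplicate "Street" dropped).
STREET_TYPES = "|".join([
    "Lane", "Street", "Boulevard", "Crescent", "Place", "Road", "Highway",
    "Avenue", "Drive", "Circuit", "Parade", "Telopea", "Nicklin Way",
    "Terrace", "Square", "Court", "Close", "Endeavour Way", "Esplanade",
    "East", "The Centreway", "Mall", "Quay", "Gateway", "Low Way", "Point",
    "Rd", "Morinda", "Way", "Ave", "St", "South Steyne", "Broadway", "HQ",
    "Expressway", "Castlereagh", "Meadow Way", "Track", "Kulkyne Way",
    "Narabang Way", "Bank",
])

# One sub-pattern per named group, in order.
PARTS = [
    r"(?P<street_address>.*(?:" + STREET_TYPES + r"))",
    r"(?P<subrub>[A-Za-z ]+)",
    r"(?P<state>[A-Z]+)",
    r"(?P<postcode>\d+)",
]

# Concatenate the sub-patterns with an optional comma plus a mandatory space.
ADDRESS_RE = re.compile(",? ".join(PARTS))

ADDRESSES = [
    "139 McKinnon Road, PINELANDS, NT, 829",
    "108 East Point Road, Fannie Bay, NT, 820",
    "3-11 Hamilton Street, Townsville City, QLD, 4810",
    "40 17 Geranium Street, THE GARDENS, NT, 820",
    "Lot 9 Island Point Road, ST GEORGES BASIN, NSW, 2540",
    "316 Sturt Street and 511 Flinders Street, Townsville City, QLD, 4810",
    "1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155",
    "31 Stephen Street SOUTH TOOWOOMBA QLD 4350",
]

for address in ADDRESSES:
    match = ADDRESS_RE.search(address)
    # Print the four captured groups, or None when nothing matches.
    print(match.groups() if match else None)
```

Because the street keywords are matched case-sensitively, the greedy `.*` in the first group stops at the last title-case keyword, which leaves upper-case suburbs such as `ST GEORGES BASIN` to the `subrub` group.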
Note: in your Python code, you can extract each group by its name, for example with `match.group("street_address")`, or get all of them at once with `match.groupdict()`.
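A minimal sketch of reading the groups by name, assuming Python 3 and the standard `re` module. `parse_address` is a hypothetical helper, and its use of `re.fullmatch` (so the whole string must be consumed) is a choice of this sketch rather than part of the answer. It also maps the `subrub` group to a `suburb` key, purely as a naming choice for the returned dict.

```python
import re
from typing import Optional

# Street-type keywords from the answer's pattern (duplicate "Street" dropped).
STREET_TYPES = "|".join([
    "Lane", "Street", "Boulevard", "Crescent", "Place", "Road", "Highway",
    "Avenue", "Drive", "Circuit", "Parade", "Telopea", "Nicklin Way",
    "Terrace", "Square", "Court", "Close", "Endeavour Way", "Esplanade",
    "East", "The Centreway", "Mall", "Quay", "Gateway", "Low Way", "Point",
    "Rd", "Morinda", "Way", "Ave", "St", "South Steyne", "Broadway", "HQ",
    "Expressway", "Castlereagh", "Meadow Way", "Track", "Kulkyne Way",
    "Narabang Way", "Bank",
])

# The answer's combined pattern, split across adjacent string literals.
ADDRESS_RE = re.compile(
    r"(?P<street_address>.*(?:" + STREET_TYPES + r")),? "
    r"(?P<subrub>[A-Za-z ]+),? "
    r"(?P<state>[A-Z]+),? "
    r"(?P<postcode>\d+)"
)


def parse_address(text: str) -> Optional[dict]:
    """Hypothetical helper: return the named groups as a dict, or None."""
    match = ADDRESS_RE.fullmatch(text)
    if match is None:
        return None
    # Each value is looked up by the name given in (?P<name>...).
    return {
        "street_address": match.group("street_address"),
        "suburb": match.group("subrub"),
        "state": match.group("state"),
        "postcode": match.group("postcode"),
    }


print(parse_address("1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155"))
```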