Parse Address using Regex Capture Groups

Time:06-09

I am trying to parse addresses into groups, and I have this regular expression:

(^.*?(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),(?:)? ?(.*?),? ?([A-Z]{2,3}),? ?(\d{,4})$

which captures and groups these addresses:

139 McKinnon Road, PINELANDS, NT, 829
108 East Point Road, Fannie Bay, NT, 820
3-11 Hamilton Street, Townsville City, QLD, 4810
40 17 Geranium Street, THE GARDENS, NT, 820
Lot 9 Island Point Road, ST GEORGES BASIN, NSW, 2540
316 Sturt Street and 511 Flinders Street, Townsville City, QLD, 4810

but does not capture addresses in these formats:

1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155
31 Stephen Street SOUTH TOOWOOMBA QLD 4350

I would like to split these addresses into separate groups, like:

street_address = 1, 3, 5 Demeter Street & 12 Hermes Avenue
subrub = ROUSE HILL
state = NSW
postcode = 2155

How can I capture both address formats with the expression above?

CodePudding user response:

You can match each of your four groups separately, using one regex per group:

  • Group 1, containing the address, called <street_address>:
.*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)
  • Group 2, containing the subrub, called <subrub>:
[A-Za-z ]+
  • Group 3, containing the state, called <state>:
[A-Z]+
  • Group 4, containing the postcode, called <postcode>:
\d+

Your final regex is simply the concatenation of these patterns, joined by the separator ,? (an optional comma followed by a mandatory space).

(?P<street_address>.*(?:Lane|Street|Boulevard|Crescent|Place|Road|Highway|Avenue|Drive|Circuit|Parade|Telopea|Nicklin Way|Terrace|Square|Court|Close|Endeavour Way|Esplanade|East|The Centreway|Mall|Quay|Gateway|Low Way|Point|Rd|Morinda|Way|Ave|St|South Steyne|Broadway|HQ|Expressway|Street|Castlereagh|Meadow Way|Track|Kulkyne Way|Narabang Way|Bank)),? (?P<subrub>[A-Za-z ]+),? (?P<state>[A-Z]+),? (?P<postcode>\d+)


Note: In your Python code, you'll be able to extract each group by its corresponding name.
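
As a rough illustration of extracting the groups by name, here is a minimal sketch. It assumes the standard re module and that the one-or-more quantifier (+) is intended on the last three groups. The names STREET_TYPES, ADDRESS_RE and parse_address are hypothetical helpers introduced only for this example; the street types are the ones from the question, kept in the same order.

```python
import re

# Street types taken from the question's pattern, in the original order.
STREET_TYPES = [
    "Lane", "Street", "Boulevard", "Crescent", "Place", "Road", "Highway",
    "Avenue", "Drive", "Circuit", "Parade", "Telopea", "Nicklin Way",
    "Terrace", "Square", "Court", "Close", "Endeavour Way", "Esplanade",
    "East", "The Centreway", "Mall", "Quay", "Gateway", "Low Way", "Point",
    "Rd", "Morinda", "Way", "Ave", "St", "South Steyne", "Broadway", "HQ",
    "Expressway", "Street", "Castlereagh", "Meadow Way", "Track",
    "Kulkyne Way", "Narabang Way", "Bank",
]

# Four named groups joined by ",? " (optional comma, mandatory space).
ADDRESS_RE = re.compile(
    r"(?P<street_address>.*(?:" + "|".join(STREET_TYPES) + r"))"
    r",? (?P<subrub>[A-Za-z ]+)"
    r",? (?P<state>[A-Z]+)"
    r",? (?P<postcode>\d+)"
)


def parse_address(text):
    """Return the named groups as a dict, or None if the text does not match."""
    match = ADDRESS_RE.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        "139 McKinnon Road, PINELANDS, NT, 829",
        "1, 3, 5 Demeter Street & 12 Hermes Avenue ROUSE HILL NSW 2155",
        "31 Stephen Street SOUTH TOOWOOMBA QLD 4350",
    ]
    for sample in samples:
        print(parse_address(sample))
```

Because .* is greedy, the street_address group runs up to the last street type that still lets the rest of the pattern match, which is what keeps "Demeter Street & 12 Hermes Avenue" together in one group. The matching is case-sensitive, so an upper-case suburb such as ST GEORGES BASIN is not mistaken for the street type St.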
