Some raw rows contain two or more addresses. I want to split them after the last part of the Canadian postal code using a lookaround. The Canadian postal code format is A1A 1A1, where A is a letter and 1 is a digit, with a space separating the third and fourth characters.
Here is an example
160 Rue, Notre Dame N, Bureau 140, Sainte-Marie, G6E 3Z9 887 Chemin du Bord de l'Eau, Saint-Henri de Levis, G0R 3E0
I want to split the row at the space that follows the last part of a postal code, if that space exists. The expected result:
160 Rue, Notre Dame N, Bureau 140, Sainte-Marie, G6E 3Z9
887 Chemin du Bord de l'Eau, Saint-Henri de Levis, G0R 3E0
I tried
List<String> addresses = new ArrayList<String>();
addresses = Arrays.asList(long_addresses.Address.split("(\\d\\w\\d)\\s"));
But the result is:
[, Rue, Notre Dame N, Bureau 140, Sainte-Marie, G6E , , Chemin du Bord de l'Eau, Saint-Henri de Levis, G0R 3E0]
Here are some other examples:
141 rang du brûlé, pont rouge, G3H1B8 200 rue Commerciale, Donnacona, G3M 1W1
33 rue provost, Montreal, H8S 1L3 46 avenue Sainte-Anne, Pointe-Claire, H9S 4P8 2035 rue Victoria, Lachine, H8S 0A8 2075 rue de l'Eglise, Saint-Laurent, H4M 1G3 800 Pl Leigh-Capreol, Dorval, Montréal, H4Y 0A4
2075 rue de l'Eglise, Saint-Laurent, H4M 1G3 2035 rue Victoria, Lachine, H8S 0A8 46 ave, Sainte-Anne, Pointe-Claire, H9S 4P8 12 Charlevoix , Kirkland, H9J 2T6 930 St Germain St, Ville St-Laurent, H4L 3R9 1417 argyle , Montreal, H3G 1V5
Note: I trim each row, so the last postal code is not followed by a space. Thank you in advance.
CodePudding user response:
You can use
(?<=\b[a-zA-Z]\d[a-zA-Z]\s\d[a-zA-Z]\d)\s
Or, if the space between the A1A and 1A1 parts is optional and may be missing, you can use
(?<=\b[a-zA-Z]\d[a-zA-Z]\s{0,1}\d[a-zA-Z]\d)\s
This still works because the Java regex engine supports lookbehind patterns of bounded (constrained) width, and \s{0,1} has a known maximum length.
Details:
(?<=\b[a-zA-Z]\d[a-zA-Z]\s\d[a-zA-Z]\d) - a positive lookbehind that requires (immediately to the left of the current location):
    \b - a word boundary
    [a-zA-Z] - a letter
    \d - a digit
    [a-zA-Z]\s\d[a-zA-Z]\d - a letter, a whitespace, a digit, a letter and a digit
\s - a single whitespace (the one the string is split on).
In the second pattern, \s{0,1} matches one or zero whitespaces.
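To see the optional-space variant at work, here is a minimal sketch applied to an abbreviated, ASCII-only version of the G3H1B8 example from the question. The class name OptionalSpaceSplitDemo and the helper splitAddresses are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;

class OptionalSpaceSplitDemo {
    // The lookbehind requires a postal code (A1A 1A1 or A1A1A1) to end
    // immediately before the whitespace that the string is split on.
    static final String REGEX =
            "(?<=\\b[a-zA-Z]\\d[a-zA-Z]\\s{0,1}\\d[a-zA-Z]\\d)\\s";

    // Hypothetical helper: splits one raw row into its addresses.
    static List<String> splitAddresses(String row) {
        return Arrays.asList(row.split(REGEX));
    }

    public static void main(String[] args) {
        // First postal code has no inner space, second one does.
        String row = "141 rang, pont rouge, G3H1B8 200 rue Commerciale, Donnacona, G3M 1W1";
        for (String address : splitAddresses(row)) {
            System.out.println(address);
        }
    }
}
```

Because the whitespace inside the lookbehind is optional, both G3H1B8 and G3M 1W1 are recognized, and only the space after a complete postal code is consumed by the split.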
Java demo:
String s = "160 Rue, Notre Dame N, Bureau 140, Sainte-Marie, G6E 3Z9 887 Chemin du Bord de l'Eau, Saint-Henri de Levis, G0R 3E0";
// Split on the single whitespace that directly follows a full postal code
String regex = "(?<=\\b[a-zA-Z]\\d[a-zA-Z]\\s\\d[a-zA-Z]\\d)\\s";
// Or, when the space inside the postal code is optional:
// String regex = "(?<=\\b[a-zA-Z]\\d[a-zA-Z]\\s{0,1}\\d[a-zA-Z]\\d)\\s";
String[] results = s.split(regex);
for (String str : results) {
    System.out.println(str);
}
Output:
160 Rue, Notre Dame N, Bureau 140, Sainte-Marie, G6E 3Z9
887 Chemin du Bord de l'Eau, Saint-Henri de Levis, G0R 3E0
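Since the question collects the pieces into a List<String>, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the class name AddressSplitter and its split method are hypothetical, the pattern uses \s+ instead of \s so that a run of several whitespaces after a postal code is consumed in one go, and each piece is trimmed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AddressSplitter {
    // Compiled once and reused. Using \s+ rather than \s is an assumption
    // made here so that repeated whitespace does not leak into the next address.
    private static final Pattern AFTER_POSTAL_CODE = Pattern.compile(
            "(?<=\\b[a-zA-Z]\\d[a-zA-Z]\\s{0,1}\\d[a-zA-Z]\\d)\\s+");

    // Hypothetical helper: returns the trimmed addresses found in one raw row.
    static List<String> split(String row) {
        List<String> addresses = new ArrayList<>();
        for (String part : AFTER_POSTAL_CODE.split(row.trim())) {
            addresses.add(part.trim());
        }
        return addresses;
    }

    public static void main(String[] args) {
        String row = "2075 rue de l'Eglise, Saint-Laurent, H4M 1G3 "
                + "2035 rue Victoria, Lachine, H8S 0A8 "
                + "46 ave, Sainte-Anne, Pointe-Claire, H9S 4P8 "
                + "12 Charlevoix , Kirkland, H9J 2T6 "
                + "930 St Germain St, Ville St-Laurent, H4L 3R9 "
                + "1417 argyle , Montreal, H3G 1V5";
        for (String address : split(row)) {
            System.out.println(address);
        }
    }
}
```

A row that contains a single address comes back as a one-element list, so the helper can be applied to every row without checking first.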