Home > database >  SQL, Trying to split Finnish addresses
SQL, Trying to split Finnish addresses

Time:05-20

I have address column which hosts Streetname housenumber( possible divider)( possible apartment no.) postcode City

5 different examples:

( Street   ), (Postal) (City)  
"Testalley 3, 00200 Helsinki"  
"Testalley 3 A 21, 00200 Helsinki  
"TestAlley 3 B, 00300 Helsinki  
"TestAlley 3, 00500 Helsinki AS  
"testAlley 3 F 22, 00500 Helsinki AS  

So, the variation of addresses change quite a bit.

I'll hope to get this big junk of address into 3 separate columns.

SELECT
bigAddress,

  
SUBSTRING(bigAddress,LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress) 2)),LEN(bigAddress) - LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress))) - LEN(RIGHT(bigAddress,CHARINDEX(' ', (REVERSE(bigAddress)))))) AS Postcode

FROM  TABLEXX    

^^This works, almost for the postcode.

Only problem is that, if the city is not one part like "HELSINKI" then the city comes along the postcode. Like 00300 Ylistaro (When city is Ylistaro AS)

with cte as (
SELECT
ID,
bigAddress,

  
SUBSTRING(bigAddress,LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress) 2)),LEN(bigAddress) - LEN(LEFT(bigAddress,CHARINDEX(',', bigAddress))) - LEN(RIGHT(bigAddress,CHARINDEX(' ', (REVERSE(bigAddress)))))) AS Postcode,
RIGHT(bigAddress,CHARINDEX(',',  (REVERSE(bigAddress))) - 1) AS City
FROM  TableXXX



select 


bigAddress,

LEFT(Postcode,5) As PostcodeV2,
STUFF(City, 1, 7, '') AS CityV2
FROM cte

^^ Also this was quite great, it did failed when tried to put this into PowerBi DirectQuery. PowerBI wont support it at DQ mode, and import mode did have some other problems.

CodePudding user response:

What you are trying to do is very risky since it's a well known problem that there is no really proper and safe way to separate street, postal and city from such an entire string. So please note that the following is just an idea to help you, but in future, you should directly save the information in different columns. Anyway, the following solution will work only with some assumptions. As example, there always must be a comma between the street and the rest. The postal must not contain any not numeric characters and the city must not contain any numeric characters. The idea is to first add four columns to your table:

ALTER TABLE yourtable ADD street varchar(200);
ALTER TABLE yourtable ADD postal varchar(200);
ALTER TABLE yourtable ADD city varchar(200);
ALTER TABLE yourtable ADD prov varchar(600);

The first three columns should be the columns you will in future use to save the information. The prov column will just be used during the data "transformation" and then be removed again. As first step, you will update the street column with everything before the comma and the prov column with the rest:

UPDATE yourtable SET street = SUBSTRING(bigAddress, 0, charindex(',', bigAddress, 0)),
prov = REPLACE(SUBSTRING(bigAddress,CHARINDEX(',',bigAddress)   1, LEN(bigAddress)),' ','');

Then you will fill the city column with the entire string which is currently saved in the prov column beginning with the first non numeric character. In other words, you will remove the postal from the city:

UPDATE yourtable SET 
city = RIGHT(prov,LEN(prov) - (PATINDEX('%[^0-9]%',prov) -1));

After this, you will remove the city from the prov column to get the postal and save it in the postal column:

UPDATE yourtable SET postal = REPLACE(prov, city,'');

The three columns are now filled correctly (as I said, as long as the required conditions are met), so you can remove the prov column again:

ALTER TABLE yourtable DROP COLUMN prov;

I created an example which shows this is working correctly: db<>fiddle

In future, please don't do such things, but use separate columns.

CodePudding user response:

Considering the postal codes as fixed length of 5 digits, you can make use of CHARINDEX, SUBSTRING, LEFT and RIGHT with some constants to get the data:

CREATE TABLE addresses (
  address VARCHAR(50) NOT NULL
);
 
INSERT INTO addresses (address)
VALUES
  ('Testalley 3, 00200 Helsinki'),
  ('Testalley 3 A 21, 00200 Helsinki'),
  ('TestAlley 3 B, 00300 Helsinki'),
  ('TestAlley 3, 00500 Helsinki AS'), 
  ('testAlley 3 F 22, 00500 Helsinki AS');

SELECT
  LEFT(address, CHARINDEX(',', address) - 1) AS street,
  SUBSTRING(address, CHARINDEX(',', address)   2, 5) AS postcode,
  RIGHT(address, LEN(address) - CHARINDEX(',', address) - 7) AS city
FROM addresses;

Results in:

street postcode city
Testalley 3 00200 Helsinki
Testalley 3 A 21 00200 Helsinki
TestAlley 3 B 00300 Helsinki
TestAlley 3 00500 Helsinki AS
testAlley 3 F 22 00500 Helsinki AS

You can play with the running demo at https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=a9e1469c753158d8e3cd0a4ab08f97ec

  • Related