Using Snowflake
I'm trying to extract the street type from a column in a table based off a street type table where the only column is STREET_TYPES
containing the type and the abbreviated type
STREET_TYPES
------------
ABBAYE
ABE
ACH
AGGLOMERATION
AGL
AIR
AIRE
AIRES
ALL
ALLEE
ALLEES
ANCIEN CHEMIN
...
BD
...
BOULEVARD
...
My second table looks as follows:
STREET_LINE_1
-------------
AVENUE ANDRE MORIZET
AVENUE DUQUESNE
AV TREBOIS
RUE HENRI BARBUSSE
AVENUE MARX DORMOY
RUE ANDRE BONNENFANT
AVENUE DU GENERAL RAOUL SALAN
RESIDENCE DU PORT
GRAND BOULEVARD DES RESIDENCES DU PORT
BOULEVARD PIERRE ET MARIE CURIE
CHEMIN DES REGENTS
CHEMIN COMMUNAL CASTEX
...
Neither tables have any columns in common. STREET_LINE_1
will always begin with a street type. I would need to extract up to the two (2) first words. As an example we might have I might have a type of GRAND BOULEVARD
or GRANDE RUE
or GRANDE AVENUE
. I will also need to pay attention to types such as CHEMIN
and CHEMIN COMMUNAL
(there are other instances)
Ultimately, I would like to have the data represented as follows:
STREET_TYPE | STREET_LINE_1
----------------|---------------
AVENUE | VENUE ANDRE MORIZET
AVENUE | DUQUESNE
AV | TREBOIS
RUE | HENRI BARBUSSE
AVENUE MARX | DORMOY
RUE | ANDRE BONNENFANT
AVENUE | DU GENERAL RAOUL SALAN
RESIDENCE | DU PORT
GRAND BOULEVARD | DES RESIDENCES DU PORT
BOULEVARD | PIERRE ET MARIE CURIE
CHEMIN | DES REGENTS
CHEMIN COMMUNAL | CASTEX
...
CodePudding user response:
So assume it is a straight start of string match STARTSWITH can be used to match the two tables, then the output can be trimmed with SUBSTRING and LENGTH and some handling of NULL for no matches. And then to rank the matches we can use QUALIFY and ROW_NUMBER with a crude longest street type wins
With street_types as (
SELECT * FROM VALUES
('ABBAYE'),
('ABE'),
('ACH'),
('AGGLOMERATION'),
('AGL'),
('AVENUE'),
('AIRE'),
('AIRES'),
('AV'),
('ALLEE'),
('ALLEES'),
('ANCIEN CHEMIN'),
('BD'),
('BOULEVARD'),
('CHEMIN'),
('CHEMIN COMMUNAL'),
('RESIDENCE'),
('GRAND BOULEVARD'),
('RUE')
), street_line_1 as (
SELECT * FROM VALUES
('AVENUE ANDRE MORIZET'),
('AVENUE DUQUESNE'),
('AV TREBOIS'),
('RUE HENRI BARBUSSE'),
('AVENUE MARX DORMOY'),
('RUE ANDRE BONNENFANT'),
('AVENUE DU GENERAL RAOUL SALAN'),
('RESIDENCE DU PORT'),
('GRAND BOULEVARD DES RESIDENCES DU PORT'),
('BOULEVARD PIERRE ET MARIE CURIE'),
('CHEMIN DES REGENTS'),
('CHEMIN COMMUNAL CASTEX')
)
SELECT
st.column1 as street_type,
substring(sl.column1, zeroifnull(length(st.column1)) 1) as street_line_1
FROM street_line_1 as sl
LEFT JOIN street_types as st
ON startswith(sl.column1,st.column1)
QUALIFY row_number()over(partition by sl.column1 order by length(st.column1) desc ) = 1
gives:
STREET_TYPE | STREET_LINE_1 |
---|---|
AV | TREBOIS |
AVENUE | ANDRE MORIZET |
AVENUE | DU GENERAL RAOUL SALAN |
AVENUE | DUQUESNE |
AVENUE | MARX DORMOY |
BOULEVARD | PIERRE ET MARIE CURIE |
CHEMIN COMMUNAL | CASTEX |
CHEMIN | DES REGENTS |
GRAND BOULEVARD | DES RESIDENCES DU PORT |
RESIDENCE | DU PORT |
RUE | ANDRE BONNENFANT |
RUE | HENRI BARBUSSE |