Home > Net >  Extract value of a column based off column from another table no matching columns
Extract value of a column based off column from another table no matching columns

Time:04-09

Using Snowflake I'm trying to extract the street type from a column in a table based off a street type table where the only column is STREET_TYPES containing the type and the abbreviated type

STREET_TYPES
------------
ABBAYE
ABE
ACH
AGGLOMERATION
AGL
AIR
AIRE
AIRES
ALL
ALLEE
ALLEES
ANCIEN CHEMIN
...
BD
...
BOULEVARD
...

My second table looks as follows:

STREET_LINE_1
-------------
AVENUE ANDRE MORIZET
AVENUE DUQUESNE
AV TREBOIS
RUE HENRI BARBUSSE
AVENUE MARX DORMOY
RUE ANDRE BONNENFANT
AVENUE DU GENERAL RAOUL SALAN
RESIDENCE DU PORT
GRAND BOULEVARD DES RESIDENCES DU PORT
BOULEVARD PIERRE ET MARIE CURIE
CHEMIN DES REGENTS
CHEMIN COMMUNAL CASTEX 

...

Neither tables have any columns in common. STREET_LINE_1 will always begin with a street type. I would need to extract up to the two (2) first words. As an example we might have I might have a type of GRAND BOULEVARD or GRANDE RUE or GRANDE AVENUE. I will also need to pay attention to types such as CHEMIN and CHEMIN COMMUNAL (there are other instances)

Ultimately, I would like to have the data represented as follows:

STREET_TYPE     | STREET_LINE_1
----------------|---------------
AVENUE          | VENUE ANDRE MORIZET
AVENUE          | DUQUESNE
AV              | TREBOIS
RUE             | HENRI BARBUSSE
AVENUE MARX     | DORMOY
RUE             | ANDRE BONNENFANT
AVENUE          | DU GENERAL RAOUL SALAN
RESIDENCE       | DU PORT
GRAND BOULEVARD | DES RESIDENCES DU PORT
BOULEVARD       | PIERRE ET MARIE CURIE
CHEMIN          | DES REGENTS
CHEMIN COMMUNAL | CASTEX 
...

CodePudding user response:

So assume it is a straight start of string match STARTSWITH can be used to match the two tables, then the output can be trimmed with SUBSTRING and LENGTH and some handling of NULL for no matches. And then to rank the matches we can use QUALIFY and ROW_NUMBER with a crude longest street type wins

With street_types as (
    SELECT * FROM VALUES
        ('ABBAYE'),
        ('ABE'),
        ('ACH'),
        ('AGGLOMERATION'),
        ('AGL'),
    ('AVENUE'),
        ('AIRE'),
        ('AIRES'),
    ('AV'),
        ('ALLEE'),
        ('ALLEES'),
        ('ANCIEN CHEMIN'),
        ('BD'),
        ('BOULEVARD'),
    ('CHEMIN'),
    ('CHEMIN COMMUNAL'),
    ('RESIDENCE'),
    ('GRAND BOULEVARD'),
    ('RUE')
    
), street_line_1 as (
     SELECT * FROM VALUES    
        ('AVENUE ANDRE MORIZET'),
        ('AVENUE DUQUESNE'),
        ('AV TREBOIS'),
        ('RUE HENRI BARBUSSE'),
        ('AVENUE MARX DORMOY'),
        ('RUE ANDRE BONNENFANT'),
        ('AVENUE DU GENERAL RAOUL SALAN'),
        ('RESIDENCE DU PORT'),
        ('GRAND BOULEVARD DES RESIDENCES DU PORT'),
        ('BOULEVARD PIERRE ET MARIE CURIE'),
        ('CHEMIN DES REGENTS'),
        ('CHEMIN COMMUNAL CASTEX')
)
SELECT 
    st.column1 as street_type, 
    substring(sl.column1, zeroifnull(length(st.column1)) 1)  as street_line_1
FROM street_line_1 as sl
LEFT JOIN street_types as st
    ON startswith(sl.column1,st.column1)
QUALIFY row_number()over(partition by sl.column1 order by length(st.column1) desc ) = 1

gives:

STREET_TYPE STREET_LINE_1
AV TREBOIS
AVENUE ANDRE MORIZET
AVENUE DU GENERAL RAOUL SALAN
AVENUE DUQUESNE
AVENUE MARX DORMOY
BOULEVARD PIERRE ET MARIE CURIE
CHEMIN COMMUNAL CASTEX
CHEMIN DES REGENTS
GRAND BOULEVARD DES RESIDENCES DU PORT
RESIDENCE DU PORT
RUE ANDRE BONNENFANT
RUE HENRI BARBUSSE
  • Related