Home > Software engineering >  Need SQL Snowflake Query - Left Join and Filtering based on secondary ID
Need SQL Snowflake Query - Left Join and Filtering based on secondary ID

Time:04-02

URL TABLE:

| URL PATH |
| www.abc.com/123  |
| www.abc.com/234  |
| www.abc.com/435  |

TRAFFIC TABLE:

| ID | URL PATH | COUNTRY | TIMESTAMP |
| 1 | www.abc.com/123 | US | 3/31/2022 10:00 |
| 1 | www.abc.com/gb/123 | GB | 3/31/2022 10:00 |
| 1 | www.abc.com/de/123 | DE | 3/31/2022 10:00 |
| 4 | www.abc.om/xyz | US | 3/31/2022 10:00 |
| 4 | www.abc.om/asd | DE | 3/31/2022 10:00 |

OBJECTIVE:

Hello, I have two tables. 1) URL Table contains a list of URLs specific to US Country and 2) Traffic Table containing all event timestamps for all countries on the website.

I want to first get the product ID that matches the URL PATH in the URL TABLE then get the aggregate count of that product ID. With the above example, the desired outcome would be:

| ID | COUNT |
| 1  | 3 |

ID 4 wouldn't appear because none of it's corresponding URL appears in the URL table.

www.abc.com/123 matches ID 1 in the traffic table and there are three rows associated with this ID hence a value of 3.

I've tried doing some LEFT JOINs but haven't had much luck.New to SQL. Would appreciate any help! Also using the Snowflake UI to create this query.

CodePudding user response:

SELECT ID, COUNT(ID) as COUNT FROM TRAFFICTABLE WHERE URLPATH in (SELECT URLPATH FROM URLTABLE) GROUP BY ID

can you try this?

CodePudding user response:

I assume that the URLs are matched based on the first and last part of the URLs?

Setup sample tables

-- Create sample URL_TABLE table
create or replace table url_table as
select COLUMN1::varchar as url_path
from
values ('www.abc.com/123'),
       ('www.abc.com/234'),
       ('www.abc.com/435')
;

-- Create sample TRAFFIC_TABLE table
create or replace table traffic_table as
select COLUMN1::number                                    as id,
       COLUMN2::varchar                                   as path,
       COLUMN3::varchar                                   as country,
       to_timestamp(COLUMN4::varchar, 'MM/DD/YYYY HH:MI') as timestamp

from
values ('1', 'www.abc.com/123', 'US', '3/31/2022 10:00'),
       ('1', 'www.abc.com/gb/123', 'GB', '3/31/2022 10:00'),
       ('1', 'www.abc.com/de/123', 'DE', '3/31/2022 10:00'),
       ('4', 'www.abc.om/xyz', 'US', '3/31/2022 10:00'),
       ('4', 'www.abc.om/asd', 'DE', '3/31/2022 10:00')
;

SQL Query

-- Run select:
with _traffic_table as (
    select id,
           split(path, '/') as                       arr,
           arr[0] || '/' || arr[array_size(arr) - 1] new_path
    from traffic_table
)

select _traffic_table.id,
       count(1)
from url_table
         join _traffic_table on url_table.url_path = _traffic_table.new_path
group by _traffic_table.id

Result:

 -- -------- 
|ID|COUNT(1)|
 -- -------- 
|1 |3       |
 -- -------- 
  • Related