Home > Blockchain >  Using a where statement with rank and a subquery in SQL
Using a where statement with rank and a subquery in SQL

Time:12-16

so I have a table that's sort of like this:

DELIVERY_AREA_ID DELIVERY_RADIUS_METERS EVENT_STARTED_TIMESTAMP
234sfd 4000 2020-01-01 12:19:29.719
234sfd 6500 2020-01-01 12:31:40.325
234sfd 3500 2020-01-01 12:53:10.538
234sfd 6500 2020-01-01 13:11:36.094
234sfd 3500 2020-01-01 13:32:26.754
234sfd 6500 2020-01-01 13:59:11.104
234sfd 6500 2020-01-02 07:44:16.792
234sfd 3500 2020-01-02 08:07:36.284
234sfd 6500 2020-01-02 08:54:08.014
234sfd 3500 2020-01-02 09:53:05.853
234sfd 6500 2020-01-02 10:04:39.443
234sfd 10000 2020-07-01 08:29:20.194
234sfd 3500 2020-07-03 07:50:41.782
234sfd 10000 2020-07-03 08:33:14.695
234sfd 3500 2020-07-05 07:47:05.539
234sfd 10000 2020-07-05 07:53:13.930
234sfd 3500 2020-07-05 09:18:57.688
234sfd 10000 2020-07-05 09:51:07.547
234sfd 3500 2020-07-19 18:02:14.099

the data is actually much more varied but yeah it follows that format.

I am trying to, in one query, in snowflake database, make a get the top ranked radius by duration. I currently have this:

SELECT DELIVERY_AREA_ID,
       MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration,
       MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
       RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                    ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
    -- Add the MONTH_YEAR column to the delivery_radius_log table
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           EVENT_STARTED_TIMESTAMP,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t  -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS

I want to get the first rank for each month_year but when I use

where RADIUS_RANK = 1

I get an error: Syntax error: unexpected 'where'. (line 21)

Im not sure how to resolve this

I have tried this link which appears to have the same question but the solution is already what I am trying.

CodePudding user response:

It is not possible to solve this scenario without querying the output of your query, in other words, using the output of that query as an input for another top-level query.

  • You can not use a field produced at the projection level in the WHERE clause
  • You can not use analytic functions in the WHERE clause
  • You can not use analytic functions in a HAVING clause

So the only solution is to query the output of that query and retrieve only the MIN rank.

CodePudding user response:

To filter windowed function at the same query level you need to use QUALIFY clause:

SELECT DELIVERY_AREA_ID,
       MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration,
       MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
       RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                    ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
    -- Add the MONTH_YEAR column to the delivery_radius_log table
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           EVENT_STARTED_TIMESTAMP,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t  -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RADIUS_RANK = 1;

If the rank column is not required then the entire expression could be moved:

SELECT DELIVERY_AREA_ID,
       MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration,
       MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP
FROM (
    -- Add the MONTH_YEAR column to the delivery_radius_log table
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           EVENT_STARTED_TIMESTAMP,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t  -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                    ORDER BY SUM(DURATION_SECONDS) DESC) = 1;
  • Related