Home > Blockchain >  snowflake partition by clause issue on a sql statement
snowflake partition by clause issue on a sql statement

Time:03-17

We need to convert the below query to snowflake but, getting the below error while executing:

TD Query:

SEL * FROM EMP_HIST N
QUALIFY MAX(CAST((N.pdate (DATE, FORMAT 'YYYY-MM-DD')) || ' ' || CAST(N.ptime AS CHAR(10)) AS TIMESTAMP(0)))
OVER (PARTITION BY N.emp_id
ORDER BY N.pdate, N.ptime) = CAST((N.pdate (DATE, FORMAT 'YYYY-MM-DD')) || ' ' || CAST(N.ptime AS CHAR(10)) AS TIMESTAMP(0))
OR MAX(CAST((N.pdate (DATE, FORMAT 'YYYY-MM-DD')) || ' ' || CAST(N.ptime AS CHAR(10)) AS TIMESTAMP(0))) 
OVER (PARTITION BY N.leg_id
ORDER BY N.pdate, N.ptime) IS NULL;

Converted Snowflake Query:

SELECT * FROM EMP_HIST AS N
WHERE
MAX(CAST(CONCAT((to_date( N.pdate )), ' ' , CAST(N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9))) 
OVER (PARTITION BY N.emp_id
ORDER BY N.pdate , N.ptime ) = CAST (CONCAT((to_date( N.pdate)) , ' ' , CAST (N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9))
OR MAX(CAST (CONCAT((to_date( N.pdate)), ' ' , CAST (N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9)) ) 
OVER ( PARTITION BY N.leg_id
ORDER BY N.pdate, N.ptime ) IS NULL;

Error:

SQL compilation error: Window function [MAX(CAST(CONCAT(CAST(N.PDATE AS VARCHAR), ' ', CAST(N.PTIME AS VARCHAR(10))) AS TIMESTAMP_NTZ(9))) OVER (PARTITION BY N.EMP_ID ORDER BY N.PDATE ASC NULLS LAST, N.PTIME ASC NULLS LAST)] appears outside of SELECT, QUALIFY, and ORDER BY clauses.

CodePudding user response:

You should be able to use qualify in snowflake too. Try changing it to:

SELECT * FROM EMP_HIST AS N
QUALIFY
MAX(CAST(CONCAT((to_date( N.pdate )), ' ' , CAST(N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9))) OVER (PARTITION BY N.emp_id ORDER BY N.pdate , N.ptime ) = CAST (CONCAT((to_date( N.pdate)) , ' ' , CAST (N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9))
OR 
MAX(CAST (CONCAT((to_date( N.pdate)), ' ' , CAST (N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9)) ) OVER ( PARTITION BY N.leg_id ORDER BY N.pdate, N.ptime ) IS NULL;

Qualify documentation from Snowflake is here

CodePudding user response:

This question is interesting because that QUALIFY clause is really ugly, and I am sure it can be written cleaner, but to do so, what it's doing needs tidying up.

The first part:

CAST((N.pdate (DATE, FORMAT 'YYYY-MM-DD')) || ' ' || CAST(N.ptime AS CHAR(10)) AS TIMESTAMP(0))

is turning a date to a string and concatinating it with space and time also turned to string.

Where-as your Snowflake SQL is:

CAST(CONCAT((to_date( N.pdate )), ' ' , CAST(N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9))

Which is getting a date, parsing it as a date, implicitly casting to a string, concatenating it with space and a time that is explicitly cast to a string and them cast the string to a timestamp.

So here is a series of way the concatination can be done much better, but the real kicker is, there is already a function to make a timestamp out of date/time parts:

SELECT *
    ,CAST(CONCAT((to_date( N.pdate )), ' ' , CAST(N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9)) as timestamp_a
    ,TO_TIMESTAMP( CONCAT(to_char(N.pdate, 'YYYY-MM-DD' ), ' ' , to_char(N.ptime))) as timestamp_b
    ,TO_TIMESTAMP( to_char(N.pdate, 'YYYY-MM-DD') || ' ' || to_char(N.ptime)) as timestamp_c
    ,TO_TIMESTAMP( N.pdate || ' ' || N.ptime) as timestamp_d
    ,TIMESTAMP_FROM_PARTS( N.pdate, N.ptime ) as timestamp_e
FROM VALUES
    ('2022-02-01'::date, '13:45:56'::time),
    ('2022-01-02'::date, '05:21:00'::time)
    N(pdate, ptime)

gives:

PDATE PTIME TIMESTAMP_A TIMESTAMP_B TIMESTAMP_C TIMESTAMP_D TIMESTAMP_E
2022-02-01 13:45:56 2022-02-01 13:45:56.000 2022-02-01 13:45:56.000 2022-02-01 13:45:56.000 2022-02-01 13:45:56.000 2022-02-01 13:45:56.000
2022-01-02 05:21:00 2022-01-02 05:21:00.000 2022-01-02 05:21:00.000 2022-01-02 05:21:00.000 2022-01-02 05:21:00.000 2022-01-02 05:21:00.000

So substituting that in, we then can see the right hind side is just "this rows" timestamp, thus it could be:

SELECT * FROM EMP_HIST AS N
QUALIFY
    MAX(TIMESTAMP_FROM_PARTS( N.pdate, N.ptime )) OVER (PARTITION BY N.emp_id ORDER BY N.pdate, N.ptime ) = TIMESTAMP_FROM_PARTS( N.pdate, N.ptime ))
OR 
    MAX(TIMESTAMP_FROM_PARTS( N.pdate, N.ptime )) OVER ( PARTITION BY N.leg_id ORDER BY N.pdate, N.ptime ) IS NULL;

Which is really saying when the lastrow for this emp_id is this row OR the last row for leg_id is null

The first of those can be turned into a ROW_NUMBER with descending sort on the data/time

ROW_NUMBER() OVER (PARTITION BY N.emp_id ORDER BY N.pdate DESC, N.ptime DESC) = 1

and a MAX(value) OVER (stuff) is only null when all values are null, as MAX skips nulls if it can. This part off the top of my head I cannot think of an simplification to. COUNT(x) OVER(y) = 0 would be much smaller.

but it could be done still

COUNT(TIMESTAMP_FROM_PARTS( N.pdate, N.ptime )) OVER ( PARTITION BY N.leg_id) = 0

Thus your SQL could be a much more readable:

SELECT N.* 
FROM EMP_HIST AS N
QUALIFY
    ROW_NUMBER() OVER (PARTITION BY N.emp_id ORDER BY N.pdate DESC, N.ptime DESC) = 1
OR 
    COUNT(TIMESTAMP_FROM_PARTS( N.pdate, N.ptime )) OVER ( PARTITION BY N.leg_id) = 0
  • Related