We need to convert the below query to snowflake but, getting the below error while executing:
TD Query:
SEL * FROM EMP_HIST N
QUALIFY MAX(CAST((N.pdate (DATE, FORMAT 'YYYY-MM-DD')) || ' ' || CAST(N.ptime AS CHAR(10)) AS TIMESTAMP(0)))
OVER (PARTITION BY N.emp_id
ORDER BY N.pdate, N.ptime) = CAST((N.pdate (DATE, FORMAT 'YYYY-MM-DD')) || ' ' || CAST(N.ptime AS CHAR(10)) AS TIMESTAMP(0))
OR MAX(CAST((N.pdate (DATE, FORMAT 'YYYY-MM-DD')) || ' ' || CAST(N.ptime AS CHAR(10)) AS TIMESTAMP(0)))
OVER (PARTITION BY N.leg_id
ORDER BY N.pdate, N.ptime) IS NULL;
Converted Snowflake Query:
SELECT * FROM EMP_HIST AS N
WHERE
MAX(CAST(CONCAT((to_date( N.pdate )), ' ' , CAST(N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9)))
OVER (PARTITION BY N.emp_id
ORDER BY N.pdate , N.ptime ) = CAST (CONCAT((to_date( N.pdate)) , ' ' , CAST (N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9))
OR MAX(CAST (CONCAT((to_date( N.pdate)), ' ' , CAST (N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9)) )
OVER ( PARTITION BY N.leg_id
ORDER BY N.pdate, N.ptime ) IS NULL;
Error:
SQL compilation error: Window function [MAX(CAST(CONCAT(CAST(N.PDATE AS VARCHAR), ' ', CAST(N.PTIME AS VARCHAR(10))) AS TIMESTAMP_NTZ(9))) OVER (PARTITION BY N.EMP_ID ORDER BY N.PDATE ASC NULLS LAST, N.PTIME ASC NULLS LAST)] appears outside of SELECT, QUALIFY, and ORDER BY clauses.
CodePudding user response:
You should be able to use qualify
in snowflake too. Try changing it to:
SELECT * FROM EMP_HIST AS N
QUALIFY
MAX(CAST(CONCAT((to_date( N.pdate )), ' ' , CAST(N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9))) OVER (PARTITION BY N.emp_id ORDER BY N.pdate , N.ptime ) = CAST (CONCAT((to_date( N.pdate)) , ' ' , CAST (N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9))
OR
MAX(CAST (CONCAT((to_date( N.pdate)), ' ' , CAST (N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9)) ) OVER ( PARTITION BY N.leg_id ORDER BY N.pdate, N.ptime ) IS NULL;
Qualify documentation from Snowflake is here
CodePudding user response:
This question is interesting because that QUALIFY clause is really ugly, and I am sure it can be written cleaner, but to do so, what it's doing needs tidying up.
The first part:
CAST((N.pdate (DATE, FORMAT 'YYYY-MM-DD')) || ' ' || CAST(N.ptime AS CHAR(10)) AS TIMESTAMP(0))
is turning a date to a string and concatinating it with space and time also turned to string.
Where-as your Snowflake SQL is:
CAST(CONCAT((to_date( N.pdate )), ' ' , CAST(N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9))
Which is getting a date, parsing it as a date, implicitly casting to a string, concatenating it with space and a time that is explicitly cast to a string and them cast the string to a timestamp.
So here is a series of way the concatination can be done much better, but the real kicker is, there is already a function to make a timestamp out of date/time parts:
SELECT *
,CAST(CONCAT((to_date( N.pdate )), ' ' , CAST(N.ptime AS CHAR(10))) AS TIMESTAMP_NTZ(9)) as timestamp_a
,TO_TIMESTAMP( CONCAT(to_char(N.pdate, 'YYYY-MM-DD' ), ' ' , to_char(N.ptime))) as timestamp_b
,TO_TIMESTAMP( to_char(N.pdate, 'YYYY-MM-DD') || ' ' || to_char(N.ptime)) as timestamp_c
,TO_TIMESTAMP( N.pdate || ' ' || N.ptime) as timestamp_d
,TIMESTAMP_FROM_PARTS( N.pdate, N.ptime ) as timestamp_e
FROM VALUES
('2022-02-01'::date, '13:45:56'::time),
('2022-01-02'::date, '05:21:00'::time)
N(pdate, ptime)
gives:
PDATE | PTIME | TIMESTAMP_A | TIMESTAMP_B | TIMESTAMP_C | TIMESTAMP_D | TIMESTAMP_E |
---|---|---|---|---|---|---|
2022-02-01 | 13:45:56 | 2022-02-01 13:45:56.000 | 2022-02-01 13:45:56.000 | 2022-02-01 13:45:56.000 | 2022-02-01 13:45:56.000 | 2022-02-01 13:45:56.000 |
2022-01-02 | 05:21:00 | 2022-01-02 05:21:00.000 | 2022-01-02 05:21:00.000 | 2022-01-02 05:21:00.000 | 2022-01-02 05:21:00.000 | 2022-01-02 05:21:00.000 |
So substituting that in, we then can see the right hind side is just "this rows" timestamp, thus it could be:
SELECT * FROM EMP_HIST AS N
QUALIFY
MAX(TIMESTAMP_FROM_PARTS( N.pdate, N.ptime )) OVER (PARTITION BY N.emp_id ORDER BY N.pdate, N.ptime ) = TIMESTAMP_FROM_PARTS( N.pdate, N.ptime ))
OR
MAX(TIMESTAMP_FROM_PARTS( N.pdate, N.ptime )) OVER ( PARTITION BY N.leg_id ORDER BY N.pdate, N.ptime ) IS NULL;
Which is really saying when the lastrow for this emp_id
is this row
OR the last row for leg_id is null
The first of those can be turned into a ROW_NUMBER with descending sort on the data/time
ROW_NUMBER() OVER (PARTITION BY N.emp_id ORDER BY N.pdate DESC, N.ptime DESC) = 1
and a MAX(value) OVER (stuff) is only null when all values are null, as MAX skips nulls if it can. This part off the top of my head I cannot think of an simplification to. COUNT(x) OVER(y) = 0
would be much smaller.
but it could be done still
COUNT(TIMESTAMP_FROM_PARTS( N.pdate, N.ptime )) OVER ( PARTITION BY N.leg_id) = 0
Thus your SQL could be a much more readable:
SELECT N.*
FROM EMP_HIST AS N
QUALIFY
ROW_NUMBER() OVER (PARTITION BY N.emp_id ORDER BY N.pdate DESC, N.ptime DESC) = 1
OR
COUNT(TIMESTAMP_FROM_PARTS( N.pdate, N.ptime )) OVER ( PARTITION BY N.leg_id) = 0