Home > Mobile >  Is it possible to implement conditions in the offset of the lag() function on Snowflake?
Is it possible to implement conditions in the offset of the lag() function on Snowflake?

Time:11-27

I have a table on Snowflake that contains messages between company admins and users. Here's what the table looks like.

id message destination messageable_id sent_at
1 Hello Customer! outgoing 1700103 2022-03-22 22:42:11.000
2 Hello Company! incoming 1700103 2022-03-22 22:39:56.000

I have been trying to get the response time by using lag(sent_at,1) over (partition by messageable_id order by sent_at) to get the sent_at value from the previous row, and calculating the datediff there as the response time.

However, I realized that there are records where I have 3 consecutive outgoing rows, and it would make more sense for me to get the earliest sent_at value in that group rather than the latest one.

I'm wondering if it would be possible to implement a condition on the offset in the lag() syntax. Something along the lines of IF 3 consecutive outgoing values in column, then offset = 3 else 1.

So far, I've looked into using window functions but no luck there.

CodePudding user response:

here is one way :

select *,lag(sent_at,1) over (partition by messageable_id order by sent_at) 
from (
   select * , row_number() over (partition by messageable_id , destination order by sent_at asc) rn
   from tablename
) t where rn = 1 

CodePudding user response:

So there are a couple of points, first, you messages by ID the id's are traveling back in time, and you have no three message example data, and not expected best case outcomes results:

so lets make some data that makes more sense:

with data(id, message, destination, messageable_id, sent_at) as (
    select * from values
        (1, 'msg with 2', 'incoming', 1700103, '2022-03-22 22:39:56.000'::timestamp_ntz),
        (2, 'msg with 2', 'outgoing', 1700103, '2022-03-22 22:42:11.000'::timestamp_ntz),
    
        (3, 'msg with 3', 'incoming', 1700104, '2022-03-22 22:39:56.000'::timestamp_ntz),
        (4, 'msg with 3', 'outgoing', 1700104, '2022-03-22 22:42:11.000'::timestamp_ntz),
        (5, 'msg with 3', 'outgoing', 1700104, '2022-03-22 22:43:11.000'::timestamp_ntz)
)

now using your "what I tried SQL"

select 
    *
    ,lag(sent_at,1) over (partition by messageable_id order by sent_at) as val_sent_at
from data
order by messageable_id, sent_at;
ID MESSAGE DESTINATION MESSAGEABLE_ID SENT_AT VAL_SENT_AT
1 msg with 2 incoming 1700103 2022-03-22 22:39:56.000 null
2 msg with 2 outgoing 1700103 2022-03-22 22:42:11.000 2022-03-22 22:39:56.000
3 msg with 3 incoming 1700104 2022-03-22 22:39:56.000 null
4 msg with 3 outgoing 1700104 2022-03-22 22:42:11.000 2022-03-22 22:39:56.000
5 msg with 3 outgoing 1700104 2022-03-22 22:43:11.000 2022-03-22 22:42:11.000

and you are saying in 1700104 messages you want the first sent_at of the sequence, this can be achieved with FIRST_VALUE, but the first row will also have a value:

select 
    *
    ,first_value(sent_at) over (partition by messageable_id order by sent_at) as first_sent_at
from data
order by messageable_id, sent_at;
ID MESSAGE DESTINATION MESSAGEABLE_ID SENT_AT FIRST_SENT_AT
1 msg with 2 incoming 1700103 2022-03-22 22:39:56.000 2022-03-22 22:39:56.000
2 msg with 2 outgoing 1700103 2022-03-22 22:42:11.000 2022-03-22 22:39:56.000
3 msg with 3 incoming 1700104 2022-03-22 22:39:56.000 2022-03-22 22:39:56.000
4 msg with 3 outgoing 1700104 2022-03-22 22:42:11.000 2022-03-22 22:39:56.000
5 msg with 3 outgoing 1700104 2022-03-22 22:43:11.000 2022-03-22 22:39:56.000

So if that first row not being a null is a problem we can solved with NULLIF

select 
    *
    ,first_value(sent_at) over (partition by messageable_id order by sent_at) as first_sent_at
    ,nullif(first_sent_at, sent_at) as nullif_result
from data
order by messageable_id, sent_at;
ID MESSAGE DESTINATION MESSAGEABLE_ID SENT_AT FIRST_SENT_AT NULLIF_RESULT
1 msg with 2 incoming 1700103 2022-03-22 22:39:56.000 2022-03-22 22:39:56.000 null
2 msg with 2 outgoing 1700103 2022-03-22 22:42:11.000 2022-03-22 22:39:56.000 2022-03-22 22:39:56.000
3 msg with 3 incoming 1700104 2022-03-22 22:39:56.000 2022-03-22 22:39:56.000 null
4 msg with 3 outgoing 1700104 2022-03-22 22:42:11.000 2022-03-22 22:39:56.000 2022-03-22 22:39:56.000
5 msg with 3 outgoing 1700104 2022-03-22 22:43:11.000 2022-03-22 22:39:56.000 2022-03-22 22:39:56.000

but if you have multiple messages, it would null all of them. Another option is to use ROW_NUMBER, and to make values stable would be to use id to help order values.

  • Related