I have a table on Snowflake that contains messages between company admins and users. Here's what the table looks like.
id | message | destination | messageable_id | sent_at |
---|---|---|---|---|
1 | Hello Customer! | outgoing | 1700103 | 2022-03-22 22:42:11.000 |
2 | Hello Company! | incoming | 1700103 | 2022-03-22 22:39:56.000 |
I have been trying to get the response time by using lag(sent_at,1) over (partition by messageable_id order by sent_at)
to get the sent_at value from the previous row, and calculating the datediff there as the response time.
However, I realized that there are records where I have 3 consecutive outgoing rows, and it would make more sense for me to get the earliest sent_at value in that group rather than the latest one.
I'm wondering if it would be possible to implement a condition on the offset in the lag() syntax. Something along the lines of IF 3 consecutive outgoing values in column, then offset = 3 else 1.
So far, I've looked into using window functions but no luck there.
CodePudding user response:
here is one way :
select *,lag(sent_at,1) over (partition by messageable_id order by sent_at)
from (
select * , row_number() over (partition by messageable_id , destination order by sent_at asc) rn
from tablename
) t where rn = 1
CodePudding user response:
So there are a couple of points, first, you messages by ID the id's are traveling back in time, and you have no three message example data, and not expected best case outcomes results:
so lets make some data that makes more sense:
with data(id, message, destination, messageable_id, sent_at) as (
select * from values
(1, 'msg with 2', 'incoming', 1700103, '2022-03-22 22:39:56.000'::timestamp_ntz),
(2, 'msg with 2', 'outgoing', 1700103, '2022-03-22 22:42:11.000'::timestamp_ntz),
(3, 'msg with 3', 'incoming', 1700104, '2022-03-22 22:39:56.000'::timestamp_ntz),
(4, 'msg with 3', 'outgoing', 1700104, '2022-03-22 22:42:11.000'::timestamp_ntz),
(5, 'msg with 3', 'outgoing', 1700104, '2022-03-22 22:43:11.000'::timestamp_ntz)
)
now using your "what I tried SQL"
select
*
,lag(sent_at,1) over (partition by messageable_id order by sent_at) as val_sent_at
from data
order by messageable_id, sent_at;
ID | MESSAGE | DESTINATION | MESSAGEABLE_ID | SENT_AT | VAL_SENT_AT |
---|---|---|---|---|---|
1 | msg with 2 | incoming | 1700103 | 2022-03-22 22:39:56.000 | null |
2 | msg with 2 | outgoing | 1700103 | 2022-03-22 22:42:11.000 | 2022-03-22 22:39:56.000 |
3 | msg with 3 | incoming | 1700104 | 2022-03-22 22:39:56.000 | null |
4 | msg with 3 | outgoing | 1700104 | 2022-03-22 22:42:11.000 | 2022-03-22 22:39:56.000 |
5 | msg with 3 | outgoing | 1700104 | 2022-03-22 22:43:11.000 | 2022-03-22 22:42:11.000 |
and you are saying in 1700104
messages you want the first sent_at
of the sequence, this can be achieved with FIRST_VALUE, but the first row will also have a value:
select
*
,first_value(sent_at) over (partition by messageable_id order by sent_at) as first_sent_at
from data
order by messageable_id, sent_at;
ID | MESSAGE | DESTINATION | MESSAGEABLE_ID | SENT_AT | FIRST_SENT_AT |
---|---|---|---|---|---|
1 | msg with 2 | incoming | 1700103 | 2022-03-22 22:39:56.000 | 2022-03-22 22:39:56.000 |
2 | msg with 2 | outgoing | 1700103 | 2022-03-22 22:42:11.000 | 2022-03-22 22:39:56.000 |
3 | msg with 3 | incoming | 1700104 | 2022-03-22 22:39:56.000 | 2022-03-22 22:39:56.000 |
4 | msg with 3 | outgoing | 1700104 | 2022-03-22 22:42:11.000 | 2022-03-22 22:39:56.000 |
5 | msg with 3 | outgoing | 1700104 | 2022-03-22 22:43:11.000 | 2022-03-22 22:39:56.000 |
So if that first row not being a null is a problem we can solved with NULLIF
select
*
,first_value(sent_at) over (partition by messageable_id order by sent_at) as first_sent_at
,nullif(first_sent_at, sent_at) as nullif_result
from data
order by messageable_id, sent_at;
ID | MESSAGE | DESTINATION | MESSAGEABLE_ID | SENT_AT | FIRST_SENT_AT | NULLIF_RESULT |
---|---|---|---|---|---|---|
1 | msg with 2 | incoming | 1700103 | 2022-03-22 22:39:56.000 | 2022-03-22 22:39:56.000 | null |
2 | msg with 2 | outgoing | 1700103 | 2022-03-22 22:42:11.000 | 2022-03-22 22:39:56.000 | 2022-03-22 22:39:56.000 |
3 | msg with 3 | incoming | 1700104 | 2022-03-22 22:39:56.000 | 2022-03-22 22:39:56.000 | null |
4 | msg with 3 | outgoing | 1700104 | 2022-03-22 22:42:11.000 | 2022-03-22 22:39:56.000 | 2022-03-22 22:39:56.000 |
5 | msg with 3 | outgoing | 1700104 | 2022-03-22 22:43:11.000 | 2022-03-22 22:39:56.000 | 2022-03-22 22:39:56.000 |
but if you have multiple messages, it would null all of them. Another option is to use ROW_NUMBER, and to make values stable would be to use id to help order values.