Home > Back-end >  Can I Select Rows After a Row With a Certain Value For Every Partition?
Can I Select Rows After a Row With a Certain Value For Every Partition?

Time:10-27

I'm trying to detect users first sessions.

ID event_server_date Event Row_number
1 2022-10-26 09:43 abc 1
1 2022-10-26 09:45 cde 2
1 2022-10-26 09:47 ykz 3
1 2022-10-26 09:48 fun 4
1 2022-10-26 09:50 start_event 5
1 2022-10-26 09:55 x 6
1 2022-10-26 09:56 y 7
1 2022-10-26 09:56 z 8
2 2022-10-26 09:12 plz 1
2 2022-10-26 09:15 rck 2
2 2022-10-26 09:15 dsp 3
2 2022-10-26 09:17 vnl 4
2 2022-10-26 09:23 start_event 5
2 2022-10-26 09:23 k 6
2 2022-10-26 09:26 l 7

Desired Output:

ID Timestamp Event Row_number
1 2022-10-26 09:50 start_event 5
1 2022-10-26 09:55 x 6
1 2022-10-26 09:56 y 7
1 2022-10-26 09:56 z 8
2 2022-10-26 09:23 start_event 5
2 2022-10-26 09:23 k 6
2 2022-10-26 09:26 l 7

Real timestamp column looks like this: 1970-01-20 06:57:25.583738 UTC I'm using event based data and my table is quite large.

Is there any way for me pick those desired rows only? And discard all events before start_event in every partition.

I've got this far but I have no idea how to discard unwanted events for every partition.

SELECT ID, event_server_date , Event,
row_number() over(partition by ID ORDER BY event_server_date ASC) AS Row_number

FROM `my_table` 

ORDER BY event_server_date ASC

Note: I have been using SQL for only two months so I might not know the concepts you're talking about.

CodePudding user response:

We can use COUNT() as an analytic function along with a rolling sum:

WITH cte AS (
    SELECT *, COUNT(CASE WHEN Event = 'start_event' THEN 1 END)
                  OVER (PARTITION BY ID ORDER BY event_server_date) cnt,
              ROW_NUMBER() OVER (PARTITION BY makro_id
                                 ORDER BY event_server_date) AS Row_number
    FROM my_table
)

SELECT ID, event_server_date, Event, Row_number
FROM cte
WHERE cnt > 0
ORDER BY ID, event_server_date;

The count generated above will be one or greater once a row is encountered having start_event. We only retain rows late enough for this condition to be true.

CodePudding user response:

I would do it with LAST_VALUE() analytic function and QUALIFY

SELECT * 
FROM my_table
QUALIFY LAST_VALUE(IF(event = 'start_event', TRUE, NULL) IGNORE NULLS) 
             OVER (PARTITION BY ID ORDER BY event_server_date 
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

You can ignore ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW because it's default window in analytic functions but I like it to leave it for the sake of clarity.

  • Related