Home > database >  Rank multiple group of records over multiple columns
Rank multiple group of records over multiple columns

Time:09-30

The title is a bit confusing so the easiest think is to describe my problem using an example.

Assuming I'm having this dataset:

Username Event_time Is_new_session
userA 2022-09-30 00:00:01.000000 True
userA 2022-09-30 00:00:02.000000 False
userA 2022-09-30 01:00:00.000000 True
userA 2022-09-30 02:00:00.000000 True
userA 2022-09-30 02:00:02.000000 False
userA 2022-09-30 02:00:04.000000 False
userA 2022-09-30 00:00:05.000000 False
userB 2022-09-30 03:00:00.000000 True

At first I want to achieve something like this:

Username Event_time Is_new_session rnk
userA 2022-09-30 00:00:01.000000 True 1
userA 2022-09-30 00:00:02.000000 False 1
userA 2022-09-30 01:00:00.000000 True 2
userA 2022-09-30 02:00:00.000000 True 3
userA 2022-09-30 02:00:02.000000 False 3
userA 2022-09-30 02:00:04.000000 False 3
userA 2022-09-30 02:00:05.000000 False 3
userB 2022-09-30 03:00:00.000000 True 4

So I want to add the same rnk value (probably rank is not the correct term) to all the records of the group until the next record has Is_new_session = True, in order to apply some aggregate functions similar and produce something like this:

SessiondID Username First_Event_time Last_Event_time Event_counts
1 userA 2022-09-30 00:00:01.000000 2022-09-30 00:00:02.000000 2
2 userA 2022-09-30 01:00:00.000000 2022-09-30 01:00:00.000000 1
3 userA 2022-09-30 02:00:00.000000 2022-09-30 02:00:05.000000 4
4 userB 2022-09-30 03:00:00.000000 2022-09-30 03:00:00.000000 1

Do you have any ideas on window functions that could help achieve this "ranking" (results shown on table no.2 )? T-SQL won't really work in my case. Any other recommendations for going from the 1st table to the 3rd without the need of the 2nd are of course very welcome.

CodePudding user response:

We use case when to mark every time event is true and then run a windw_function count() to divide the table into groups.

select   session_id 
        ,Username
        ,min(Event_time) as First_Event_time    
        ,max(Event_time) as Last_Event_time
        ,count(*)        as Event_counts
from     (
         select *
                ,count(case Is_new_session when 'True' then 1 end) over(order by Event_time) as session_id
         from   t
         ) t
group by Username, session_id
session_id Username First_Event_time Last_Event_time Event_counts
1 userA 2022-09-30 00:00:01.000 2022-09-30 00:00:02.000 2
2 userA 2022-09-30 01:00:00.000 2022-09-30 01:00:00.000 1
3 userA 2022-09-30 02:00:00.000 2022-09-30 02:00:05.000 4
4 userB 2022-09-30 03:00:00.000 2022-09-30 03:00:00.000 1

Fiddle

  • Related