Home > Blockchain >  How to calculate the number of specific events per user in dataset in another column?
How to calculate the number of specific events per user in dataset in another column?

Time:06-18

I need to create another column in my dataset with number of previous specific events for each user (in my case number of completed levels)

My data looks like:

event_dttm user_id event_name
2022-04-07 00:00:00.000 1111 complete_level
2022-04-07 00:00:30.000 1111 start_level
2022-04-07 00:04:44.000 1111 complete_level
2022-05-28 09:15:00.000 1111 start_session
2022-05-28 09:17:28.000 1111 start_level
2022-05-28 09:18:46.000 1111 complete_level
2022-05-28 09:19:39.000 1111 start_level
2022-05-28 09:26:16.000 1111 complete_level
2022-04-07 00:00:00.000 1221 start_session
2022-04-07 00:00:30.000 1221 start_level
2022-04-07 00:04:44.000 1221 complete_level
2022-05-28 09:15:00.000 1221 start_level
2022-05-28 09:16:12.000 1221 complete_level
2022-05-28 09:17:28.000 1221 start_level
2022-05-28 09:18:46.000 1221 complete_level
2022-05-28 09:19:39.000 1211 start_level
2022-05-28 09:26:16.000 1221 complete_level

Desired result:

event_dttm user_id event_name completed_levels
2022-04-07 00:00:00.000 1111 complete_level 1
2022-04-07 00:00:30.000 1111 start_level 1
2022-04-07 00:04:44.000 1111 complete_level 2
2022-05-28 09:15:00.000 1111 start_session 2
2022-05-28 09:17:28.000 1111 start_level 2
2022-05-28 09:18:46.000 1111 complete_level 3
2022-05-28 09:19:39.000 1111 start_level 3
2022-05-28 09:26:16.000 1111 complete_level 4
2022-04-07 00:00:00.000 1221 start_session 0
2022-04-07 00:00:30.000 1221 start_level 0
2022-04-07 00:04:44.000 1221 complete_level 1
2022-05-28 09:15:00.000 1221 start_level 1
2022-05-28 09:16:12.000 1221 complete_level 2
2022-05-28 09:17:28.000 1221 start_level 2
2022-05-28 09:18:46.000 1221 complete_level 3
2022-05-28 09:19:39.000 1211 start_level 3
2022-05-28 09:26:16.000 1221 complete_level 4
2022-05-28 09:29:18.000 1211 start_level 4
2022-05-28 09:35:16.000 1221 complete_level 5

I tried to create column with 0/1 if event_name = 'complete_level', count cumulative sum per user and then join it to first table, but it return me an error Aggregate window functions with an ORDER BY clause require a frame clause.

select event_dttm, user_id, event_name, 
sum(completed_level) 
over (partition by event_dttm, user_id order by user_id, event_dttm) as completed_levels
from
(select event_dttm, user_id, event_name 
case
when event_name = 'complete_level' then 1
else 0
end as completed_level
from table)

Also I think that there is simpler solution for my task. Thank you!

CodePudding user response:

select
 *,
 sum(case when event_name = 'complete_level' then 1 else 0 end) over(partition by user_id order by event_dttm asc) as completed_cnt
from your_table

enter image description here

  • Related