I need to create another column in my dataset with number of previous specific events for each user (in my case number of completed levels)
My data looks like:
event_dttm | user_id | event_name |
---|---|---|
2022-04-07 00:00:00.000 | 1111 | complete_level |
2022-04-07 00:00:30.000 | 1111 | start_level |
2022-04-07 00:04:44.000 | 1111 | complete_level |
2022-05-28 09:15:00.000 | 1111 | start_session |
2022-05-28 09:17:28.000 | 1111 | start_level |
2022-05-28 09:18:46.000 | 1111 | complete_level |
2022-05-28 09:19:39.000 | 1111 | start_level |
2022-05-28 09:26:16.000 | 1111 | complete_level |
2022-04-07 00:00:00.000 | 1221 | start_session |
2022-04-07 00:00:30.000 | 1221 | start_level |
2022-04-07 00:04:44.000 | 1221 | complete_level |
2022-05-28 09:15:00.000 | 1221 | start_level |
2022-05-28 09:16:12.000 | 1221 | complete_level |
2022-05-28 09:17:28.000 | 1221 | start_level |
2022-05-28 09:18:46.000 | 1221 | complete_level |
2022-05-28 09:19:39.000 | 1211 | start_level |
2022-05-28 09:26:16.000 | 1221 | complete_level |
Desired result:
event_dttm | user_id | event_name | completed_levels |
---|---|---|---|
2022-04-07 00:00:00.000 | 1111 | complete_level | 1 |
2022-04-07 00:00:30.000 | 1111 | start_level | 1 |
2022-04-07 00:04:44.000 | 1111 | complete_level | 2 |
2022-05-28 09:15:00.000 | 1111 | start_session | 2 |
2022-05-28 09:17:28.000 | 1111 | start_level | 2 |
2022-05-28 09:18:46.000 | 1111 | complete_level | 3 |
2022-05-28 09:19:39.000 | 1111 | start_level | 3 |
2022-05-28 09:26:16.000 | 1111 | complete_level | 4 |
2022-04-07 00:00:00.000 | 1221 | start_session | 0 |
2022-04-07 00:00:30.000 | 1221 | start_level | 0 |
2022-04-07 00:04:44.000 | 1221 | complete_level | 1 |
2022-05-28 09:15:00.000 | 1221 | start_level | 1 |
2022-05-28 09:16:12.000 | 1221 | complete_level | 2 |
2022-05-28 09:17:28.000 | 1221 | start_level | 2 |
2022-05-28 09:18:46.000 | 1221 | complete_level | 3 |
2022-05-28 09:19:39.000 | 1211 | start_level | 3 |
2022-05-28 09:26:16.000 | 1221 | complete_level | 4 |
2022-05-28 09:29:18.000 | 1211 | start_level | 4 |
2022-05-28 09:35:16.000 | 1221 | complete_level | 5 |
I tried to create column with 0/1 if event_name = 'complete_level', count cumulative sum per user and then join it to first table, but it return me an error Aggregate window functions with an ORDER BY clause require a frame clause
.
select event_dttm, user_id, event_name,
sum(completed_level)
over (partition by event_dttm, user_id order by user_id, event_dttm) as completed_levels
from
(select event_dttm, user_id, event_name
case
when event_name = 'complete_level' then 1
else 0
end as completed_level
from table)
Also I think that there is simpler solution for my task. Thank you!
CodePudding user response:
select
*,
sum(case when event_name = 'complete_level' then 1 else 0 end) over(partition by user_id order by event_dttm asc) as completed_cnt
from your_table