I have a dataframe with timestamps, which I have used DataFrame.rolling() to find occurrences within a small time window (10 seconds), following the accepted answer to this question.
However, I would like to add a new column with unique group ids for all the rows which occur within the same time window.
After running the accepted answer, my dataframe looks like this:
timestamp user_id count
2021-01-08 10:00:01 1 1
2021-01-08 10:00:02 2 2
2021-01-08 10:00:03 3 3
2021-01-08 10:00:09 1 4
2021-01-08 11:00:01 1 1
2021-01-08 11:00:02 7 2
2021-01-08 11:00:11 3 1
My desired output would be something like this:
timestamp user_id count window_group
2021-01-08 10:00:01 1 1 1
2021-01-08 10:00:02 2 2 1
2021-01-08 10:00:03 3 3 1
2021-01-08 10:00:09 1 4 1
2021-01-08 11:00:01 1 1 2
2021-01-08 11:00:02 7 2 2
2021-01-08 11:00:11 3 1 3
CodePudding user response:
You can identify the reset in number with diff
and boolean not (~
), then cumsum
:
df['window_group'] = (~df['count'].diff().gt(0)).cumsum()
output:
timestamp user_id count window_group
0 2021-01-08 10:00:01 1 1 1
1 2021-01-08 10:00:02 2 2 1
2 2021-01-08 10:00:03 3 3 1
3 2021-01-08 10:00:09 1 4 1
4 2021-01-08 11:00:01 1 1 2
5 2021-01-08 11:00:02 7 2 2
6 2021-01-08 11:00:11 3 1 3