Name Time
John 2022-12-06 02:37:25
John 2022-12-06 10:37:25
John 2022-12-06 10:42:34
John 2022-12-06 10:47:12
John 2022-12-06 10:52:10
The data above shows the datetimes at which John clicked a remote button. If he clicks the remote at least once every 10 minutes, I want to consider that “John” has been watching TV continuously, i.e. in a single session.
In the data set above, the time difference between row 0 and row 1 is several hours (greater than 10 minutes), so I want to treat row 0 as a separate session.
The time differences between rows 1 and 2, rows 2 and 3, and rows 3 and 4 are all less than 10 minutes, so I want to treat these rows as a single session.
My desired output is
Name Aggregated_timelist
John [‘2022-12-06 10:37:25’, ‘2022-12-06 10:42:34’, ‘2022-12-06 10:47:12’, ‘2022-12-06 10:52:10’]
John [‘2022-12-06 02:37:25’]
CodePudding user response:
You can use a custom groupby.agg with list, starting a new group whenever the difference from the previous row is greater than 10 minutes:
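import pandas as pd

# start a new group whenever the gap to the previous row exceeds 10 minutes
group = pd.to_datetime(df['Time']).diff().gt('10min').cumsum()
# if you have several users:
# group = pd.to_datetime(df['Time']).groupby(df['Name']).diff().gt('10min').cumsum()
# collect the timestamps of each (Name, group) pair into a list
out = df.groupby(['Name', group], as_index=False)['Time'].agg(list)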
Output:
Name Time
0 John [2022-12-06 02:37:25]
1 John [2022-12-06 10:37:25, 2022-12-06 10:42:34, 202...
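For reference, here is a self-contained sketch of the same idea that can be run end to end. It rebuilds the sample data from the question and adds a second user, "Mary", purely as a hypothetical illustration of the multi-user case. The helper name aggregate_sessions, the temporary _ts column and the session label are assumptions made for this example, not part of the original answer. It also sorts by user and time first, on the assumption that the input may not already be ordered.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical second user ("Mary")
df = pd.DataFrame({
    "Name": ["John", "John", "John", "John", "John", "Mary", "Mary"],
    "Time": [
        "2022-12-06 02:37:25",
        "2022-12-06 10:37:25",
        "2022-12-06 10:42:34",
        "2022-12-06 10:47:12",
        "2022-12-06 10:52:10",
        "2022-12-06 10:40:00",
        "2022-12-06 10:45:00",
    ],
})


def aggregate_sessions(df, gap="10min"):
    """Group each user's clicks into sessions separated by more than `gap`."""
    # Parse the timestamps and sort so diff() compares consecutive clicks per user.
    ordered = df.assign(_ts=pd.to_datetime(df["Time"])).sort_values(["Name", "_ts"])
    # Gap to the previous click of the same user (NaT for each user's first click).
    delta = ordered.groupby("Name")["_ts"].diff()
    # A gap above the threshold flags a new session; cumsum turns flags into ids.
    session = delta.gt(pd.Timedelta(gap)).cumsum().rename("session")
    return (
        ordered.groupby(["Name", session])["Time"]
        .agg(list)
        .reset_index()
        .drop(columns="session")
        .rename(columns={"Time": "Aggregated_timelist"})
    )


out = aggregate_sessions(df)
print(out)
```

Because the comparison uses gt, a gap of exactly 10 minutes stays in the same session; switch to ge if such a gap should start a new one.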