Group datetime column based on the time difference

Time:12-06

Name        Time
John        2022-12-06 02:37:25
John        2022-12-06 10:37:25
John        2022-12-06 10:42:34            
John        2022-12-06 10:47:12
John        2022-12-06 10:52:10

The data above shows the datetimes at which John clicked a remote button. If he clicks the remote at least once every 10 minutes, I want to treat John as having watched TV continuously, i.e. as a single session.

In the data set above, the time difference between row 0 and row 1 is several hours (greater than 10 minutes), so I want to treat row 0 as a separate session.

The time differences between row 1 and row 2, row 2 and row 3, and row 3 and row 4 are all less than 10 minutes, so I want to treat these rows as a single session.

My desired output is:

Name     Aggregated_timelist
John     ['2022-12-06 10:37:25', '2022-12-06 10:42:34', '2022-12-06 10:47:12', '2022-12-06 10:52:10']
John     ['2022-12-06 02:37:25']

CodePudding user response:

You can use a custom groupby.agg with list, starting a new group whenever the difference from the previous row is greater than 10 minutes:

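import pandas as pd

# start a new group id each time the gap to the previous row exceeds 10 minutes
group = pd.to_datetime(df['Time']).diff().gt('10min').cumsum()
# if you have several users:
# group = pd.to_datetime(df['Time']).groupby(df['Name']).diff().gt('10min').cumsum()

# collect the timestamps of each (Name, group) pair into a list
out = df.groupby(['Name', group], as_index=False)['Time'].agg(list)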

Output:

   Name                                               Time
0  John                              [2022-12-06 02:37:25]
1  John  [2022-12-06 10:37:25, 2022-12-06 10:42:34, 202...
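
For reference, here is a self-contained sketch of the same idea, broken into steps so the intermediate values can be inspected. It is one possible variant rather than the exact code above: the second user ("Mary") and the helper columns `ts` and `session` are made up for illustration, and it marks each user's first row as a session start so that the session id alone is unique across users. It also sorts first, since diff() only makes sense on rows ordered by time.

```python
import pandas as pd

# John's rows come from the question; Mary's rows are hypothetical test data.
df = pd.DataFrame({
    "Name": ["John", "John", "John", "John", "John", "Mary", "Mary"],
    "Time": [
        "2022-12-06 02:37:25",
        "2022-12-06 10:37:25",
        "2022-12-06 10:42:34",
        "2022-12-06 10:47:12",
        "2022-12-06 10:52:10",
        "2022-12-06 10:40:00",
        "2022-12-06 11:30:00",
    ],
})

# Parse the strings into a helper column and sort by user, then by time.
df["ts"] = pd.to_datetime(df["Time"])
df = df.sort_values(["Name", "ts"], ignore_index=True)

# Gap to the previous click of the same user (NaT on each user's first row).
gap = df.groupby("Name")["ts"].diff()

# A session starts on a user's first row or after a gap longer than 10 minutes.
new_session = gap.isna() | (gap > pd.Timedelta(minutes=10))

# The running count of session starts gives one id per session.
df["session"] = new_session.cumsum()

# Collect the original time strings of each session into a list.
out = (
    df.groupby(["Name", "session"])["Time"]
      .agg(list)
      .reset_index()
      .rename(columns={"Time": "Aggregated_timelist"})
)
print(out)
```

With this data, John ends up with two sessions (the lone 02:37:25 click and the four clicks from 10:37:25 to 10:52:10), matching the grouping asked for in the question. A gap of exactly 10 minutes stays in the same session here; use `>=` instead of `>` if it should start a new one.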