Home > Blockchain >  timestamp difference with multiple groupby in dataframe
timestamp difference with multiple groupby in dataframe

Time:09-20

Existing Dataframe :

UserId     Activity_date     Time_stamp
A            27-01-2022        13:00:00
A            27-01-2022        15:00:00
A            27-01-2022        16:00:00
A            28-01-2022        09:00:00
A            28-01-2022        11:00:00

Expected Dataframe :

UserId     Activity_date     Time_stamp        Difference_of_time_stamp
A            27-01-2022        13:00:00             -
A            27-01-2022        15:00:00          02:00:00  
A            27-01-2022        16:00:00          01:00:00
A            28-01-2022        09:00:00             -
A            28-01-2022        11:00:00          02:00:00

Difference of time(in milliseconds) needs to be calculated. i am currently using groupby on the UserId and Activity_date so that the difference of time should be calculated for that specific date only.

i know this can be used to get the difference of timestamp but how to groupby on Activity_date

df.assign(Difference_of_time=df.groupby('UserId').Time_stamp.apply(lambda x: x - x.iloc[0]))

CodePudding user response:

Convert Time_stamp to datetimes with default dates and then use DataFrameGroupBy.diff:

df = df.assign(Time_stamp = pd.to_datetime(df['Time_stamp']).dt.floor('H'),
               Difference_of_time=lambda x: x.groupby(['UserId','Activity_date'])
                                              .Time_stamp.diff())

print (df)
  UserId Activity_date          Time_stamp Difference_of_time
0      A    27-01-2022 2022-09-19 13:00:00                NaT
1      A    27-01-2022 2022-09-19 15:00:00    0 days 02:00:00
2      A    27-01-2022 2022-09-19 16:00:00    0 days 01:00:00
3      A    28-01-2022 2022-09-19 09:00:00                NaT
4      A    28-01-2022 2022-09-19 11:00:00    0 days 02:00:00

Or to timedeltas:

df = df.assign(Time_stamp = pd.to_timedelta(df['Time_stamp']).dt.floor('H'),
               Difference_of_time=lambda x: x.groupby(['UserId','Activity_date'])
                                             .Time_stamp.diff())

print (df)
  UserId Activity_date      Time_stamp Difference_of_time
0      A    27-01-2022 0 days 13:00:00                NaT
1      A    27-01-2022 0 days 15:00:00    0 days 02:00:00
2      A    27-01-2022 0 days 16:00:00    0 days 01:00:00
3      A    28-01-2022 0 days 09:00:00                NaT
4      A    28-01-2022 0 days 11:00:00    0 days 02:00:00

Or to dates from another column:

df = df.assign(Time_stamp = pd.to_datetime(df['Activity_date'].astype(str)   ' '   
           df['Time_stamp'].astype(str)).dt.floor('H'),
               Difference_of_time=lambda x: x.groupby(['UserId','Activity_date'])
                                             .Time_stamp.diff())

print (df)
  UserId Activity_date          Time_stamp Difference_of_time
0      A    27-01-2022 2022-01-27 13:00:00                NaT
1      A    27-01-2022 2022-01-27 15:00:00    0 days 02:00:00
2      A    27-01-2022 2022-01-27 16:00:00    0 days 01:00:00
3      A    28-01-2022 2022-01-28 09:00:00                NaT
4      A    28-01-2022 2022-01-28 11:00:00    0 days 02:00:00

For miliseconds use:

df = df.assign(Time_stamp = pd.to_datetime(df['Activity_date'].astype(str)   ' '   
           df['Time_stamp'].astype(str)).dt.floor('H'),
               Difference_of_time=lambda x: x.groupby(['UserId','Activity_date'])
                                             .Time_stamp.diff().dt.total_seconds() * 1000)

print (df)

  UserId Activity_date          Time_stamp  Difference_of_time
0      A    27-01-2022 2022-01-27 13:00:00                 NaN
1      A    27-01-2022 2022-01-27 15:00:00           7200000.0
2      A    27-01-2022 2022-01-27 16:00:00           3600000.0
3      A    28-01-2022 2022-01-28 09:00:00                 NaN
4      A    28-01-2022 2022-01-28 11:00:00           7200000.0
  • Related