Existing Dataframe :
UserId Activity_date Time_stamp
A 27-01-2022 13:00:00
A 27-01-2022 15:00:00
A 27-01-2022 16:00:00
A 28-01-2022 09:00:00
A 28-01-2022 11:00:00
Expected Dataframe :
UserId Activity_date Time_stamp Difference_of_time_stamp
A 27-01-2022 13:00:00 -
A 27-01-2022 15:00:00 02:00:00
A 27-01-2022 16:00:00 01:00:00
A 28-01-2022 09:00:00 -
A 28-01-2022 11:00:00 02:00:00
Difference of time(in milliseconds) needs to be calculated. i am currently using groupby on the UserId and Activity_date so that the difference of time should be calculated for that specific date only.
i know this can be used to get the difference of timestamp but how to groupby on Activity_date
df.assign(Difference_of_time=df.groupby('UserId').Time_stamp.apply(lambda x: x - x.iloc[0]))
CodePudding user response:
Convert Time_stamp
to datetimes with default dates
and then use DataFrameGroupBy.diff
:
df = df.assign(Time_stamp = pd.to_datetime(df['Time_stamp']).dt.floor('H'),
Difference_of_time=lambda x: x.groupby(['UserId','Activity_date'])
.Time_stamp.diff())
print (df)
UserId Activity_date Time_stamp Difference_of_time
0 A 27-01-2022 2022-09-19 13:00:00 NaT
1 A 27-01-2022 2022-09-19 15:00:00 0 days 02:00:00
2 A 27-01-2022 2022-09-19 16:00:00 0 days 01:00:00
3 A 28-01-2022 2022-09-19 09:00:00 NaT
4 A 28-01-2022 2022-09-19 11:00:00 0 days 02:00:00
Or to timedeltas:
df = df.assign(Time_stamp = pd.to_timedelta(df['Time_stamp']).dt.floor('H'),
Difference_of_time=lambda x: x.groupby(['UserId','Activity_date'])
.Time_stamp.diff())
print (df)
UserId Activity_date Time_stamp Difference_of_time
0 A 27-01-2022 0 days 13:00:00 NaT
1 A 27-01-2022 0 days 15:00:00 0 days 02:00:00
2 A 27-01-2022 0 days 16:00:00 0 days 01:00:00
3 A 28-01-2022 0 days 09:00:00 NaT
4 A 28-01-2022 0 days 11:00:00 0 days 02:00:00
Or to dates from another column:
df = df.assign(Time_stamp = pd.to_datetime(df['Activity_date'].astype(str) ' '
df['Time_stamp'].astype(str)).dt.floor('H'),
Difference_of_time=lambda x: x.groupby(['UserId','Activity_date'])
.Time_stamp.diff())
print (df)
UserId Activity_date Time_stamp Difference_of_time
0 A 27-01-2022 2022-01-27 13:00:00 NaT
1 A 27-01-2022 2022-01-27 15:00:00 0 days 02:00:00
2 A 27-01-2022 2022-01-27 16:00:00 0 days 01:00:00
3 A 28-01-2022 2022-01-28 09:00:00 NaT
4 A 28-01-2022 2022-01-28 11:00:00 0 days 02:00:00
For miliseconds use:
df = df.assign(Time_stamp = pd.to_datetime(df['Activity_date'].astype(str) ' '
df['Time_stamp'].astype(str)).dt.floor('H'),
Difference_of_time=lambda x: x.groupby(['UserId','Activity_date'])
.Time_stamp.diff().dt.total_seconds() * 1000)
print (df)
UserId Activity_date Time_stamp Difference_of_time
0 A 27-01-2022 2022-01-27 13:00:00 NaN
1 A 27-01-2022 2022-01-27 15:00:00 7200000.0
2 A 27-01-2022 2022-01-27 16:00:00 3600000.0
3 A 28-01-2022 2022-01-28 09:00:00 NaN
4 A 28-01-2022 2022-01-28 11:00:00 7200000.0