I'm using python's pandas.
I'm having the following orders dataframe. when each order have its order id, order time and different items id in the order. in this example I have three different groups - A,B,C:
order_id time item_id
0 A 2022-11-10 08:43:07 1
1 A 2022-11-10 08:43:07 2
2 A 2022-11-10 08:43:07 3
3 B 2022-11-10 08:46:27 1
4 B 2022-11-10 08:46:27 2
5 C 2022-11-10 08:58:45 3
I want to calculate the time difference between group A and B and then between group B and C, by the time order and save the result into another column
wanted result:
order_id time item_id time_diff
0 A 2022-11-10 08:43:07 1
1 A 2022-11-10 08:43:07 2
2 A 2022-11-10 08:43:07 3
3 B 2022-11-10 08:46:27 1 0 days 00:03:20
4 B 2022-11-10 08:46:27 2 0 days 00:03:20
5 C 2022-11-10 08:58:45 3 0 days 00:12:18
how can I calculate the time difference between the groups when the time is similar for the entire group?
try using .diff() but I got only the difference inside the group:
df['time_diff'] = df.groupby('order_id')['time'].diff()
df
Out[141]:
order_id time item_id time_diff
0 A 2022-11-10 08:43:07 1 NaT
1 A 2022-11-10 08:43:07 2 0 days
2 A 2022-11-10 08:43:07 3 0 days
3 B 2022-11-10 08:46:27 1 NaT
4 B 2022-11-10 08:46:27 2 0 days
5 C 2022-11-10 08:58:45 3 NaT
I want the difference between the groups and not inside. I can calculate the difference with .last().diff() but I don't know how to save it as a column back to the dataframe:
df.groupby('order_id')['time'].last().diff().to_frame('time_diff')
Out[]:
time_diff
order_id
A NaT
B 0 days 00:03:20
C 0 days 00:12:18
thanks
CodePudding user response:
You were on the right track. This will work for you:
diff = df.groupby('order_id')['time'].last().diff().to_frame('time_diff').reset_index()
df = df.merge(diff, on='order_id', how='left')
df