Home > Mobile >  find time difference between groups in dataframe with python
find time difference between groups in dataframe with python

Time:12-12

I'm using python's pandas.

I'm having the following orders dataframe. when each order have its order id, order time and different items id in the order. in this example I have three different groups - A,B,C:

  order_id                 time  item_id
0        A  2022-11-10 08:43:07        1
1        A  2022-11-10 08:43:07        2
2        A  2022-11-10 08:43:07        3
3        B  2022-11-10 08:46:27        1
4        B  2022-11-10 08:46:27        2
5        C  2022-11-10 08:58:45        3

I want to calculate the time difference between group A and B and then between group B and C, by the time order and save the result into another column

wanted result:

 order_id                 time  item_id        time_diff
0        A  2022-11-10 08:43:07        1                 
1        A  2022-11-10 08:43:07        2                 
2        A  2022-11-10 08:43:07        3                 
3        B  2022-11-10 08:46:27        1  0 days 00:03:20
4        B  2022-11-10 08:46:27        2  0 days 00:03:20
5        C  2022-11-10 08:58:45        3  0 days 00:12:18

how can I calculate the time difference between the groups when the time is similar for the entire group?

try using .diff() but I got only the difference inside the group:

df['time_diff'] = df.groupby('order_id')['time'].diff()

df
Out[141]: 
  order_id                time  item_id time_diff
0        A 2022-11-10 08:43:07        1       NaT
1        A 2022-11-10 08:43:07        2    0 days
2        A 2022-11-10 08:43:07        3    0 days
3        B 2022-11-10 08:46:27        1       NaT
4        B 2022-11-10 08:46:27        2    0 days
5        C 2022-11-10 08:58:45        3       NaT

I want the difference between the groups and not inside. I can calculate the difference with .last().diff() but I don't know how to save it as a column back to the dataframe:

df.groupby('order_id')['time'].last().diff().to_frame('time_diff')
Out[]: 
               time_diff
order_id                
A                    NaT
B        0 days 00:03:20
C        0 days 00:12:18

thanks

CodePudding user response:

You were on the right track. This will work for you:

diff = df.groupby('order_id')['time'].last().diff().to_frame('time_diff').reset_index()
df = df.merge(diff, on='order_id', how='left')
df
  • Related