Time interval calculation between commits-CodePudding

I have a dataframe that looks like this:

      commitdates api_spec_id/
0      2021-04-07  84
1      2021-05-31  84
2      2021-06-21  84
3      2021-06-18  84
4      2020-12-06  124

        commits          commitDate                         
0           32       2021-04-07 12:52:56            
1           32       2021-05-31 03:12:37             
2           32       2021-06-21 06:50:33              
3           32       2021-06-18 05:11:23             
4           37       2020-12-06 20:35:45

I want to calculate the time interval elapsed between the first commit and last commit, api_spec_id corresponds to the relevant API, each API has different commits, hence I want to find the first and last and calculate the interval between them.

My desired output is :

       api_spec_id  Age (in days)
0        84          89
1        84          89
2        84          89
3        84          67
4        124         56

I tried doing the following after scanning through similar posts here on stack:

gb = final_api.groupby('api_spec_id')['commitDate']
(gb.max() - gb.min()) / pd.Timedelta(days=1)

and got the following output:

api_spec_id
84         74.748345
124        22.486979
164       124.080359
184       921.732488
214        11.994167
             ...    
224530      1.987951
224606      8.221690
224613     67.541366
224627    151.838333
224665    657.721481

And another method:

s = final_api.groupby(['api_spec_id','commitdates'])['timestamp'].agg(['min','max']); s['max']-s['min']

which returned me 0 days.

I am not sure if this is correct, and also I would like to append this result to a new dataframe column, but not sure how to do this. Any help would be appreciated.

CodePudding user response：

Use groupby.transform with min/max (or first/last if you really want the order, not values to matter)):

# pre-requisite
df[['commitdates', 'commitDate']] = df[['commitdates', 'commitDate']].apply(pd.to_datetime)

g = df.groupby('api_spec_id')['commitdates']
df['Age (in days)'] = g.transform('max').sub(g.transform('min'))

Output:

  commitdates  api_spec_id  commits          commitDate Age (in days)
0  2021-04-07           84       32 2021-04-07 12:52:56       75 days
1  2021-05-31           84       32 2021-05-31 03:12:37       75 days
2  2021-06-21           84       32 2021-06-21 06:50:33       75 days
3  2021-06-18           84       32 2021-06-18 05:11:23       75 days
4  2020-12-06          124       37 2020-12-06 20:35:45        0 days