Time interval calculation for consecutive days in rows

Time:11-25

I have a dataframe that looks like this:

     Path_Version commitdates Year-Month  API Age  api_spec_id
168           NaN  2018-10-19    2018-10       39          521
169           NaN  2018-10-19    2018-10       39          521
170           NaN  2018-10-12    2018-10       39          521
171           NaN  2018-10-12    2018-10       39          521
172           NaN  2018-10-12    2018-10       39          521
173           NaN  2018-10-11    2018-10       39          521
174           NaN  2018-10-11    2018-10       39          521
175           NaN  2018-10-11    2018-10       39          521
176           NaN  2018-10-11    2018-10       39          521
177           NaN  2018-10-11    2018-10       39          521
178           NaN  2018-09-26    2018-09       39          521
179           NaN  2018-09-25    2018-09       39          521

For each row, I want to calculate the number of days elapsed since the earliest commit date (i.e. after sorting the commit dates, counting from the first one), so something like this:

     Path_Version commitdates Year-Month  API Age  api_spec_id  Days_difference
168           NaN  2018-10-19    2018-10       39          521               24
169           NaN  2018-10-19    2018-10       39          521               24
170           NaN  2018-10-12    2018-10       39          521               17
171           NaN  2018-10-12    2018-10       39          521               17
172           NaN  2018-10-12    2018-10       39          521               17
173           NaN  2018-10-11    2018-10       39          521               16
174           NaN  2018-10-11    2018-10       39          521               16
175           NaN  2018-10-11    2018-10       39          521               16
176           NaN  2018-10-11    2018-10       39          521               16
177           NaN  2018-10-11    2018-10       39          521               16
178           NaN  2018-09-26    2018-09       39          521                1
179           NaN  2018-09-25    2018-09       39          521                0

I first tried sorting the commit dates within each api_spec_id (it is unique for every API) and then calculating the difference:

final_api['commitdates'] = final_api.groupby('api_spec_id')['commitdates'].apply(lambda x: x.sort_values())

final_api['diff'] = final_api.groupby('api_spec_id')['commitdates'].diff() / np.timedelta64(1, 'D')
final_api['diff'] = final_api['diff'].fillna(0)

It just returns zero for the entire column. I don't want to group the rows; I only want the difference in days between each commit date and the earliest commit date in the entire dataset.

Any idea how I can achieve this?

Answer:

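Convert the column with pandas.to_datetime, subtract the earliest date (sub and min), and extract the whole number of days with dt.days. No sorting is needed, because min finds the earliest date regardless of row order: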

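import pandas as pd

# Parse the date strings, then count days since the earliest date in the whole column
t = pd.to_datetime(df['commitdates'])
df['Days_difference'] = t.sub(t.min()).dt.days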
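As a self-contained check, here is a minimal sketch that rebuilds only the commitdates and api_spec_id columns from the sample rows in the question (the other columns are omitted for brevity) and applies the same to_datetime / sub / min / dt.days chain.

```python
import pandas as pd

# commitdates copied from the sample rows in the question (index 168-179)
dates = (["2018-10-19"] * 2 + ["2018-10-12"] * 3
         + ["2018-10-11"] * 5 + ["2018-09-26", "2018-09-25"])
df = pd.DataFrame({"commitdates": dates, "api_spec_id": 521},
                  index=range(168, 180))

# Parse the strings, subtract the earliest date, keep the whole-day count
t = pd.to_datetime(df["commitdates"])
df["Days_difference"] = t.sub(t.min()).dt.days

print(df)
```

The result is 24 for 2018-10-19, 17 for 2018-10-12, 16 for 2018-10-11, 1 for 2018-09-26 and 0 for 2018-09-25, counting 5 remaining days of September plus the day of October.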

If you need the difference per API, measured from each api_spec_id's own earliest date:

t = pd.to_datetime(df['commitdates'])

# transform('min') broadcasts each API's earliest date back onto every row of that API
df['Days_difference'] = t.sub(t.groupby(df['api_spec_id']).transform('min')).dt.days
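To see the grouped variant give each API its own baseline, the sketch below uses a second, hypothetical api_spec_id (999) with made-up dates; only the two rows for 521 come from the question.

```python
import pandas as pd

# Hypothetical data: api_spec_id 521 uses dates from the question,
# api_spec_id 999 is invented purely to show a separate per-group baseline
df = pd.DataFrame(
    {
        "commitdates": ["2018-10-19", "2018-09-25", "2019-01-10", "2019-01-01"],
        "api_spec_id": [521, 521, 999, 999],
    }
)

t = pd.to_datetime(df["commitdates"])

# Earliest date per api_spec_id, aligned to every row of that group
baseline = t.groupby(df["api_spec_id"]).transform("min")
df["Days_difference"] = t.sub(baseline).dt.days

print(df)
```

Rows of 521 are measured from 2018-09-25 and rows of 999 from 2019-01-01, so the column reads 24, 0, 9, 0.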

Output:

     Path_Version commitdates Year-Month  API Age  api_spec_id  Days_difference
168           NaN  2018-10-19    2018-10       39          521               24
169           NaN  2018-10-19    2018-10       39          521               24
170           NaN  2018-10-12    2018-10       39          521               17
171           NaN  2018-10-12    2018-10       39          521               17
172           NaN  2018-10-12    2018-10       39          521               17
173           NaN  2018-10-11    2018-10       39          521               16
174           NaN  2018-10-11    2018-10       39          521               16
175           NaN  2018-10-11    2018-10       39          521               16
176           NaN  2018-10-11    2018-10       39          521               16
177           NaN  2018-10-11    2018-10       39          521               16
178           NaN  2018-09-26    2018-09       39          521                1
179           NaN  2018-09-25    2018-09       39          521                0
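If you want to stay closer to the sort-then-diff idea from the question, a minimal sketch (no grouping, using a subset of the sample rows) is below. The gaps returned by diff() are between consecutive sorted dates, so they have to be accumulated with cumsum() to become days since the first commit; assigning the result back aligns on the index, which restores the original row order. This is an alternative to the answer above, not a replacement for it.

```python
import pandas as pd

# A subset of the sample rows from the question, with the original index labels
df = pd.DataFrame(
    {"commitdates": ["2018-10-19", "2018-10-12", "2018-10-11",
                     "2018-09-26", "2018-09-25"]},
    index=[168, 170, 173, 178, 179],
)

# Sort the parsed dates; the original index labels travel with the values
t = pd.to_datetime(df["commitdates"]).sort_values()

# Days between consecutive sorted dates (the first gap is NaN, so use 0)
gaps = t.diff().dt.days.fillna(0)

# Running total of the gaps = days since the earliest date;
# assignment aligns on the index, so each row gets its own value back
df["Days_difference"] = gaps.cumsum().astype(int)

print(df)
```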