Activity Schedule
Activity Status Activity Date
1 Inactive 06/25/22
1 Inactive 06/21/22
1 Active 06/19/22
1 Inactive 06/18/22
2 Active 05/26/22
2 Active 05/23/22
2 Active 05/20/22
2 Inactive 04/14/22
3 Inactive 03/05/22
3 Inactive 02/28/22
3 Inactive 02/23/22
3 Active 02/02/22
3 Active 02/01/22
I want to find out the cumulative time gap of "inactivity" from a recent "active" status grouped by the activity codes. This is what I reached into for now but I need time lag for inactivity from the latest day of activity too.
def diff(x):
x = x.reset_index(drop=True)
dif = []
dif.append(x[0] - x[0])
dif.extend([x[i] - x[i-1] for i in range(1,len(x))])
return dif
df['diff'] = df.groupby('Activity')['Activity Date'].transform(diff)
df['Duration'] =df.sort_values(['Activity','Activity Date']).groupby(["Status"])["diff"].transform('cumsum')
I am looking for resluts like this:
Activity Status Activity Date Change
1 Inactive 06/18/22 0
1 Active 06/19/22 1
1 Inactive 06/21/22 2
1 Inactive 06/25/22 6
2 Inactive 04/14/22 0
2 Active 05/20/22 36
2 Active 05/23/22 39
3 Active 02/01/22 0
3 Active 02/02/22 1
3 Inactive 02/23/22 22
3 Inactive 02/28/22 27
3 Inactive 03/05/22 32
CodePudding user response:
First, we have to sort the Dataframe values:
df = df.sort_values(["Activity", "Activity Date"])
result:
Activity Status Activity Date
1 Inactive 06/18/22
1 Active 06/19/22
1 Inactive 06/21/22
1 Inactive 06/25/22
2 Inactive 04/14/22
2 Active 05/20/22
2 Active 05/23/22
2 Active 05/26/22
3 Active 02/01/22
3 Active 02/02/22
3 Inactive 02/23/22
3 Inactive 02/28/22
3 Inactive 03/05/22
Then, creating a function that will return the days difference between dates if belong to the same 'Activity' and have different Status in the previous date, else it will return zero.
def gap(x, y):
def last_diff_status(df_temp):
x2 = df_temp.shift().loc[x.name]
if x['Status'] == x2['Status']:
return last_diff_status(df_temp.shift())
return x2
x1 = y.loc[x.name]
if x['Activity'] == x1['Activity']:
if x['Status'] != x1['Status']:
return (pd.to_datetime(x['Activity_Date']) - pd.to_datetime(x1['Activity_Date'])).days
else:
return (pd.to_datetime(x['Activity_Date']) - pd.to_datetime(last_diff_status(y.loc[df['Activity'] == x['Activity']])['Activity_Date'])).days
else:
return 0
Creating the new column 'Change' by applying the function with both, current row and the shifted dataframe
df['Change'] = df.apply(lambda x, y=df.shift().fillna(0): gap(x, y), axis=1)
result:
Activity Status Activity_Date Change
1 Inactive 06/18/22 0.0
1 Active 06/19/22 1.0
1 Inactive 06/21/22 2.0
1 Inactive 06/25/22 6.0
2 Inactive 04/14/22 0.0
2 Active 05/20/22 36.0
2 Active 05/23/22 39.0
2 Active 05/26/22 42.0
3 Active 02/01/22 0.0
3 Active 02/02/22 0.0
3 Inactive 02/23/22 21.0
3 Inactive 02/28/22 26.0
3 Inactive 03/05/22 31.0