Home > Software design >  I am trying to find the time duration within a column with reference to change in activity status of
I am trying to find the time duration within a column with reference to change in activity status of

Time:06-28

Activity Schedule


 Activity   Status  Activity Date
1   Inactive    06/25/22
1   Inactive    06/21/22
1   Active  06/19/22
1   Inactive    06/18/22
2   Active  05/26/22
2   Active  05/23/22
2   Active  05/20/22
2   Inactive    04/14/22
3   Inactive    03/05/22
3   Inactive    02/28/22
3   Inactive    02/23/22
3   Active  02/02/22
3   Active  02/01/22

I want to find out the cumulative time gap of "inactivity" from a recent "active" status grouped by the activity codes. This is what I reached into for now but I need time lag for inactivity from the latest day of activity too.

def diff(x):
    x = x.reset_index(drop=True)
    dif = []
    dif.append(x[0] - x[0])
    dif.extend([x[i] - x[i-1] for i in range(1,len(x))])
    return dif

df['diff'] = df.groupby('Activity')['Activity Date'].transform(diff)

df['Duration'] =df.sort_values(['Activity','Activity Date']).groupby(["Status"])["diff"].transform('cumsum')

I am looking for resluts like this:

Activity    Status Activity Date  Change
1         Inactive      06/18/22    0
1           Active      06/19/22    1
1         Inactive      06/21/22    2
1         Inactive      06/25/22    6
2         Inactive      04/14/22    0
2           Active      05/20/22   36
2           Active      05/23/22    39
3           Active      02/01/22    0
3           Active      02/02/22    1
3         Inactive      02/23/22   22
3         Inactive      02/28/22    27
3         Inactive      03/05/22    32

CodePudding user response:

First, we have to sort the Dataframe values:

df = df.sort_values(["Activity", "Activity Date"])

result:

Activity    Status Activity Date
1         Inactive      06/18/22
1           Active      06/19/22
1         Inactive      06/21/22
1         Inactive      06/25/22
2         Inactive      04/14/22
2           Active      05/20/22
2           Active      05/23/22
2           Active      05/26/22
3           Active      02/01/22
3           Active      02/02/22
3         Inactive      02/23/22
3         Inactive      02/28/22
3         Inactive      03/05/22

Then, creating a function that will return the days difference between dates if belong to the same 'Activity' and have different Status in the previous date, else it will return zero.

def gap(x, y):
    def last_diff_status(df_temp):
        x2 = df_temp.shift().loc[x.name]
        if x['Status'] == x2['Status']:
            return last_diff_status(df_temp.shift())
        return x2

    x1 = y.loc[x.name]
    if x['Activity'] == x1['Activity']:
        if x['Status'] != x1['Status']:
            return (pd.to_datetime(x['Activity_Date']) - pd.to_datetime(x1['Activity_Date'])).days
        else:
            return (pd.to_datetime(x['Activity_Date']) - pd.to_datetime(last_diff_status(y.loc[df['Activity'] == x['Activity']])['Activity_Date'])).days
    else:
        return 0

Creating the new column 'Change' by applying the function with both, current row and the shifted dataframe

df['Change'] = df.apply(lambda x, y=df.shift().fillna(0): gap(x, y), axis=1)

result:

Activity    Status Activity_Date  Change
1         Inactive      06/18/22     0.0
1           Active      06/19/22     1.0
1         Inactive      06/21/22     2.0
1         Inactive      06/25/22     6.0
2         Inactive      04/14/22     0.0
2           Active      05/20/22    36.0
2           Active      05/23/22    39.0
2           Active      05/26/22    42.0
3           Active      02/01/22     0.0
3           Active      02/02/22     0.0
3         Inactive      02/23/22    21.0
3         Inactive      02/28/22    26.0
3         Inactive      03/05/22    31.0
  • Related