Home > Net >  Python how to groupby and track changes across several columns
Python how to groupby and track changes across several columns

Time:06-30

I have a dataframe that looks like so:

Contract_Id Contract Value Expiry Date
1 15 05-12-1997
1 33 05-12-1997
2 21 12-12-2001
2 21 19-11-2002
3 9 24-07-2010
3 8 29-08-2010
4 98 01-01-2000
4 98 01-01-2000

For each contract, I would like to track whether or not it has had a contract value change, expiry date change, both, or neither; and create a new column based off these results. I would like my new dataframe to look something like this:

Contract_Id Contract Value Expiry Date Change
1 15 05-12-1997 Contract Value Change
1 33 05-12-1997 Contract Value Change
2 21 12-12-2001 Expiry Date Change
2 21 19-11-2002 Expiry Date Change
3 9 24-07-2010 Both
3 8 29-08-2010 Both
4 98 01-01-2000 Neither
4 98 01-01-2000 Neither

Any help with this would be greatly appreciated.

Thanks!

CodePudding user response:

You can try groupby and compare values in each group

def compare(g):
    m1 = g['Contract Value'].iloc[0] != g['Contract Value'].iloc[1]
    m2 = g['Expiry Date'].iloc[0] != g['Expiry Date'].iloc[1]

    res = ''
    if m1 and m2:
        res = 'Both'
    elif (not m1) and (not m2):
        res = 'Neither'
    elif m1:
        res= 'Contract Value Change'
    elif m2:
        res = 'Expiry Date Change'

    g = g.assign(Change=[res]*2)
    return g

out = df.groupby('Contract_Id').apply(compare)
print(out)

   Contract_Id  Contract Value Expiry Date                 Change
0            1              15  05-12-1997  Contract Value Change
1            1              33  05-12-1997  Contract Value Change
2            2              21  12-12-2001     Expiry Date Change
3            2              21  19-11-2002     Expiry Date Change
4            3               9  24-07-2010                   Both
5            3               8  29-08-2010                   Both
6            4              98  01-01-2000                Neither
7            4              98  01-01-2000                Neither

CodePudding user response:

One way to do it using for loop:

for i in range(0,len(df)-1,2):
    if df.loc[i,'Contract Value Change']!=df.loc[i 1,'Contract Value Change']:
        if df.loc[i,'Expiry Date']!=df.loc[i 1,'Expiry Date']:
            df.loc[i,'Change'] = 'Both'
            df.loc[i 1,'Change'] = 'Both'
        else:
            df.loc[i,'Change'] = 'Contract Value Change'
            df.loc[i 1,'Change'] = 'Contract Value Change'
    elif df.loc[i,'Expiry Date']!=df.loc[i 1,'Expiry Date']:
            df.loc[i,'Change'] = 'Expiry Date Change'
            df.loc[i 1,'Change'] = 'Expiry Date Change'
    else:
            df.loc[i,'Change'] = 'Neither'
            df.loc[i 1,'Change'] = 'Neither'
  • Related