Using Pandas fill a fourth column based on conditions on category level values in three other column-CodePudding

Lets say we have a dataframe, df with four variables as given below. Value in variable D is based on values in variable A, B, and C. For every category level in A, Variable D starts with a value 400 and is reduced by the value in A if the value in B is "NO", else remains same as the previous value.

Index   A   B   C   D
1   1   5   YES 400
2   1   5   YES 400
3   1   5   NO  395
4   1   5   NO  390
5   1   5   NO  385
6   2   5   NO  395
7   2   5   YES 395
8   2   5   YES 395
9   2   5   NO  390
10  2   5   YES 390
11  3   5   NO  395
12  3   5   NO  390
13  3   5   NO  385
14  3   5   NO  380
15  3   5   YES 380

The formulas for filling variable D is given below:

D1 = 400
D2 = D1=400
D3 = 400-B3
D4 = D3-B4
D5 = D4-B5
D6 = 400-B6
D7 = D6
D8 = D7
D9 = D8-B9
D10 = D9
D11 = 400-B11
D12 = D11-B12
D13 = D12-B13
D14 = D13-B14
D15 = D14

What I have tried?

I have tried using pd.groupby() and cumsum() function but working with values in B is the difficult part.

CodePudding user response：

Solution

mask the values in column B where corresponding row in column C != 'NO', then group the masked column by A and calculate cumsum then subtract this cumulative sum from 400 to get the result

m = df['C'].ne('NO')
df['D'] = 400 - df['B'].mask(m, 0).groupby(df['A']).cumsum()

Result

    A  B    C    D
0   1  5  YES  400
1   1  5  YES  400
2   1  5   NO  395
3   1  5   NO  390
4   1  5   NO  385
5   2  5   NO  395
6   2  5  YES  395
7   2  5  YES  395
8   2  5   NO  390
9   2  5  YES  390
10  3  5   NO  395
11  3  5   NO  390
12  3  5   NO  385
13  3  5   NO  380
14  3  5  YES  380