Lets say we have a dataframe, df with four variables as given below. Value in variable D is based on values in variable A, B, and C. For every category level in A, Variable D starts with a value 400 and is reduced by the value in A if the value in B is "NO", else remains same as the previous value.
df
Index A B C D
1 1 5 YES 400
2 1 5 YES 400
3 1 5 NO 395
4 1 5 NO 390
5 1 5 NO 385
6 2 5 NO 395
7 2 5 YES 395
8 2 5 YES 395
9 2 5 NO 390
10 2 5 YES 390
11 3 5 NO 395
12 3 5 NO 390
13 3 5 NO 385
14 3 5 NO 380
15 3 5 YES 380
The formulas for filling variable D is given below:
D1 = 400
D2 = D1=400
D3 = 400-B3
D4 = D3-B4
D5 = D4-B5
D6 = 400-B6
D7 = D6
D8 = D7
D9 = D8-B9
D10 = D9
D11 = 400-B11
D12 = D11-B12
D13 = D12-B13
D14 = D13-B14
D15 = D14
What I have tried?
I have tried using pd.groupby()
and cumsum()
function but working with values in B is the difficult part.
CodePudding user response:
Solution
mask
the values in column B
where corresponding row in column C != 'NO'
, then group the masked column by A
and calculate cumsum
then subtract this cumulative sum from 400
to get the result
m = df['C'].ne('NO')
df['D'] = 400 - df['B'].mask(m, 0).groupby(df['A']).cumsum()
Result
A B C D
0 1 5 YES 400
1 1 5 YES 400
2 1 5 NO 395
3 1 5 NO 390
4 1 5 NO 385
5 2 5 NO 395
6 2 5 YES 395
7 2 5 YES 395
8 2 5 NO 390
9 2 5 YES 390
10 3 5 NO 395
11 3 5 NO 390
12 3 5 NO 385
13 3 5 NO 380
14 3 5 YES 380