Lets say I have a dataset, df of three variables A,B, and C. I want to compute variable D which is based on values of all A, B, and C. D variable only takes input from variable C but takes into consideration the sequence of values in B for a given level of value in A.
A B C D
1 10 10 390
1 11 7 383
1 12 5 378
1 13 12 366
1 14 16 350
2 50 22 378
2 51 4 374
2 52 9 365
2 53 27 338
2 54 36 302
3 100 81 319
3 101 54 265
3 102 2 263
3 103 11 252
3 104 15 237
The formulas for estimation of each row value of variable D is as follows:
D1 = 400-C2
D2 = 400-C3-C2
D3 = 400-C4-C3-C2
D4 = 400-C5-C4-C3-C2
D5 = 400-C6-C5-C4-C3-C2
D6 = 400-C7
D7 = 400-C8-C7
D8 = 400-C9-C8-C7
D9 = 400-C10-C9-C8-C7
D10 = 400-C11-C10-C9-C8-C7
D11 = 400-C12
D12 = 400-C13-C12
D13 = 400-C14-C13-C12
D14 = 400-C15-C14-C13-C12
D15 = 400-C16-C15-C14-C13-C12
CodePudding user response:
Do a cumulative sum using cumsum
and subtract the resulting series from 400
.
>>> df['D'] = 400 - df.groupby('A')['C'].cumsum()
A B C D
0 1 10 10 390
1 1 11 7 383
2 1 12 5 378
3 1 13 12 366
4 1 14 16 350
5 2 50 22 378
6 2 51 4 374
7 2 52 9 365
8 2 53 27 338
9 2 54 36 302
10 3 100 81 319
11 3 101 54 265
12 3 102 2 263
13 3 103 11 252
14 3 104 15 237