Using pandas or numpy in Python fill a fourth column based on values in other three columns-CodePudding

Lets say I have a dataset, df of three variables A,B, and C. I want to compute variable D which is based on values of all A, B, and C. D variable only takes input from variable C but takes into consideration the sequence of values in B for a given level of value in A.

A   B   C   D
1   10  10  390
1   11  7   383
1   12  5   378
1   13  12  366
1   14  16  350
2   50  22  378
2   51  4   374
2   52  9   365
2   53  27  338
2   54  36  302
3   100 81  319
3   101 54  265
3   102 2   263
3   103 11  252
3   104 15  237

The formulas for estimation of each row value of variable D is as follows:

D1 = 400-C2
D2 = 400-C3-C2
D3 = 400-C4-C3-C2
D4 = 400-C5-C4-C3-C2
D5 = 400-C6-C5-C4-C3-C2
D6 = 400-C7
D7 = 400-C8-C7
D8 = 400-C9-C8-C7
D9 = 400-C10-C9-C8-C7
D10 = 400-C11-C10-C9-C8-C7
D11 = 400-C12
D12 = 400-C13-C12
D13 = 400-C14-C13-C12
D14 = 400-C15-C14-C13-C12
D15 = 400-C16-C15-C14-C13-C12

CodePudding user response：

Do a cumulative sum using cumsum and subtract the resulting series from 400.

>>> df['D'] = 400 - df.groupby('A')['C'].cumsum()

    A    B   C    D
0   1   10  10  390
1   1   11   7  383
2   1   12   5  378
3   1   13  12  366
4   1   14  16  350
5   2   50  22  378
6   2   51   4  374
7   2   52   9  365
8   2   53  27  338
9   2   54  36  302
10  3  100  81  319
11  3  101  54  265
12  3  102   2  263
13  3  103  11  252
14  3  104  15  237