We have a dataframe,df with four variables A, B,C, and D.
Variable A has two levels 1,2, and 3 (in this example only).
Variable B, C and D are continuous variables.
Formula used for filling column C based on A and B is
df['C'] = 150 - df['B'].groupby(df['A']).cumsum()
Desired result is in column D
Basically, a value in column C cannot take a value >150 and <0. For instance, in index 24, Column C with 163.5>150 is replaced with 150 in Column D. The values in subsequent rows changes. Again, in index 28, Column C takes a value 150-180=-30<0; thus, replaced with 0 in Column D and the values in subsequent rows changes.
df
ID A B C D
0 1 21 129 129
1 1 -1.5 130.5 130.5
2 1 -1.5 132 132
3 1 13.5 118.5 118.5
4 1 13.5 105 105
5 1 13.5 91.5 91.5
6 2 21 129 129
7 2 -1.5 130.5 130.5
8 2 6 124.5 124.5
9 2 13.5 111 111
10 2 13.5 97.5 97.5
11 2 13.5 84 84
12 2 13.5 70.5 70.5
13 2 -9 79.5 79.5
14 2 6 73.5 73.5
15 2 -9 82.5 82.5
16 2 6 76.5 76.5
17 2 -1.5 78 78
18 2 13.5 64.5 64.5
19 2 -1.5 66 66
20 2 13.5 52.5 52.5
21 2 13.5 39 39
22 2 -106.5 145.5 145.5
23 2 6 139.5 139.5
24 2 -24 163.5 150
25 2 6 157.5 144
26 2 13.5 144 130.5
27 2 13.5 130.5 117
28 3 180 -30 0
29 3 -9 -21 9
30 3 6 -27 3
31 3 -1.5 -25.5 4.5
32 3 13.5 -39 0
33 3 -1.5 -37.5 1.5
34 3 13.5 -51 0
35 3 -24 -27 24
NOTE
Please see the changes between Column C and D from index no. 24.
Formula used to calculate values in column D from index no. 24 to 35 is as given below:
ID formula
24 163.5>150 (SET TO 150)
25 150-6=144
26 144-13.5=130.5
27 130.5-13.5=117
28 150-180=-30 (SET TO 0)
29 0-(-9)=9
30 9-6=3
31 3-(-1.5)=4.5
32 4.5-13.5=-9 (SET TO 0)
33 0-(-1.5)=1.5
34 1.5-13.5=-12 (SET TO 0)
35 0-(-24)=24
CodePudding user response:
import pandas as pd
qqq = []
def func_data(x):
aaa = 150
for i in x:
aaa -=i
if aaa > 150:
aaa =150
if aaa < 0:
aaa = 0
qqq.append(aaa)
df['F'] = df.groupby(['A'])['B'].apply(func_data)
df['F'] = qqq
print(df)
Output
ID A B C D F
0 0 1 21.0 129.0 129.0 129.0
1 1 1 -1.5 130.5 130.5 130.5
2 2 1 -1.5 132.0 132.0 132.0
3 3 1 13.5 118.5 118.5 118.5
4 4 1 13.5 105.0 105.0 105.0
5 5 1 13.5 91.5 91.5 91.5
6 6 2 21.0 129.0 129.0 129.0
7 7 2 -1.5 130.5 130.5 130.5
8 8 2 6.0 124.5 124.5 124.5
9 9 2 13.5 111.0 111.0 111.0
10 10 2 13.5 97.5 97.5 97.5
11 11 2 13.5 84.0 84.0 84.0
12 12 2 13.5 70.5 70.5 70.5
13 13 2 -9.0 79.5 79.5 79.5
14 14 2 6.0 73.5 73.5 73.5
15 15 2 -9.0 82.5 82.5 82.5
16 16 2 6.0 76.5 76.5 76.5
17 17 2 -1.5 78.0 78.0 78.0
18 18 2 13.5 64.5 64.5 64.5
19 19 2 -1.5 66.0 66.0 66.0
20 20 2 13.5 52.5 52.5 52.5
21 21 2 13.5 39.0 39.0 39.0
22 22 2 -106.5 145.5 145.5 145.5
23 23 2 6.0 139.5 139.5 139.5
24 24 2 -24.0 163.5 150.0 150.0
25 25 2 6.0 157.5 144.0 144.0
26 26 2 13.5 144.0 130.5 130.5
27 27 2 13.5 130.5 117.0 117.0
28 28 3 180.0 -30.0 0.0 0.0
29 29 3 -9.0 -21.0 9.0 9.0
30 30 3 6.0 -27.0 3.0 3.0
31 31 3 -1.5 -25.5 4.5 4.5
32 32 3 13.5 -39.0 0.0 0.0
33 33 3 -1.5 -37.5 1.5 1.5
34 34 3 13.5 -51.0 0.0 0.0
35 35 3 -24.0 -27.0 24.0 24.0
Apply has a func_data function to test conditions and set values. The result is an array qqq, set to column F.
CodePudding user response:
I'm not sure I understood the question correctly, as it is not clear why in index no. 24 you referred to column C or why in index no. 28 the calculation is 150-180=-30 and not 117-180=-63.
I can't think of a way to solve this without using for loop.
With the help of this answer, this should work:
df2.loc[0, 'D'] = df2.loc[0, 'C'] # this set the first value in column D, now it's the same value as in column C.
for i in range(1, len(df2)):
val = df2.loc[i-1, 'D'] - df2.loc[i,'B']
if val > 150:
val = 150
if val < 0:
val = 0
df2.loc[i, 'D'] = val