Home > Enterprise >  Use pandas.groupby() and cumsum() with row wise condition check and replacement
Use pandas.groupby() and cumsum() with row wise condition check and replacement

Time:06-10

We have a dataframe,df with four variables A, B,C, and D.

Variable A has two levels 1,2, and 3 (in this example only).

Variable B, C and D are continuous variables.

Formula used for filling column C based on A and B is

df['C'] = 150 - df['B'].groupby(df['A']).cumsum()

Desired result is in column D

Basically, a value in column C cannot take a value >150 and <0. For instance, in index 24, Column C with 163.5>150 is replaced with 150 in Column D. The values in subsequent rows changes. Again, in index 28, Column C takes a value 150-180=-30<0; thus, replaced with 0 in Column D and the values in subsequent rows changes.

df

ID      A       B       C       D
0       1       21      129     129
1       1       -1.5    130.5   130.5
2       1       -1.5    132     132
3       1       13.5    118.5   118.5
4       1       13.5    105     105
5       1       13.5    91.5    91.5
6       2       21      129     129
7       2       -1.5    130.5   130.5
8       2       6       124.5   124.5
9       2       13.5    111     111
10      2       13.5    97.5    97.5
11      2       13.5    84      84
12      2       13.5    70.5    70.5
13      2       -9      79.5    79.5
14      2       6       73.5    73.5
15      2       -9      82.5    82.5
16      2       6       76.5    76.5
17      2       -1.5    78      78
18      2       13.5    64.5    64.5
19      2       -1.5    66      66
20      2       13.5    52.5    52.5
21      2       13.5    39      39
22      2       -106.5  145.5   145.5
23      2       6       139.5   139.5
24      2       -24     163.5   150
25      2       6       157.5   144
26      2       13.5    144     130.5
27      2       13.5    130.5   117
28      3       180     -30     0
29      3       -9      -21     9
30      3       6       -27     3
31      3       -1.5    -25.5   4.5
32      3       13.5    -39     0
33      3       -1.5    -37.5   1.5
34      3       13.5    -51     0
35      3       -24     -27     24

NOTE

Please see the changes between Column C and D from index no. 24.

Formula used to calculate values in column D from index no. 24 to 35 is as given below:

ID  formula
24  163.5>150 (SET TO 150)
25  150-6=144
26  144-13.5=130.5
27  130.5-13.5=117
28  150-180=-30 (SET TO 0)
29  0-(-9)=9
30  9-6=3
31  3-(-1.5)=4.5
32  4.5-13.5=-9 (SET TO 0)
33  0-(-1.5)=1.5
34  1.5-13.5=-12 (SET TO 0)
35  0-(-24)=24

CodePudding user response:

import pandas as pd

qqq = []
def func_data(x):
    aaa = 150
    for i in x:
        aaa -=i
        if aaa > 150:
            aaa =150
        if aaa < 0:
            aaa = 0
        qqq.append(aaa)


df['F'] = df.groupby(['A'])['B'].apply(func_data)
df['F'] = qqq
print(df)

Output

    ID  A      B      C      D      F
0    0  1   21.0  129.0  129.0  129.0
1    1  1   -1.5  130.5  130.5  130.5
2    2  1   -1.5  132.0  132.0  132.0
3    3  1   13.5  118.5  118.5  118.5
4    4  1   13.5  105.0  105.0  105.0
5    5  1   13.5   91.5   91.5   91.5
6    6  2   21.0  129.0  129.0  129.0
7    7  2   -1.5  130.5  130.5  130.5
8    8  2    6.0  124.5  124.5  124.5
9    9  2   13.5  111.0  111.0  111.0
10  10  2   13.5   97.5   97.5   97.5
11  11  2   13.5   84.0   84.0   84.0
12  12  2   13.5   70.5   70.5   70.5
13  13  2   -9.0   79.5   79.5   79.5
14  14  2    6.0   73.5   73.5   73.5
15  15  2   -9.0   82.5   82.5   82.5
16  16  2    6.0   76.5   76.5   76.5
17  17  2   -1.5   78.0   78.0   78.0
18  18  2   13.5   64.5   64.5   64.5
19  19  2   -1.5   66.0   66.0   66.0
20  20  2   13.5   52.5   52.5   52.5
21  21  2   13.5   39.0   39.0   39.0
22  22  2 -106.5  145.5  145.5  145.5
23  23  2    6.0  139.5  139.5  139.5
24  24  2  -24.0  163.5  150.0  150.0
25  25  2    6.0  157.5  144.0  144.0
26  26  2   13.5  144.0  130.5  130.5
27  27  2   13.5  130.5  117.0  117.0
28  28  3  180.0  -30.0    0.0    0.0
29  29  3   -9.0  -21.0    9.0    9.0
30  30  3    6.0  -27.0    3.0    3.0
31  31  3   -1.5  -25.5    4.5    4.5
32  32  3   13.5  -39.0    0.0    0.0
33  33  3   -1.5  -37.5    1.5    1.5
34  34  3   13.5  -51.0    0.0    0.0
35  35  3  -24.0  -27.0   24.0   24.0

Apply has a func_data function to test conditions and set values. The result is an array qqq, set to column F.

CodePudding user response:

I'm not sure I understood the question correctly, as it is not clear why in index no. 24 you referred to column C or why in index no. 28 the calculation is 150-180=-30 and not 117-180=-63.

I can't think of a way to solve this without using for loop.

With the help of this answer, this should work:

df2.loc[0, 'D'] = df2.loc[0, 'C']  # this set the first value in column D, now it's the same value as in column C.
for i in range(1, len(df2)):
    val = df2.loc[i-1, 'D'] - df2.loc[i,'B']
    if val > 150:
        val = 150
    if val < 0:
        val = 0
    df2.loc[i, 'D'] = val
  • Related