Creating a pandas column that performs a calculation on another column, but changes the calculation

Time:07-21

I currently have a DataFrame that contains a column called load, and I want to create a column called calculated load that applies a simple calculation to load using a variable x. However, I want the calculation to change once the value 1 appears in a column called position, and to keep using the changed formula until -1 appears in position, after which the values start to rise again. Here is my current code:

import pandas as pd

x = 2

df = pd.DataFrame({"load": [1,2,4,6,2,4,7,4,8,3,4,7,3,3,6,4,7,4,3,2],
                   "position": [0,0.2,0.5,0.8,0.7,1,0.7,0.6,0.7,0.8,0.4,0.2,0,-0.5,-0.8,-1,-0.8,-0.9,-0.7,-0.6]})

df['calculated load'] = df['load'] + x

print(df['calculated load'])

0      3
1      4
2      6
3      8
4      4
5      6
6      9
7      6
8     10
9      5
10     6
11     9
12     5
13     5
14     8
15     6
16     9
17     6
18     5
19     4

This works up to the row where position reaches 1. From the row after that, when the values start falling, I want to use the formula below, which subtracts x, and then revert to the original formula (adding x) from the row after -1 is seen:

df['calculated load'] = df['load'] - x

I can't just check whether each value is less than or greater than the previous one, as the values in position don't rise and fall perfectly. Ideally, this would be my desired output:

print(df['calculated load'])

0      3
1      4
2      6
3      8
4      4
5      6
6      5
7      1
8      6
9      1
10     2
11     5
12     1
13     1
14     4
15     2
16     9
17     6
18     5
19     2

CodePudding user response:

I believe this code works, but it's not efficient because of iterrows(). If someone finds a way to vectorize it, feel free to comment on my answer.

import pandas as pd

x = 2

df = pd.DataFrame({"load": [1,2,4,6,2,4,7,4,8,3,4,7,3,3,6,4,7,4,3,2],
                   "position": [0,0.2,0.5,0.8,0.7,1,0.7,0.6,0.7,0.8,0.4,0.2,0,-0.5,-0.8,-1,-0.8,-0.9,-0.7,-0.6]})

increasing = True
list_increasing = []
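for index, row in df.iterrows():
    # Record the state before updating it, so that a switch at 1 or -1
    # takes effect on the following row, as the question asks.
    list_increasing.append(increasing)
    if increasing and row.position == 1:
        increasing = False
    elif not increasing and row.position == -1:
        increasing = True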
    
df['increasing'] = list_increasing

def calculated_load(row):
    if row.increasing:
        return row.load + x
    else:
        return row.load - x

df['cal load'] = df.apply(calculated_load, axis=1)
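
The row-wise apply at the end could probably be replaced with np.where once the flag column exists, and the flag itself can be built by looping over the position values directly instead of using iterrows(). This is only a sketch, assuming the switch should take effect on the row after 1 or -1 is seen, as the question describes. The name flags is just a placeholder.

```python
import numpy as np
import pandas as pd

x = 2

df = pd.DataFrame({"load": [1,2,4,6,2,4,7,4,8,3,4,7,3,3,6,4,7,4,3,2],
                   "position": [0,0.2,0.5,0.8,0.7,1,0.7,0.6,0.7,0.8,0.4,0.2,0,-0.5,-0.8,-1,-0.8,-0.9,-0.7,-0.6]})

# Build the flag by iterating over the column values only.
# The current state is recorded before it is updated, so the row that
# holds 1 (or -1) still uses the previous formula.
increasing = True
flags = []
for pos in df["position"]:
    flags.append(increasing)
    if increasing and pos == 1:
        increasing = False
    elif not increasing and pos == -1:
        increasing = True
df["increasing"] = flags

# Vectorized replacement for df.apply(calculated_load, axis=1).
df["cal load"] = np.where(df["increasing"], df["load"] + x, df["load"] - x)

print(df["cal load"].tolist())
```

The loop is still plain Python, but it avoids creating a Series for every row, which is usually where iterrows() spends most of its time.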

CodePudding user response:

Without a loop, you can use:

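import numpy as np

# -x starting on the row after position hits 1
x1 = df['position'].eq(1).mul(-x).shift(fill_value=0)
# +x starting on the row where position hits -1
x2 = df['position'].eq(-1).mul(x)
# Combine both markers, carry the last one forward, and start with +x
xm = (x1|x2).replace(0, np.nan).ffill().fillna(x).astype(int)

df['calculated load'] = df['load'] + xm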

Output:

>>> df
    load  position  calculated load
0      1       0.0                3
1      2       0.2                4
2      4       0.5                6
3      6       0.8                8
4      2       0.7                4
5      4       1.0                6
6      7       0.7                5
7      4       0.6                2
8      8       0.7                6
9      3       0.8                1
10     4       0.4                2
11     7       0.2                5
12     3       0.0                1
13     3      -0.5                1
14     6      -0.8                4
15     4      -1.0                6
16     7      -0.8                9
17     4      -0.9                6
18     3      -0.7                5
19     2      -0.6                4
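
Note that in the output above, row 15 (where position is -1) already uses load + x. If the switch back should only happen on the row after -1, as the question's text says, a variant along these lines might work. It is a sketch under that assumption, and the names markers and sign are made up for the example.

```python
import numpy as np
import pandas as pd

x = 2

df = pd.DataFrame({"load": [1,2,4,6,2,4,7,4,8,3,4,7,3,3,6,4,7,4,3,2],
                   "position": [0,0.2,0.5,0.8,0.7,1,0.7,0.6,0.7,0.8,0.4,0.2,0,-0.5,-0.8,-1,-0.8,-0.9,-0.7,-0.6]})

pos = df["position"]

# Mark the rows where the formula should change:
# -1.0 where position is 1 (start subtracting), +1.0 where it is -1
# (start adding again), NaN everywhere else.
markers = np.select([pos.eq(1), pos.eq(-1)], [-1.0, 1.0], default=np.nan)

# Shift by one row so each change applies from the following row,
# carry the last marker forward, and start in "adding" mode.
sign = pd.Series(markers, index=df.index).shift().ffill().fillna(1.0)

df["calculated load"] = df["load"] + (sign * x).astype(int)

print(df["calculated load"].tolist())
```

Both markers are shifted here, so rows 5 and 15 keep the formula that was active before the switch.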