Home > Mobile >  How to vectorize a pandas dataframe calculation where if a conditional is not met the data from the
How to vectorize a pandas dataframe calculation where if a conditional is not met the data from the

Time:07-20

Currently I am using a for loop with a conditional that when true performs a calculation and inputs it into the column of a dataframe. However, when the conditional is not met the data from the previous row is entered into the new row.

This is a pseudocode of what I currently have:


for index in range(len(dataframe[column1])):
  if condition==True:
    dataframe.at[index, column3]= dataframe.at[index, column1]-dataframe.at[index, column2]
  else:
    dataframe.at[index, column3]= dataframe.at[index-1, column3]
 

I understand that when the calculation of the current row depends on the previous row, vectorization usually is not viable. However in this case, since the calculation for column 3 does not depend on the previous row and I am simply inputting the previous row's value into the current row, would it be possible to vectorize this to improve runtime speed?

CodePudding user response:

You could do that in a vectorized way like this.

Starting Data

   c0  c1  c2
0   5   2   4
1   5  10   6
2   9   3   2
3   1   4   2
4   4   2   7
5   1   5   8
6   3   4   6
7  10   1   3
8   4   2   6
9   3   1   2

Execute

import numpy as np

dfc = df.assign(c3=np.where(df['c0']>2, df['c1']-df['c2'], np.nan)).ffill().fillna(0).astype(int)
print(dfc)

Result

   c0  c1  c2  c3
0   5   2   4  -2
1   5  10   6   4
2   9   3   2   1
3   1   4   2   1
4   4   2   7  -5
5   1   5   8  -5
6   3   4   6  -2
7  10   1   3  -2
8   4   2   6  -4
9   3   1   2  -1

This leverages Numpy's where function to do the selection. If the condition is true then it does the subtraction. If not true it places an NA value temporarily into the cell. Then ffill does a forward fill of values which completes the logic of placing the previous value of the column into a cell if the condition is not true. Note that fillna(0) place a zero in the first row if the condition is not met in the first row - since it does not have a previous value to place.

  • Related