Currently I am using a for loop with a conditional that when true performs a calculation and inputs it into the column of a dataframe. However, when the conditional is not met the data from the previous row is entered into the new row.
This is a pseudocode of what I currently have:
for index in range(len(dataframe[column1])):
if condition==True:
dataframe.at[index, column3]= dataframe.at[index, column1]-dataframe.at[index, column2]
else:
dataframe.at[index, column3]= dataframe.at[index-1, column3]
I understand that when the calculation of the current row depends on the previous row, vectorization usually is not viable. However in this case, since the calculation for column 3 does not depend on the previous row and I am simply inputting the previous row's value into the current row, would it be possible to vectorize this to improve runtime speed?
CodePudding user response:
You could do that in a vectorized way like this.
Starting Data
c0 c1 c2
0 5 2 4
1 5 10 6
2 9 3 2
3 1 4 2
4 4 2 7
5 1 5 8
6 3 4 6
7 10 1 3
8 4 2 6
9 3 1 2
Execute
import numpy as np
dfc = df.assign(c3=np.where(df['c0']>2, df['c1']-df['c2'], np.nan)).ffill().fillna(0).astype(int)
print(dfc)
Result
c0 c1 c2 c3
0 5 2 4 -2
1 5 10 6 4
2 9 3 2 1
3 1 4 2 1
4 4 2 7 -5
5 1 5 8 -5
6 3 4 6 -2
7 10 1 3 -2
8 4 2 6 -4
9 3 1 2 -1
This leverages Numpy's where
function to do the selection. If the condition is true then it does the subtraction. If not true it places an NA
value temporarily into the cell. Then ffill
does a forward fill of values which completes the logic of placing the previous value of the column into a cell if the condition is not true. Note that fillna(0)
place a zero in the first row if the condition is not met in the first row - since it does not have a previous value to place.