Say we have a pandas DataFrame like the one below:
import pandas as pd

df = pd.DataFrame({"Basis": [300, 1500, 400, 260, 50, -10], "Weights": [0, -1, 0, 0, 0, 0]})
print(df)
   Basis  Weights
0    300        0
1   1500       -1
2    400        0
3    260        0
4     50        0
5    -10        0
I've found out how to set the value of one column within a row based on the value of another column in that same row. So in this dataframe, I can set all Weights to -1 where Basis > 1000:
df.loc[df['Basis'] > 1000, 'Weights'] = -1
What I want to be able to do is: in a large df of this format, take all the rows between a row with a weight of -1 and a later row where Basis <= 0, and set their Weights to -1 (so in the example above, I want to set the Weights of rows 1-4 to -1). I have to work out how to do this without looping through the entire dataframe, since I'm working with a very large dataset.
The desired output would be:
   Basis  Weights
0    300        0
1   1500       -1
2    400       -1
3    260       -1
4     50       -1
5    -10        0
Is there an elegant way to do this that avoids looping through the entire df? I.e. some quick way of implementing the condition that a row's weight equals the previous row's weight if Basis >= 0?
CodePudding user response:
If Weights only contains -1 and non-negative values, you can set up groups starting at each negative Basis and take the cummin of the Weights per group:
group = df['Basis'].lt(0).cumsum()
df['Weights'] = df.groupby(group)['Weights'].cummin()
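As a quick sanity check, here is a self-contained sketch of what the intermediate group series looks like on the example frame above (just the answer's two lines with the intermediates printed):

import pandas as pd

df = pd.DataFrame({"Basis": [300, 1500, 400, 260, 50, -10],
                   "Weights": [0, -1, 0, 0, 0, 0]})

# A new group starts at every row with a negative Basis.
group = df['Basis'].lt(0).cumsum()
print(group.tolist())                  # [0, 0, 0, 0, 0, 1]

# Within each group the cumulative minimum carries the -1 forward;
# rows before the -1 and rows in later groups are left untouched.
df['Weights'] = df.groupby(group)['Weights'].cummin()
print(df['Weights'].tolist())          # [0, -1, -1, -1, -1, 0]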
If you have arbitrary values, it is a bit more complex: you first need to mask the values that are not -1, ffill per group, then restore the other values:
group = df['Basis'].lt(0).cumsum()
df['Weights'] = (df['Weights']
                 .where(df['Weights'].eq(-1))
                 .groupby(group).ffill()
                 .fillna(df['Weights'], downcast='infer')
                 )
Output:
   Basis  Weights
0    300        0
1   1500       -1
2    400       -1
3    260       -1
4     50       -1
5    -10        0
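To see that the general version keeps other values intact, here is a sketch with made-up Weights (the values 2, 5, 3 and 7 are hypothetical, not from the question):

import pandas as pd

df = pd.DataFrame({"Basis": [300, 1500, 400, 260, 50, -10],
                   "Weights": [2, -1, 5, 0, 3, 7]})

group = df['Basis'].lt(0).cumsum()
df['Weights'] = (df['Weights']
                 .where(df['Weights'].eq(-1))               # keep only the -1s, NaN elsewhere
                 .groupby(group).ffill()                    # spread each -1 forward within its group
                 .fillna(df['Weights'], downcast='infer')   # restore the untouched values
                 )
print(df['Weights'].tolist())          # [2, -1, -1, -1, -1, 7]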
CodePudding user response:
You can replace the 0s with nan and then fillna:
import numpy as np

df.loc[(df['Basis'] > 0) & (df['Weights'] >= 0), 'Weights'] = np.nan
df = df.fillna(method='ffill').fillna(0)
Output
   Basis  Weights
0    300      0.0
1   1500     -1.0
2    400     -1.0
3    260     -1.0
4     50     -1.0
5    -10      0.0
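Note that the intermediate nan step promotes Weights to float (hence the 0.0 / -1.0 above). If integer weights are needed, a minimal follow-up sketch, assuming the same example frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Basis": [300, 1500, 400, 260, 50, -10],
                   "Weights": [0, -1, 0, 0, 0, 0]})

# Same replace-with-nan / ffill approach as above.
df.loc[(df['Basis'] > 0) & (df['Weights'] >= 0), 'Weights'] = np.nan
df = df.fillna(method='ffill').fillna(0)

# Cast back to int once every NaN has been filled.
df['Weights'] = df['Weights'].astype(int)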