Home > Blockchain >  Pandas: Set values in all rows that are between two rows in a dataframe?
Pandas: Set values in all rows that are between two rows in a dataframe?

Time:06-17

Say we have a pandas dataframe like the below:

df = pd.DataFrame({"Basis": [300, 1500, 400, 260, 50,-10],"Weights":[0,-1,0,0,0,0]})
print(df)

   Basis  Weights
0    300        0
1   1500       -1
2    400        0
3    260        0
4     50        0
5    -10        0

So I found out how I can set value of X column within row X based on values within another column of that same row. So in this dataframe I get that I can set all weights to -1 where Basis > 1000

df.loc[df['Basis'] > 1000, 'Weights'] = -1

What I want to be able to do is: in a large df of this format, take all the rows in between a row where there is a weight of -1 and a later row where basis <= 0 and set their weight value to -1 (so in the image case, I want to set rows 1-4's weights value to -1, and I have to work out how to do this without looping through the entire dataframe (have to work with a very large dataset).

The desired output would be:

   Basis  Weights
0    300        0
1   1500       -1
2    400       -1
3    260       -1
4     50       -1
5    -10        0

is there an elegant way to do this that avoids looping through entire df? I.e. Some quick way of implementing condition that weight equals previous weight if basis >=0

CodePudding user response:

If you only have >0 or -1 values in Weights, you can set up groups starting at negative Basis and get the cummin Weight:

group = df['Basis'].lt(0).cumsum()
df['Weights'] = df.groupby(group)['Weights'].cummin()

If you have arbitrary values, this is a bit more complex, you first need to mask the non -1 values, ffill per group, then restore the other values:

group = df['Basis'].lt(0).cumsum()
df['Weights'] = (df['Weights']
                 .where(df['Weights'].eq(-1))
                 .groupby(group).ffill()
                 .fillna(df['Weights'], downcast='infer')
                )

output:

   Basis  Weights
0    300        0
1   1500       -1
2    400       -1
3    260       -1
4     50       -1
5    -10        0

CodePudding user response:

You can replace the 0s with nan and then fillna -

df.loc[(df['Basis'] > 0) & (df['Weights'] >= 0), 'Weights'] = np.nan
df = df.fillna(method='ffill').fillna(0) 

Output

   Basis  Weights
0    300      0.0
1   1500     -1.0
2    400     -1.0
3    260     -1.0
4     50     -1.0
5    -10      0.0
  • Related