I'm working on a problem where if certain conditions are met then I need to add or subtract from the previous row. So I have the following df:
data = {'sample_val':[5.5, 6.2 , 4.0, 7.8, 3.6], 'sample_lab':['nor','high','nor','high','low']}
df=pd.DataFrame(data)
I want to add an additional column 'adjustment', that is filled by the following rules and has a starting value of 10:
- if sample_val < 4 then - 6 from the previous adjustment entry
- if 4<= sample_val <= 5 then 0 to the previous adjustment entry
- if sample_val>5 then add 4 to the previous adjustment entry
I've tried iterrows( )
, but I can't get it to add or subtract. I also tried np.where
, but can't get it to iterate.
CodePudding user response:
If I understood correctly, you can use np.select
to know which coefficients you must add or subtract, and then accumulate the result to the initial value of 10
using panda's cumsum
:
import numpy as np
x = df.sample_val.to_numpy()
adj = np.select([x<4, (x>=4)&(x<=5), x>5], [-6,0,4], x)
df['adjustment'] = 0
df.loc[0, 'adjustment'] = 10
df['adjustment'] = df.adjustment.add(adj).cumsum()
print(df)
sample_val sample_lab adjustment
0 5.5 nor 14.0
1 6.2 high 18.0
2 4.0 nor 18.0
3 7.8 high 22.0
4 3.6 low 16.0
CodePudding user response:
Use pandas.cut
to assign numbers to bins and cumsum
10 for the cumulated sum starting at 10:
df['adjustment'] = (pd.cut(df['sample_val'],
bins=[-np.inf, 4, 5, np.inf],
labels=[-6, 0, 4])
.astype(int)
.cumsum()
) 10
output:
sample_val sample_lab adjustment
0 5.5 nor 14
1 6.2 high 18
2 4.0 nor 12
3 7.8 high 16
4 3.6 low 10
CodePudding user response:
There are other approaches that provide concise solutions however as you stated that you could not get this to work with itterows
, this solution shows you how it can be done. This approach is also useful if you need to add additional logic or perform other calculations.
You can use Pandas iterrows
and loc
functions. See the Pandas documentation for iterrows and loc to learn more.
The following assumes you are starting with a value of 10. You are then applying the rules before setting the value of the adjustment
column.
# add the additional column
data_df['adjustment'] = 0
# set the initial adjustment
adjustment = 10
# adjust the values of the column
for index, row in data_df.iterrows():
sample_val = float(row['sample_val'])
if sample_val < 4:
adjustment -= 6
elif sample_val > 5:
adjustment = 4
# set the column value
data_df.loc[index, 'adjustment'] = adjustment
print(data_df)
This would produce the following output for your example:
sample_val sample_lab adjustment
0 5.5 nor 14
1 6.2 high 18
2 4.0 nor 18
3 7.8 high 22
4 3.6 low 16
Alternatively, if you require the first row to have a value of 10 in the adjustment
column then you will need to adjust the code as follows:
# add the additional column
data_df['adjustment'] = 0
# set the initial adjustment
adjustment = 10
# adjust the values of the column
for index, row in data_df.iterrows():
sample_val = float(row['sample_val'])
if index == 0:
pass
elif sample_val < 4:
adjustment -= 6
elif sample_val > 5:
adjustment = 4
# set the column value
data_df.loc[index, 'adjustment'] = adjustment
print(data_df)
This would produce the following result:
sample_val sample_lab adjustment
0 5.5 nor 10
1 6.2 high 14
2 4.0 nor 14
3 7.8 high 18
4 3.6 low 12