I have a dataset, where when the sum of Q1 24 - Q4 24 is between the number 1 - 2.5, I would like to place the number 2 in that row under Q4 24.
Data
ID type Q1 24 Q2 24 Q3 24 Q4 24
AA hi 2.0 1.2 0.5 0.6
AA hello 0.7 2.0 0.6 0.6
AA bye 0.6 0.6 0.6 0.4
AA ok 0.3 0.4 0.2 0.2
Desired
ID type Q1 24 Q2 24 Q3 24 Q4 24
AA hi 2.0 1.2 0.5 0.6
AA hello 0.7 2.0 0.6 0.6
AA bye 0.0 0.0 0.0 2.0
AA ok 0.0 0.0 0.0 2.0
Doing
df.loc[df.iloc[:,2:].sum(axis=1)>1<2.5, ['Q1 24','Q2 24','Q3 24','Q4 24']]= 2
A SO member helped with the above script, but how would I only target that row under Q4 24. I am thinking I can utilize iloc again for this. Any suggestion is appreciated.
CodePudding user response:
This will do what you are after including zeroing the other columns:
df.loc[df.filter(regex=r'^Q\d').sum(axis=1).between(1, 2.5), ['Q1 24','Q2 24','Q3 24','Q4 24']] = 0, 0, 0, 2
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0 0 0 2.0
3 AA ok 0 0 0 2.0
CodePudding user response:
As an alternative:
import numpy as np
df['Q4 24']=np.where((df[df.columns[2:]].sum(axis=1)<=2.5) & (df[df.columns[2:]].sum(axis=1)>=1),2,df['Q4 24'])
print(df)
'''
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0.6 0.6 0.6 2.0
3 AA ok 0.3 0.4 0.2 2.0
'''
CodePudding user response:
You were on the right track with boolean indexing.
I would use:
df.loc[df.filter(regex=r'^Q\d').sum(axis=1).between(1, 2.5), 'Q4 24'] = 2
Output:
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0.6 0.6 0.6 2.0
3 AA ok 0.3 0.4 0.2 2.0
adding 2 to last column and setting all the others to 0
sel = df.filter(regex=r'^Q\d')
df.loc[sel.sum(axis=1).between(1, 2.5), sel.columns] = [0]*(sel.shape[1]-1) [2]
Output:
ID type Q1 24 Q2 24 Q3 24 Q4 24
0 AA hi 2.0 1.2 0.5 0.6
1 AA hello 0.7 2.0 0.6 0.6
2 AA bye 0.0 0.0 0.0 2.0
3 AA ok 0.0 0.0 0.0 2.0