Home > OS >  Rearrange values in dataframe based on condition in Pandas
Rearrange values in dataframe based on condition in Pandas

Time:11-18

I have a dataset, where when the sum of Q1 24 - Q4 24 is between the number 1 - 2.5, I would like to place the number 2 in that row under Q4 24.

Data

ID  type    Q1 24   Q2 24   Q3 24   Q4 24
AA  hi      2.0     1.2     0.5     0.6
AA  hello   0.7     2.0     0.6     0.6
AA  bye     0.6     0.6     0.6     0.4
AA  ok      0.3     0.4     0.2     0.2
                

Desired

ID  type    Q1 24   Q2 24   Q3 24   Q4 24
AA  hi      2.0     1.2     0.5     0.6
AA  hello   0.7     2.0     0.6     0.6
AA  bye     0.0     0.0     0.0     2.0
AA  ok      0.0     0.0     0.0     2.0

Doing

df.loc[df.iloc[:,2:].sum(axis=1)>1<2.5, ['Q1 24','Q2 24','Q3 24','Q4 24']]= 2

A SO member helped with the above script, but how would I only target that row under Q4 24. I am thinking I can utilize iloc again for this. Any suggestion is appreciated.

CodePudding user response:

This will do what you are after including zeroing the other columns:

df.loc[df.filter(regex=r'^Q\d').sum(axis=1).between(1, 2.5), ['Q1 24','Q2 24','Q3 24','Q4 24']] = 0, 0, 0, 2

   ID   type  Q1 24  Q2 24  Q3 24  Q4 24
0  AA     hi    2.0    1.2    0.5    0.6
1  AA  hello    0.7    2.0    0.6    0.6
2  AA    bye      0      0      0    2.0
3  AA     ok      0      0      0    2.0

CodePudding user response:

As an alternative:

import numpy as np
df['Q4 24']=np.where((df[df.columns[2:]].sum(axis=1)<=2.5) & (df[df.columns[2:]].sum(axis=1)>=1),2,df['Q4 24'])
print(df)
'''
   ID   type  Q1 24  Q2 24  Q3 24  Q4 24
0  AA     hi    2.0    1.2    0.5    0.6
1  AA  hello    0.7    2.0    0.6    0.6
2  AA    bye    0.6    0.6    0.6    2.0
3  AA     ok    0.3    0.4    0.2    2.0
'''

CodePudding user response:

You were on the right track with boolean indexing.

I would use:

df.loc[df.filter(regex=r'^Q\d').sum(axis=1).between(1, 2.5), 'Q4 24'] = 2

Output:

   ID   type  Q1 24  Q2 24  Q3 24  Q4 24
0  AA     hi    2.0    1.2    0.5    0.6
1  AA  hello    0.7    2.0    0.6    0.6
2  AA    bye    0.6    0.6    0.6    2.0
3  AA     ok    0.3    0.4    0.2    2.0

adding 2 to last column and setting all the others to 0

sel = df.filter(regex=r'^Q\d')

df.loc[sel.sum(axis=1).between(1, 2.5), sel.columns] = [0]*(sel.shape[1]-1) [2]

Output:

   ID   type  Q1 24  Q2 24  Q3 24  Q4 24
0  AA     hi    2.0    1.2    0.5    0.6
1  AA  hello    0.7    2.0    0.6    0.6
2  AA    bye    0.0    0.0    0.0    2.0
3  AA     ok    0.0    0.0    0.0    2.0
  • Related