Return label of the column which has the maximum value if it is higher than a certain number-CodePudding

I have a pandas dataframe with 18 columns, the columns from 2:17 hold numerical values with labels 2-17. I want to add a new column ('category') to the dataframe to include the label of the column which holds the maximum value if that value is higher than a certain number (0.5). I used the below code after utilizing a related question on #Pandas (python): max in columns define new value in new column

  test_df['Category'] = test_df.values[:,2:18].argmax(1) if (test_df.values[:,2:18].max(1) >=0.5) else 'none'

when I run the above code, I get the error message "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", any advice is appreciated.

CodePudding user response：

Use mask to hide row where no values are greater than 0.5 then extract the column name with idxmax.

test_df['Category'] = (
    test_df.iloc[:, 2:18].mask(~test_df.iloc[:, 2:18].ge(0.5).any(1), other=pd.NA)
           .idxmax(1).fillna('none')
)

Output:

>>> test_df
       A     B      col0      col1      col2      col3      col4      col5      col6      col7      col8      col9     col10     col11     col12     col13     col14     col15 Category
0  alpha  beta  0.004926  0.262662  0.059676  0.026302  0.360741  0.256310  0.472451  0.340764  0.472086  0.379545  0.437554  0.435562  0.438726  0.503708  0.193708  0.260441    col13
1  alpha  beta  0.178689  0.326015  0.514489  0.050754  0.391687  0.153947  0.157198  0.396039  0.009823  0.275651  0.454966  0.204654  0.111680  0.250095  0.297196  0.183910     col2
2  alpha  beta  0.513636  0.019905  0.418037  0.188359  0.393665  0.481321  0.196138  0.507833  0.042835  0.022343  0.156208  0.191423  0.258029  0.351852  0.354429  0.301106     col0
3  alpha  beta  0.042417  0.472806  0.020204  0.411681  0.019293  0.140623  0.108013  0.136260  0.490850  0.004276  0.212354  0.470580  0.107478  0.011461  0.366930  0.100539     none
4  alpha  beta  0.287596  0.317487  0.520418  0.048656  0.033835  0.519975  0.237952  0.449427  0.211287  0.204319  0.465478  0.343144  0.408842  0.050700  0.508506  0.169613     col2
5  alpha  beta  0.154299  0.369279  0.188567  0.083248  0.004332  0.436839  0.329708  0.458652  0.329425  0.068757  0.238121  0.102302  0.497296  0.219990  0.041560  0.455992     none
6  alpha  beta  0.361315  0.010401  0.039190  0.013335  0.236268  0.268206  0.407632  0.149902  0.041239  0.481870  0.338411  0.430095  0.335075  0.405940  0.108008  0.393766     none
7  alpha  beta  0.087221  0.146438  0.202438  0.132151  0.126972  0.442878  0.011017  0.471010  0.229542  0.139989  0.476444  0.294640  0.057385  0.369218  0.433006  0.388687     none
8  alpha  beta  0.519944  0.083779  0.520042  0.382255  0.315839  0.083350  0.038597  0.114521  0.160149  0.424750  0.192139  0.457243  0.054164  0.252922  0.133647  0.240518     col2
9  alpha  beta  0.246072  0.172740  0.002171  0.046882  0.132817  0.365952  0.252149  0.516113  0.050739  0.054263  0.331012  0.493549  0.433013  0.484111  0.118902  0.475689     col7

Setup MRE

np.random.seed(2022)
test_df = pd.DataFrame(np.random.random((10, 16))) / 1.9
test_df.columns = 'col'   test_df.columns.astype(str)
tmp_df = pd.DataFrame({'A': 'alpha', 'B': 'beta'}, index=test_df.index)
test_df = pd.concat([tmp_df, test_df], axis=1)