I have a pandas dataframe with 18 columns, the columns from 2:17 hold numerical values with labels 2-17. I want to add a new column ('category') to the dataframe to include the label of the column which holds the maximum value if that value is higher than a certain number (0.5). I used the below code after utilizing a related question on #Pandas (python): max in columns define new value in new column
test_df['Category'] = test_df.values[:,2:18].argmax(1) if (test_df.values[:,2:18].max(1) >=0.5) else 'none'
when I run the above code, I get the error message "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", any advice is appreciated.
CodePudding user response:
Use mask
to hide row where no values are greater than 0.5 then extract the column name with idxmax
.
test_df['Category'] = (
test_df.iloc[:, 2:18].mask(~test_df.iloc[:, 2:18].ge(0.5).any(1), other=pd.NA)
.idxmax(1).fillna('none')
)
Output:
>>> test_df
A B col0 col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11 col12 col13 col14 col15 Category
0 alpha beta 0.004926 0.262662 0.059676 0.026302 0.360741 0.256310 0.472451 0.340764 0.472086 0.379545 0.437554 0.435562 0.438726 0.503708 0.193708 0.260441 col13
1 alpha beta 0.178689 0.326015 0.514489 0.050754 0.391687 0.153947 0.157198 0.396039 0.009823 0.275651 0.454966 0.204654 0.111680 0.250095 0.297196 0.183910 col2
2 alpha beta 0.513636 0.019905 0.418037 0.188359 0.393665 0.481321 0.196138 0.507833 0.042835 0.022343 0.156208 0.191423 0.258029 0.351852 0.354429 0.301106 col0
3 alpha beta 0.042417 0.472806 0.020204 0.411681 0.019293 0.140623 0.108013 0.136260 0.490850 0.004276 0.212354 0.470580 0.107478 0.011461 0.366930 0.100539 none
4 alpha beta 0.287596 0.317487 0.520418 0.048656 0.033835 0.519975 0.237952 0.449427 0.211287 0.204319 0.465478 0.343144 0.408842 0.050700 0.508506 0.169613 col2
5 alpha beta 0.154299 0.369279 0.188567 0.083248 0.004332 0.436839 0.329708 0.458652 0.329425 0.068757 0.238121 0.102302 0.497296 0.219990 0.041560 0.455992 none
6 alpha beta 0.361315 0.010401 0.039190 0.013335 0.236268 0.268206 0.407632 0.149902 0.041239 0.481870 0.338411 0.430095 0.335075 0.405940 0.108008 0.393766 none
7 alpha beta 0.087221 0.146438 0.202438 0.132151 0.126972 0.442878 0.011017 0.471010 0.229542 0.139989 0.476444 0.294640 0.057385 0.369218 0.433006 0.388687 none
8 alpha beta 0.519944 0.083779 0.520042 0.382255 0.315839 0.083350 0.038597 0.114521 0.160149 0.424750 0.192139 0.457243 0.054164 0.252922 0.133647 0.240518 col2
9 alpha beta 0.246072 0.172740 0.002171 0.046882 0.132817 0.365952 0.252149 0.516113 0.050739 0.054263 0.331012 0.493549 0.433013 0.484111 0.118902 0.475689 col7
Setup MRE
np.random.seed(2022)
test_df = pd.DataFrame(np.random.random((10, 16))) / 1.9
test_df.columns = 'col' test_df.columns.astype(str)
tmp_df = pd.DataFrame({'A': 'alpha', 'B': 'beta'}, index=test_df.index)
test_df = pd.concat([tmp_df, test_df], axis=1)