For a dataframe like this:
index col1 col2 col3 col4 col5
1 NaN 0.29 -0.03 1.45 45.67
2 2.45 0.8 NaN 0.11 NaN
3 1.68 9.7 1.18 -145 NaN
4 NaN NaN -9 -0.9 4.8
I would like to get a new df with the rows that match a value of 1 or above (>= 1) in at least 2 columns (two or more columns).
Expected result:
index col1 col2 col3 col4 col5
1 NaN 0.29 -0.03 1.45 45.67
3 1.68 9.7 1.18 -145 NaN
Many thanks
CodePudding user response:
df.loc[df.ge(1).sum(axis=1).ge(2)]
col1 col2 col3 col4 col5
index
1 NaN 0.29 -0.03 1.45 45.67
3 1.68 9.70 1.18 -145.00 NaN
Explanation:
- get the boolean for the entire dataframe, for values >=1:
df.ge(1)
col1 col2 col3 col4 col5
index
1 False False False True True
2 True False False False False
3 True True True False False
4 False False False False True
- Sum the booleans across the columns:
df.ge(1).sum(axis=1)
index
1 2
2 1
3 3
4 1
dtype: int64
Since OP is interested in two columns, the sum above should be >=2:
df.ge(1).sum(axis=1).ge(2)
index
1 True
2 False
3 True
4 False
dtype: bool
All that's left is to index the original df with the boolean result, to get the final output.