Home > Enterprise >  Look for a value in at least 2 columns in all df rows
Look for a value in at least 2 columns in all df rows

Time:12-07

For a dataframe like this:

index   col1    col2    col3    col4    col5
1       NaN     0.29    -0.03   1.45    45.67
2       2.45    0.8     NaN     0.11    NaN
3       1.68    9.7     1.18    -145    NaN
4       NaN     NaN     -9      -0.9    4.8

I would like to get a new df with the rows that match a value of 1 or above (>= 1) in at least 2 columns (two or more columns).

Expected result:

index   col1    col2    col3    col4    col5
1       NaN     0.29    -0.03   1.45    45.67
3       1.68    9.7     1.18    -145    NaN

Many thanks

CodePudding user response:

df.loc[df.ge(1).sum(axis=1).ge(2)]

       col1  col2  col3    col4   col5
index
1       NaN  0.29 -0.03    1.45  45.67
3      1.68  9.70  1.18 -145.00    NaN

Explanation:

  • get the boolean for the entire dataframe, for values >=1:
df.ge(1)

        col1   col2   col3   col4   col5
index
1      False  False  False   True   True
2       True  False  False  False  False
3       True   True   True  False  False
4      False  False  False  False   True
  • Sum the booleans across the columns:
df.ge(1).sum(axis=1)
index
1    2
2    1
3    3
4    1
dtype: int64

Since OP is interested in two columns, the sum above should be >=2:

df.ge(1).sum(axis=1).ge(2)

index
1     True
2    False
3     True
4    False
dtype: bool

All that's left is to index the original df with the boolean result, to get the final output.

  • Related