Home > Blockchain >  Subsetting pandas dataframe where at least two columns are true
Subsetting pandas dataframe where at least two columns are true

Time:08-26

If I have a pandas dataframe that I want to subset based on if at least two out of three columns are > specific values, what would be the best way to do so?

CodePudding user response:

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 
                   'b': [50, 0, 0, 30, 10], 
                   'c': [0, 1000, 2000, 0, 0]})

   a   b     c
0  1  50     0
1  2   0  1000
2  3   0  2000
3  4  30     0
4  5  10     0
df[ (df[['a', 'b', 'c']] > [1, 10, 100]).sum(axis=1) >= 2 ]

   a   b     c
1  2   0  1000
2  3   0  2000
3  4  30     0

CodePudding user response:

This will exclude any columns you don't wish to includes, as well as turn all desired values into a 'null' that will be be changed into a 1 to be summarized by row. So with this you would simply have to specify what number you would want to have a certain threshold and any columns you would like excluded from your search.

df = pd.DataFrame({
    'Column1' : ['A', 'B', 'C', 'D'],
    'Column2' : [1, 1, 1, 1],
    'Column3' : [1, 2, 2, 1],
    'Column4' : [1, 1, 2, 2]
})

df.loc[np.where(df.mask(df.drop(columns = ['Column1']).gt(1)).isna(), 0, 1).sum(axis = 1) >= 2]
  • Related