If I have a pandas dataframe that I want to subset based on if at least two out of three columns are > specific values, what would be the best way to do so?
CodePudding user response:
df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
'b': [50, 0, 0, 30, 10],
'c': [0, 1000, 2000, 0, 0]})
a b c
0 1 50 0
1 2 0 1000
2 3 0 2000
3 4 30 0
4 5 10 0
df[ (df[['a', 'b', 'c']] > [1, 10, 100]).sum(axis=1) >= 2 ]
a b c
1 2 0 1000
2 3 0 2000
3 4 30 0
CodePudding user response:
This will exclude any columns you don't wish to includes, as well as turn all desired values into a 'null' that will be be changed into a 1 to be summarized by row. So with this you would simply have to specify what number you would want to have a certain threshold and any columns you would like excluded from your search.
df = pd.DataFrame({
'Column1' : ['A', 'B', 'C', 'D'],
'Column2' : [1, 1, 1, 1],
'Column3' : [1, 2, 2, 1],
'Column4' : [1, 1, 2, 2]
})
df.loc[np.where(df.mask(df.drop(columns = ['Column1']).gt(1)).isna(), 0, 1).sum(axis = 1) >= 2]