Home > Blockchain >  pandas df filter on rows with any n rows satisfying condition
pandas df filter on rows with any n rows satisfying condition

Time:04-26

I have a df with many columns and I want to select all rows that have any 2 column-values nonzero. filtering on rows where specific columns are nonzero is trivial, I mean this is probably trivial as well but it's eluding me. have/want dataframes for reproduceable example below:

import pandas as pd

have = pd.DataFrame({'ID': [1,2,3,4,5],
        'grp1': [1,0,0,0,0],
        'grp2': [0,2,0,0,0],
        'grp3': [1,1,0,0,5]})

want = have.iloc[0:2]
>>> want
   ID  grp1  grp2  grp3
0   1     1     0     1
1   2     0     2     1

last note - I pass back and forth between pandas and sql(ite) a lot, so if there's a more elegant way to do this in sql than pandas that I'm missing, I'm open to that too.

CodePudding user response:

In pandas:

have[have.iloc[:,1:].gt(0).sum(1).gt(1)]
 
   ID  grp1  grp2  grp3
0   1     1     0     1
1   2     0     2     1

CodePudding user response:

IIUC, you need:

have.loc[(have.iloc[:,1:]!=0).sum(1)>1] as you said non-zero values.

CodePudding user response:

You can convert the column type to bool then zero will be Fasle and non zero will be True and sum the True to get non zero count.

out = have[have.filter(like='grp').astype(bool).sum(axis=1).ge(2)]
print(out)

   ID  grp1  grp2  grp3
0   1     1     0     1
1   2     0     2     1
  • Related