I have a df with many columns and I want to select all rows that have any 2 column-values nonzero. filtering on rows where specific columns are nonzero is trivial, I mean this is probably trivial as well but it's eluding me. have
/want
dataframes for reproduceable example below:
import pandas as pd
have = pd.DataFrame({'ID': [1,2,3,4,5],
'grp1': [1,0,0,0,0],
'grp2': [0,2,0,0,0],
'grp3': [1,1,0,0,5]})
want = have.iloc[0:2]
>>> want
ID grp1 grp2 grp3
0 1 1 0 1
1 2 0 2 1
last note - I pass back and forth between pandas and sql(ite) a lot, so if there's a more elegant way to do this in sql than pandas that I'm missing, I'm open to that too.
CodePudding user response:
In pandas:
have[have.iloc[:,1:].gt(0).sum(1).gt(1)]
ID grp1 grp2 grp3
0 1 1 0 1
1 2 0 2 1
CodePudding user response:
IIUC, you need:
have.loc[(have.iloc[:,1:]!=0).sum(1)>1]
as you said non-zero values.
CodePudding user response:
You can convert the column type to bool then zero will be Fasle
and non zero will be True
and sum the True
to get non zero count.
out = have[have.filter(like='grp').astype(bool).sum(axis=1).ge(2)]
print(out)
ID grp1 grp2 grp3
0 1 1 0 1
1 2 0 2 1