Home > Back-end >  Select rows in pandas where any of six column are not all zero
Select rows in pandas where any of six column are not all zero

Time:10-20

Here's what the pandas table look like:

enter image description here

As you can see the red marked rows have any of all six column values set to '0'. I want to select only non-red rows and filter these red ones out.

I can't seem to figure if there's in-built or, easy way to do it.

CodePudding user response:

Use a boolean mask as suggested by @Ch3steR and use .iloc or .loc to select a subset of columns:

# Minimal sample
>>> df
   A  B  C  D  E  F  G  H  I  J
0  4  0  0  0  0  0  0  1  3  2  # Drop
1  4  6  4  0  0  0  0  0  0  0  # Keep

# .iloc version: select the first 7 columns
>>> df[df.iloc[:, :7].eq(0).sum(1).lt(6)]
   A  B  C  D  E  F  G  H  I  J
1  4  6  4  0  0  0  0  0  0  0

# .loc version: select columns from A to G
>>> df[df.loc[:, 'A':'G'].eq(0).sum(1).lt(6)]
   A  B  C  D  E  F  G  H  I  J
1  4  6  4  0  0  0  0  0  0  0

Step by Step:

# Is value equal to 0
>>> df.loc[:, 'A':'G'].eq(0)
       A      B      C     D     E     F     G
0  False   True   True  True  True  True  True
1  False  False  False  True  True  True  True


# Sum of boolean, if there are 3 True, the sum will be 3
# sum(1) <- 1 is for axis, the sum per row
>>> df.loc[:, 'A':'G'].eq(0).sum(1)
0    6  # 6 zeros
1    4  # 4 zeros
dtype: int64


# Are there less than 6 zeros ?
>>> df.loc[:, 'A':'G'].eq(0).sum(1).lt(6)
0    False
1     True
dtype: bool


# If yes, I keep row else I drop it
>>> df[df.loc[:, 'A':'G'].eq(0).sum(1).lt(6)]
   A  B  C  D  E  F  G  H  I  J
1  4  6  4  0  0  0  0  0  0  0
  • Related