Why am I unable to select data in a Python pandas DataFrame based on multiple OR criteria?

Time:06-28

I have a df with multiple columns and am trying to select a subset of the data using OR logic:

df [ (df['col1']==0) | (df['col2']==0) | (df['col3']==0) | (df['col4']==0) |
(df['col5']==0) | (df['col6']==0) | (df['col7']==0) | (df['col8']==0) |
(df['col9']==0) | (df['col10']==0) | (df['col11']==0) ]

When I apply this logic the result is empty, but I know some of the values are zero.

All the values in these columns are int64.

I noticed that the values in 'col11' are all 1's. When I remove 'col11' or change the order of the query (e.g., putting "| (df['col11']==0)" in the middle), I get the expected results.

I wonder if anyone has had this problem or has any idea why I'm getting an empty df.

CodePudding user response:

Use (df==0).any(axis=1) to build a boolean mask that is True for every row containing at least one zero.

df...

    a   b   c   d   e   f
0   6   8   7  19   3  14
1  14  19   3  13  10  10
2   6  18  16   0  15  12
3  19   4  14   3   8   3
4   4  14  15   1   6  11

>>> (df==0).any(axis=1)
0    False
1    False
2     True
3    False
4    False
dtype: bool
>>> #subset of the columns
>>> (df[['a','c','e']]==0).any(axis=1)
0    False
1    False
2    False
3    False
4    False
dtype: bool
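
The mask on its own only reports which rows match; to get the subset the question asks for, index the DataFrame with it. Below is a minimal sketch, assuming a small made-up frame (the values and the three-column list are hypothetical stand-ins for col1 to col11 in the question, with the last column containing no zeros, as col11 does):

```python
import pandas as pd

# Hypothetical data standing in for the question's DataFrame.
df = pd.DataFrame({
    "col1": [1, 0, 3, 4],
    "col2": [5, 6, 0, 8],
    "col3": [1, 1, 1, 1],  # no zeros, like col11 in the question
})

# Columns to test; extend this list instead of chaining | by hand.
cols = ["col1", "col2", "col3"]

# True for each row where at least one of the listed columns equals 0.
mask = (df[cols] == 0).any(axis=1)

# Boolean indexing keeps only the rows where the mask is True.
subset = df[mask]
print(subset)
```

Keeping the column names in a list avoids the long chain of parenthesised comparisons, which is where a misplaced bracket or operator is most likely to slip in.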

If the DataFrame is all integers, you can make use of the fact that zero is falsy and use

~df.all(axis=1)
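
As a quick sanity check, the two masks can be compared on random integer data. This is only a sketch: the seed, the shape, and the value range are arbitrary choices made here so that zeros are likely to appear.

```python
import numpy as np
import pandas as pd

# Arbitrary seed and small value range (assumptions for illustration only).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, (8, 4)), columns=list("abcd"))

# Explicit comparison: does the row contain a zero?
has_zero_explicit = (df == 0).any(axis=1)

# Truthiness shortcut: a row fails all() exactly when it contains a zero.
has_zero_truthy = ~df.all(axis=1)

# For an all-integer frame the two masks agree row for row.
print(has_zero_explicit.equals(has_zero_truthy))
```

The shortcut relies on every column being numeric. With string columns, for example, an empty string is also falsy, so the explicit comparison is the safer default.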

To make fake data for testing:

import numpy as np
import pandas as pd
rng = np.random.default_rng()
nrows = 5
df = pd.DataFrame(rng.integers(0, 20, (nrows, 6)), columns=['a', 'b', 'c', 'd', 'e', 'f'])