I have a df with multiple columns and am trying to select a subset of the rows using OR logic:
df[(df['col1']==0) | (df['col2']==0) | (df['col3']==0) | (df['col4']==0) |
   (df['col5']==0) | (df['col6']==0) | (df['col7']==0) | (df['col8']==0) |
   (df['col9']==0) | (df['col10']==0) | (df['col11']==0)]
When I apply this logic the result is empty, but I know some of the values are zero.
All the values in these columns are int64.
I noticed that 'col11' is all 1's. When I remove 'col11' or change the order of the query (e.g., putting "| (df['col11']==0)" in the middle), I get the expected results.
I wonder if anyone has had this problem or has any idea why I'm getting an empty df.
CodePudding user response:
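Use (df==0).any(axis=1) to build a boolean mask that is True for every row containing at least one zero, then index the frame with it: df[(df==0).any(axis=1)]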
>>> df
    a   b   c   d   e   f
0   6   8   7  19   3  14
1  14  19   3  13  10  10
2   6  18  16   0  15  12
3  19   4  14   3   8   3
4   4  14  15   1   6  11
>>> (df==0).any(axis=1)
0    False
1    False
2     True
3    False
4    False
dtype: bool
>>> #subset of the columns
>>> (df[['a','c','e']]==0).any(axis=1)
0    False
1    False
2    False
3    False
4    False
dtype: bool
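The mask on its own only flags rows; to get the actual subset, it has to be used as an indexer. Here is a minimal sketch that rebuilds the example frame shown above from its printed values and filters on a chosen list of columns. The names cols, mask and subset are only illustrative, and the column list stands in for col1 through col11 from the question.

```python
import pandas as pd

# Same values as the example frame printed above
df = pd.DataFrame(
    [[6, 8, 7, 19, 3, 14],
     [14, 19, 3, 13, 10, 10],
     [6, 18, 16, 0, 15, 12],
     [19, 4, 14, 3, 8, 3],
     [4, 14, 15, 1, 6, 11]],
    columns=["a", "b", "c", "d", "e", "f"],
)

# Illustrative column list; in the question this would be col1..col11
cols = ["c", "d", "e"]

# True for each row where at least one of the chosen columns equals 0
mask = (df[cols] == 0).any(axis=1)

# Boolean indexing keeps only the flagged rows
subset = df[mask]
print(subset)
```

Because the column list is an ordinary Python list, adding or removing a column does not require touching a long chain of | expressions.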
If the DataFrame is all integers, you can make use of the fact that zero is falsy and use
~df.all(axis=1)
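As a quick sanity check, the two masks can be compared on random integer data. This is only a sketch: the seed, the value range and the frame shape are arbitrary choices made for the illustration.

```python
import numpy as np
import pandas as pd

# Arbitrary seed so the check is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, (20, 6)), columns=list("abcdef"))

# Explicit comparison: a row is flagged if any value equals 0
any_zero = (df == 0).any(axis=1)

# Truthiness shortcut: a row is flagged if not all values are non-zero
not_all_truthy = ~df.all(axis=1)

# Both approaches should flag exactly the same rows for integer data
print(any_zero.equals(not_all_truthy))
```

The explicit (df == 0) form states the intent more directly, while ~df.all(axis=1) is shorter and relies on zero being the only falsy integer.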
To make fake data:
import numpy as np
import pandas as pd
rng = np.random.default_rng()
nrows = 5
df = pd.DataFrame(rng.integers(0, 20, (nrows, 6)), columns=['a', 'b', 'c', 'd', 'e', 'f'])