Assume there is a dataframe such as
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':[1,2,3,4,5],
'col2':[11,12,np.nan,24,np.nan]})
df
col1 col2
0 1 11.0
1 2 12.0
2 3 NaN
3 4 24.0
4 5 NaN
I would like to select non-NaN rows based on multiple conditions such as (1) col1 < 4 and (2) non-nan in col2. The following is my code but I have no idea why I did not get the 1st two rows. Any idea? Thanks
df1 = df[(df['col1'] < 4 & df['col2'].notna())]
df1
col1 col2
CodePudding user response:
Because of the operator precedence (bitwise operators, e.g. &
, have higher precedence than comparison operators, e.g. <
).
Currently, your mask is being evaluated as
>>> df['col1'] < (4 & df['col2'].notna())
0 False
1 False
2 False
3 False
4 False
dtype: bool
That is why no rows are being selected. You have to wrap the first condition inside parentheses
>>> df1 = df[(df['col1'] < 4) & df['col2'].notna()]
>>> df1
col1 col2
0 1 11.0
1 2 12.0
CodePudding user response:
This is how you can determine if there are np.nan
in your data and how to use additional logic with np.where()
df = pd.DataFrame({
'Column1' : [1, 2, 3, 4, 5],
'Column2' : [11, 12, np.nan, 24, np.nan]
})
df['Column2'] = df['Column2'].replace({np.nan : None})
df['Check'] = np.where((df['Column2'].values == None) & (df['Column1'] > 4), True, False)
df
CodePudding user response:
You left out brackest, thats why. Code below
df[((df['col1'] < 4) & (df['col2'].notna()))]
0r
df.query("(col1<4)&(col2.notna())", engine='python')