Home > Net >  select non-NaN rows with multiple conditions from a pandas dataframe
select non-NaN rows with multiple conditions from a pandas dataframe

Time:06-14

Assume there is a dataframe such as

import pandas as pd 
import numpy as np

df = pd.DataFrame({'col1':[1,2,3,4,5],
                   'col2':[11,12,np.nan,24,np.nan]})

df
    col1    col2
0   1       11.0
1   2       12.0
2   3       NaN
3   4       24.0
4   5       NaN

I would like to select non-NaN rows based on multiple conditions such as (1) col1 < 4 and (2) non-nan in col2. The following is my code but I have no idea why I did not get the 1st two rows. Any idea? Thanks

df1 = df[(df['col1'] < 4 & df['col2'].notna())]
df1

col1    col2

CodePudding user response:

Because of the operator precedence (bitwise operators, e.g. &, have higher precedence than comparison operators, e.g. <).

Currently, your mask is being evaluated as

>>> df['col1'] < (4 & df['col2'].notna())

0    False
1    False
2    False
3    False
4    False
dtype: bool

That is why no rows are being selected. You have to wrap the first condition inside parentheses

>>> df1 = df[(df['col1'] < 4) & df['col2'].notna()]
>>> df1

   col1  col2
0     1  11.0
1     2  12.0

CodePudding user response:

This is how you can determine if there are np.nan in your data and how to use additional logic with np.where()

df = pd.DataFrame({
    'Column1' : [1, 2, 3, 4, 5],
    'Column2' : [11, 12, np.nan, 24, np.nan]
})
df['Column2'] = df['Column2'].replace({np.nan : None})
df['Check'] = np.where((df['Column2'].values == None) & (df['Column1'] > 4), True, False)
df

CodePudding user response:

You left out brackest, thats why. Code below

df[((df['col1'] < 4) & (df['col2'].notna()))]

0r

df.query("(col1<4)&(col2.notna())", engine='python')
  • Related