NaNs not recognized in df.loc or for loops

I currently have a df with a column Outliers. When I do:

df.Outliers.value_counts(dropna = False)

I get:

NaN    2862
1.0     600
0.0     257

However, when I try to display only these rows with:

df.loc[df.Outliers == np.nan] # numpy was imported as np

I get an output of 0 rows. Why are the NaN rows not being recognized as NaN? I have verified that these NaN values are of the type numpy.float64, so they aren't strings that need to be converted. Why are they not recognized as NaNs sometimes?

CodePudding user response:

The values really are NaN; the problem is the comparison. Under IEEE 754 floating-point rules, NaN is not equal to anything, including itself, so df.Outliers == np.nan is False for every row and the mask selects nothing. Use isna() instead to find the rows (or columns) that contain NaN:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column1': [np.nan, 2, 3, 4],
    'Column2': [1, np.nan, 3, np.nan]
})
# Keep only the rows where Column1 is NaN
df.loc[df['Column1'].isna()]
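
The same idea applies to the Outliers column from the question, and to for loops. The following is only a minimal sketch: the five-row DataFrame is made-up stand-in data (not the asker's real data), and the variable names are chosen for illustration. It compares the equality mask with isna(), and uses pd.isna() for the element-by-element check inside a loop.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's Outliers column.
df = pd.DataFrame({"Outliers": [np.nan, 1.0, 0.0, np.nan, 1.0]})

# NaN never compares equal to anything, including itself.
nan_equals_itself = np.nan == np.nan

# So an equality mask is False everywhere and selects no rows.
rows_by_equality = df.loc[df["Outliers"] == np.nan]

# isna() tests for missing values directly.
rows_by_isna = df.loc[df["Outliers"].isna()]

# In a for loop, test each value with pd.isna(value), not value == np.nan.
nan_count = 0
for value in df["Outliers"]:
    if pd.isna(value):
        nan_count += 1

print(nan_equals_itself, len(rows_by_equality), len(rows_by_isna), nan_count)
```

To get the opposite selection (the rows that are not NaN), df["Outliers"].notna() can be used as the mask in the same way.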