Pandas - How to identify `nan` values in a Series

Time:10-03

I am currently playing with the Kaggle Titanic dataset (train.csv).

  1. I can load the data fine.
  2. I understand that some entries in the Embarked column are nan, but when I try to filter for them with the following code, I get an empty result:
    import pandas as pd
    df = pd.read_csv(<file_loc>, header=0)
    df[df.Embarked == 'nan']  # returns no rows

I also tried replacing the string 'nan' above with numpy.nan, but that doesn't work either.

What I am trying to find is all the cells that are not 'S', 'C' or 'Q'.

I also realised later, using type(df.Embarked.unique()[-1]), that the nan is of type float. Could someone help me understand how to identify those nan cells?

CodePudding user response:

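NaN is used to represent missing values. It is a float rather than the string 'nan', and it does not compare equal to anything, itself included, so equality filters such as == 'nan' or == numpy.nan match nothing.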

  • To find them, use .isna()

    Detect missing values.

  • To replace them, use .fillna(value)

    Fill NA/NaN values
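
A minimal sketch of the comparison behaviour behind the empty result in the question, assuming only standard NumPy and pandas semantics. The series `embarked` is made-up sample data, not the real Titanic column.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for the Embarked column (assumption)
embarked = pd.Series(["S", "C", np.nan, "Q"])

# NaN does not compare equal to anything, not even to itself
nan_equals_itself = np.nan == np.nan

# Both equality filters therefore select no rows
by_string = embarked[embarked == "nan"]
by_np_nan = embarked[embarked == np.nan]

# isna() checks for missing values without relying on equality
by_isna = embarked[embarked.isna()]

print(nan_equals_itself, len(by_string), len(by_np_nan), len(by_isna))
```

Only the `isna()` mask picks up the missing entry; the two equality masks are all False.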

Some examples on a series called col:

>>> col
0    1.0
1    NaN
2    2.0
dtype: float64
>>> col[col.isna()]
1   NaN
dtype: float64
>>> col.index[col.isna()]
Int64Index([1], dtype='int64')
>>> col.fillna(-1)
0    1.0
1   -1.0
2    2.0
dtype: float64
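
Applied to the question, the same idea might look like the sketch below. The frame is a small invented stand-in for train.csv (the rows are illustrative, not real data). `isna()` selects the missing rows, and `~isin([...])` selects every row whose value is not 'S', 'C' or 'Q', which also catches NaN.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the Titanic frame; values are invented
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Embarked": ["S", np.nan, "C", "Q"],
})

# Rows where Embarked is missing
missing = df[df["Embarked"].isna()]

# Rows where Embarked is anything other than S, C or Q (NaN included)
not_scq = df[~df["Embarked"].isin(["S", "C", "Q"])]

# Number of missing values in the column
n_missing = df["Embarked"].isna().sum()

print(missing)
print(not_scq)
print(n_missing)
```

The `~isin` form is the more general of the two, since it would also surface unexpected values such as a stray typo, not just NaN.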