I am currently playing with the Kaggle Titanic dataset (train.csv).
- I can load the data fine.
- I understand that some rows in the Embarked column have a nan value. But when I try to filter them using the following code, I get an empty result:
import pandas as pd
df = pd.read_csv(<file_loc>, header=0)
df[df.Embarked == 'nan']
I also tried importing numpy.nan to replace the string 'nan' above, but that doesn't work either.
What I am trying to find is all the cells which are not 'S', 'C', or 'Q'.
I also realised later, using type(df.Embarked.unique()[-1]), that the nan is a float. Could someone help me understand how to identify those nan cells?
Answer:
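NaN is used to represent missing values. It is a float rather than the string 'nan', and it never compares equal to anything, so == cannot find it.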
- To find them, use .isna() ("Detect missing values").
- To replace them, use .fillna(value) ("Fill NA/NaN values").
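Why .isna() is needed instead of == can be shown with a small sketch. The series below is a hypothetical stand-in for the Embarked column, not the real data; the point is only that both comparisons from the question match nothing, while .isna() finds the missing entry.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Embarked column (not the real train.csv data)
embarked = pd.Series(["S", "C", np.nan, "Q"])

# NaN is a float and is never equal to anything, not even to itself
print(type(np.nan))      # <class 'float'>
print(np.nan == np.nan)  # False

# Both comparisons tried in the question are False for every row
eq_string = embarked == "nan"
eq_nan = embarked == np.nan

# isna() is the check that actually detects the missing entry
missing = embarked.isna()
print(missing.tolist())  # [False, False, True, False]
```

Because every element of eq_string and eq_nan is False, indexing with either mask returns an empty result, which matches what the question describes.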
Some examples on a series called col:
>>> col
0 1.0
1 NaN
2 2.0
dtype: float64
>>> col[col.isna()]
1 NaN
dtype: float64
>>> col.index[col.isna()]
Int64Index([1], dtype='int64')
>>> col.fillna(-1)
0 1.0
1 -1.0
2 2.0
dtype: float64
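Applied to the question, the same calls might look like the sketch below. The small DataFrame is a hypothetical miniature of train.csv (the PassengerId values and the rows are made up for illustration), and filling with 'S' is only a placeholder choice, not a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of train.csv; the real file has many more rows and columns
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5],
    "Embarked": ["S", "C", np.nan, "Q", np.nan],
})

# Rows where Embarked is missing
missing_rows = df[df["Embarked"].isna()]

# Rows where Embarked is anything other than 'S', 'C' or 'Q'.
# NaN rows are included because isin() does not match NaN against these strings.
other_rows = df[~df["Embarked"].isin(["S", "C", "Q"])]

# Replace the missing entries; 'S' here is just an illustrative fill value
filled = df["Embarked"].fillna("S")

print(missing_rows["PassengerId"].tolist())  # [3, 5]
print(filled.tolist())                       # ['S', 'C', 'S', 'Q', 'S']
```

The ~isin(...) form answers the "not 'S', 'C', 'Q'" wording directly and would also catch unexpected strings, whereas isna() targets missing values only.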