I am using this data set: Titanic passengers
I am trying to fill in missing categorical data, but fillna() with the inplace option does not do anything:
import numpy as np
import pandas as pd
data = pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl')
# replace question marks with np.nan
data = data.replace('?', np.nan)
var_categor = ['sex', 'cabin', 'embarked' ]
data.loc[:, var_categor].fillna("Missing", inplace=True)
I still get the same number of NaN values:
data[var_categor].isnull().sum()
I get no error messages and no warnings; it just doesn't do anything. Is this normal behavior? Shouldn't it give a warning?
CodePudding user response:
Try chaining the operations and assigning the returned copy back, rather than modifying in place:
data[var_categor] = data.replace('?', np.nan)[var_categor].fillna('Missing')
>>> data[var_categor].isna().sum()
sex 0
cabin 0
embarked 0
dtype: int64
CodePudding user response:
It’s most likely a view-versus-copy issue: selecting a list of columns with .loc returns a new object, so fillna(..., inplace=True) fills that temporary copy and data itself is left unchanged.
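To make this concrete, here is a minimal sketch on a small made-up frame (the toy DataFrame and its values are assumptions for illustration, not the Titanic data). It suggests that the in-place fill on a .loc selection leaves the original untouched, while assigning the result back does fill it:

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the Titanic data (values are illustrative).
toy = pd.DataFrame({
    "sex": ["male", np.nan, "female"],
    "cabin": [np.nan, "C85", np.nan],
    "age": [22.0, 38.0, 26.0],
})
cols = ["sex", "cabin"]

# Selecting with a list of labels builds a new object, so the in-place
# fill lands on that temporary and is then discarded.
toy.loc[:, cols].fillna("Missing", inplace=True)
untouched = int(toy[cols].isna().sum().sum())

# Assigning the filled result back writes into the original frame.
toy[cols] = toy[cols].fillna("Missing")
remaining = int(toy[cols].isna().sum().sum())

print(untouched, remaining)  # 3 0
```

Depending on the pandas version, the in-place call may or may not emit a warning about operating on a copy; either way the original frame is not modified.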
The trivial fix, of course, is to not use inplace:
data[var_categor] = data[var_categor].fillna("Missing")
An alternative is to call .fillna directly on the DataFrame. If you want to limit which columns are filled, pass a dictionary mapping column names to replacement values:
>>> data.fillna({var: 'Missing' for var in var_categor}, inplace=True)
>>> data[var_categor].isna().sum()
sex 0
cabin 0
embarked 0
dtype: int64
However, best practice in pandas is to avoid inplace; see the GitHub issue that discusses deprecating it for more detail.
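As a rough sketch of what avoiding inplace can look like, the cleaning steps can be written as one chain that returns a new DataFrame. The raw frame below is a made-up stand-in for the Titanic data, and the names raw, cols and cleaned are hypothetical:

```python
import numpy as np
import pandas as pd

# Made-up sample that uses '?' as the missing-value marker, as in the question.
raw = pd.DataFrame({
    "sex": ["male", "?", "female"],
    "embarked": ["S", "C", "?"],
    "fare": [7.25, 71.28, 8.05],
})
cols = ["sex", "embarked"]

# Each step returns a new DataFrame, so nothing is modified in place.
cleaned = (
    raw.replace("?", np.nan)
       .fillna({col: "Missing" for col in cols})
)

print(cleaned[cols].isna().sum())
```

Because every step returns a new object, raw keeps its original '?' markers and only cleaned holds the filled values.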