Why does pandas fillna() with inplace not work for multiple columns?

Time:10-02

I am using this data set: Titanic passengers. I am trying to fill in missing categorical data, but fillna() with the inplace option does not do anything:

import numpy as np
import pandas as pd

data = pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl')

# replace question marks with np.nan
data = data.replace('?', np.nan)

var_categor = ['sex', 'cabin', 'embarked']

data.loc[:, var_categor].fillna("Missing", inplace=True)

I get the same number of NaN values as before:

data[var_categor].isnull().sum()

I get no error messages and no warnings; it just doesn't do anything. Is this normal behavior? Shouldn't it give a warning?

CodePudding user response:

Try chaining the operations and assigning the returned copy back, rather than modifying in place:

data[var_categor] = data.replace('?', np.nan)[var_categor].fillna('Missing')
>>> data[var_categor].isna().sum()
sex         0
cabin       0
embarked    0
dtype: int64
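For readers who want to try this without downloading the CSV, here is a minimal sketch of the same chained pattern. The toy frame and its values are made up for illustration; only the '?' placeholder convention comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; '?' marks missing values as in the question's CSV.
df = pd.DataFrame({
    "sex": ["male", "?", "female"],
    "embarked": ["S", "C", "?"],
    "age": ["22", "?", "26"],
})
cols = ["sex", "embarked"]

# Replace the placeholder, fill only the chosen columns, and assign back.
df[cols] = df.replace("?", np.nan)[cols].fillna("Missing")

print(df)
```

Note that only the assigned columns change. In this sketch the age column keeps its '?' because the replaced frame is never assigned back as a whole, so run the replace on the full frame first if every column should be cleaned.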

CodePudding user response:

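Most likely, data.loc[:, var_categor] returns a copy of the selected columns rather than a view of the dataframe, so fillna(..., inplace=True) fills that temporary copy and the original dataframe is left unchanged.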
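The effect can be reproduced on a small frame. This is only a sketch with made-up data (the frame and the name subset are hypothetical), keeping the selection in a variable so the filled copy can be inspected:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Titanic data.
df = pd.DataFrame({
    "sex": ["male", np.nan, "female"],
    "cabin": [np.nan, "C85", np.nan],
    "age": [22.0, np.nan, 26.0],
})
cols = ["sex", "cabin"]

# Selecting a list of columns produces a new object; filling it in place
# changes only that object, not df.
subset = df.loc[:, cols]
subset.fillna("Missing", inplace=True)

nan_in_subset = int(subset.isna().sum().sum())
nan_in_original = int(df[cols].isna().sum().sum())
print(nan_in_subset, nan_in_original)
```

In the question's one-liner the copy is never bound to a name, so the filled result is discarded as soon as the statement finishes.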

The simplest fix is to not use inplace and assign the result back:

data[var_categor] = data[var_categor].fillna("Missing")

An alternative is to call .fillna directly on the dataframe. To limit which columns are filled, pass a dictionary mapping column names to replacement values:

>>> data.fillna({var: 'Missing' for var in var_categor}, inplace=True)
>>> data[var_categor].isna().sum()
sex         0
cabin       0
embarked    0
dtype: int64
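As a rough illustration of how the dictionary limits the fill, the sketch below uses a made-up frame and the non-inplace form; columns that are not keys in the dictionary keep their NaN values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one numeric column that should stay untouched.
df = pd.DataFrame({
    "sex": ["male", np.nan, "female"],
    "cabin": [np.nan, "C85", np.nan],
    "age": [22.0, np.nan, 26.0],
})
cols = ["sex", "cabin"]

# Keys are column names, values are the fill value for that column.
filled = df.fillna({col: "Missing" for col in cols})

print(filled[cols].isna().sum())
print(filled["age"].isna().sum())
```

Because the dictionary can hold a different value per column, the same call could, for example, fill categorical columns with a label and numeric columns with a median.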

However, best practice in pandas is to avoid inplace; see the GitHub issue that discusses deprecating it for more detail.
