How to replace NaN in a column with the median of the column

Using pandas, I've been working on Kaggle's Titanic problem and have tried different variants of groupby/apply to fill in the NaN entries of the training data's train['Age'] column.

import pandas as pd
import numpy as np

train = pd.DataFrame({'ID': [887, 888, 889, 890], 'Age': [19.0, np.nan, 26.0, 32.0]})

    ID   Age
0  887  19.0
1  888   NaN
2  889  26.0
3  890  32.0

How would I go through the elements and change these NaN values to something like the median age?

I've tried variations of

train.Age = train.Age.apply(lambda x: x.fillna(x.median()))

which results in:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [249], in <cell line: 1>()
----> 1 train.Age = train.Age.apply(lambda x: x.fillna(x.median()))

File ~\anaconda3\envs\py10\lib\site-packages\pandas\core\series.py:4433, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4323 def apply(
   4324     self,
   4325     func: AggFuncType,
   (...)
   4328     **kwargs,
   4329 ) -> DataFrame | Series:
   4330     """
   4331     Invoke function on values of Series.
   4332 
   (...)
   4431     dtype: float64
   4432     """
-> 4433     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File ~\anaconda3\envs\py10\lib\site-packages\pandas\core\apply.py:1088, in SeriesApply.apply(self)
   1084 if isinstance(self.f, str):
   1085     # if we are a string, try to dispatch
   1086     return self.apply_str()
-> 1088 return self.apply_standard()

File ~\anaconda3\envs\py10\lib\site-packages\pandas\core\apply.py:1143, in SeriesApply.apply_standard(self)
   1137         values = obj.astype(object)._values
   1138         # error: Argument 2 to "map_infer" has incompatible type
   1139         # "Union[Callable[..., Any], str, List[Union[Callable[..., Any], str]],
   1140         # Dict[Hashable, Union[Union[Callable[..., Any], str],
   1141         # List[Union[Callable[..., Any], str]]]]]"; expected
   1142         # "Callable[[Any], Any]"
-> 1143         mapped = lib.map_infer(
   1144             values,
   1145             f,  # type: ignore[arg-type]
   1146             convert=self.convert_dtype,
   1147         )
   1149 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1150     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1151     #  See also GH#25959 regarding EA support
   1152     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~\anaconda3\envs\py10\lib\site-packages\pandas\_libs\lib.pyx:2870, in pandas._libs.lib.map_infer()

Input In [249], in <lambda>(x)
----> 1 train.Age = train.Age.apply(lambda x: x.fillna(x.median()))

AttributeError: 'float' object has no attribute 'fillna'

Could someone point me in the right direction? I don't even need the code, just some tips or hints. I've been reading through the pandas documentation without any progress. Can it be done with just apply, or with some kind of groupby method?

Answer:

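Series.apply calls the function once per element, so inside the lambda x is a single float, which is why you get 'float' object has no attribute 'fillna'. Call fillna on the whole column instead, without apply: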

train.Age = train.Age.fillna(train.Age.median())
train
Out[561]: 
    ID   Age
0  887  19.0
1  888  26.0
2  889  26.0
3  890  32.0
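Since the question also asks about groupby, here is a minimal sketch of the grouped variant, where each missing Age is filled with the median of its own group rather than the overall median. It assumes a hypothetical 'Pclass' column (not part of the question's example frame) and uses groupby(...).transform("median") to build a Series aligned with the original rows.

```python
import numpy as np
import pandas as pd

# Toy data shaped like the question's frame, plus a hypothetical 'Pclass'
# column (an assumption; the question's example does not include it).
train = pd.DataFrame({
    "ID": [887, 888, 889, 890, 891],
    "Pclass": [1, 1, 3, 3, 3],
    "Age": [19.0, np.nan, 26.0, 32.0, np.nan],
})

# transform("median") returns a Series aligned with train's index in which
# every row holds the median Age of its own Pclass group (NaNs are skipped).
group_median = train.groupby("Pclass")["Age"].transform("median")

# Fill each missing Age with its group's median; existing values are kept.
train["Age"] = train["Age"].fillna(group_median)
print(train)
```

Here row 888 gets 19.0 (median of class 1) and row 891 gets 29.0 (median of class 3). If every Age in a group is NaN, that group's median is also NaN and those rows stay missing.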

Answer:

The code above only replaces NaN/NA values in a column. To change values based on an arbitrary condition on a column's values, you can use loc with a boolean mask:

train.loc[train['Age'].isna(),'Age'] = train['Age'].median()
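A minimal sketch of why loc is more general: it can replace values that fillna would not touch. This example assumes a hypothetical extra row that uses -1.0 as an "unknown age" sentinel, and computes the median from the valid ages only so the sentinel does not skew it.

```python
import numpy as np
import pandas as pd

# The question's data plus one extra row using -1.0 as a hypothetical
# "unknown age" sentinel, which fillna alone would not catch.
train = pd.DataFrame({
    "ID": [887, 888, 889, 890, 891],
    "Age": [19.0, np.nan, 26.0, 32.0, -1.0],
})

# Rows to replace: missing ages or negative (sentinel) ages.
# NaN < 0 evaluates to False, so isna() is needed to catch the NaN row.
bad = train["Age"].isna() | (train["Age"] < 0)

# Median of the valid ages only (19, 26, 32), so the sentinel does not skew it.
median_age = train.loc[~bad, "Age"].median()

# loc with a boolean mask assigns only to the selected rows of the Age column.
train.loc[bad, "Age"] = median_age
print(train)
```

With only the isna() part of the mask, this reduces to the one-liner above and gives the same result as fillna.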