How to replace nan in a column with the median of the column

Time:07-29

I've been working on Kaggle's Titanic problem with pandas, and have tried different variants of groupby/apply to fill in the NaN entries of the train['Age'] column in the training data.

import pandas as pd
import numpy as np

train = pd.DataFrame({'ID': [887, 888, 889, 890], 'Age': [19.0, np.nan, 26.0, 32.0]})

    ID   Age
0  887  19.0
1  888   NaN
2  889  26.0
3  890  32.0

How would I go through the elements and replace these NaN values with something like the median age?

I've tried variations of

train.Age = train.Age.apply(lambda x: x.fillna(x.median()))
which results in:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [249], in <cell line: 1>()
----> 1 train.Age = train.Age.apply(lambda x: x.fillna(x.median()))

File ~\anaconda3\envs\py10\lib\site-packages\pandas\core\series.py:4433, in Series.apply(self, func, convert_dtype, args, **kwargs)
   4323 def apply(
   4324     self,
   4325     func: AggFuncType,
   (...)
   4328     **kwargs,
   4329 ) -> DataFrame | Series:
   4330     """
   4331     Invoke function on values of Series.
   4332 
   (...)
   4431     dtype: float64
   4432     """
-> 4433     return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

File ~\anaconda3\envs\py10\lib\site-packages\pandas\core\apply.py:1088, in SeriesApply.apply(self)
   1084 if isinstance(self.f, str):
   1085     # if we are a string, try to dispatch
   1086     return self.apply_str()
-> 1088 return self.apply_standard()

File ~\anaconda3\envs\py10\lib\site-packages\pandas\core\apply.py:1143, in SeriesApply.apply_standard(self)
   1137         values = obj.astype(object)._values
   1138         # error: Argument 2 to "map_infer" has incompatible type
   1139         # "Union[Callable[..., Any], str, List[Union[Callable[..., Any], str]],
   1140         # Dict[Hashable, Union[Union[Callable[..., Any], str],
   1141         # List[Union[Callable[..., Any], str]]]]]"; expected
   1142         # "Callable[[Any], Any]"
-> 1143         mapped = lib.map_infer(
   1144             values,
   1145             f,  # type: ignore[arg-type]
   1146             convert=self.convert_dtype,
   1147         )
   1149 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1150     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1151     #  See also GH#25959 regarding EA support
   1152     return obj._constructor_expanddim(list(mapped), index=obj.index)

File ~\anaconda3\envs\py10\lib\site-packages\pandas\_libs\lib.pyx:2870, in pandas._libs.lib.map_infer()

Input In [249], in <lambda>(x)
----> 1 train.Age = train.Age.apply(lambda x: x.fillna(x.median()))

AttributeError: 'float' object has no attribute 'fillna'

Could someone point me in the right direction? I don't even need the code; just some tips/hints. I've been reading through the pandas documentation without any progress. Can it be done with just apply, or with some kind of groupby method?

CodePudding user response:

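You can call fillna on the column directly, without apply. Series.apply passes each element to the lambda one at a time, so x is a plain float rather than a Series, which is why x.fillna raises the AttributeError.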

train.Age = train.Age.fillna(train.Age.median())
train
Out[561]: 
    ID   Age
0  887  19.0
1  888  26.0
2  889  26.0
3  890  32.0
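
On the groupby part of the question: if the goal is to fill each missing age with the median of its own group rather than the overall median, one possible approach is groupby with transform. This is only a sketch. The 'Pclass' column and the two extra rows are assumptions added for illustration; they are not in the sample frame from the question.

```python
import numpy as np
import pandas as pd

# 'Pclass' is a hypothetical grouping column added for illustration;
# it is not part of the sample frame in the question.
train = pd.DataFrame({
    'ID': [887, 888, 889, 890, 891, 892],
    'Pclass': [1, 1, 1, 3, 3, 3],
    'Age': [19.0, np.nan, 26.0, 32.0, np.nan, 40.0],
})

# transform('median') returns a Series aligned with train's index that
# holds each row's group median (NaN is skipped when computing it).
group_median = train.groupby('Pclass')['Age'].transform('median')

# Fill only the missing ages, each with the median of its own group.
train['Age'] = train['Age'].fillna(group_median)
print(train)
```

Because fillna accepts a Series and aligns it on the index, only the NaN rows are touched; the existing ages are left as they are.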

CodePudding user response:

The code above only handles NaN or NA values in a column. To change values based on a more general condition on the rows of a column, you can use loc with a boolean mask:

train.loc[train['Age'].isna(),'Age'] = train['Age'].median()
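
As a rough illustration of how the mask can go beyond isna(), here is a sketch that also overwrites ages failing a validity rule. The negative age in the last row and the "Age < 0 is invalid" rule are assumptions made up for this example, not part of the original data.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, with one hypothetical invalid age (-1.0)
# swapped in to show a condition other than isna().
train = pd.DataFrame({'ID': [887, 888, 889, 890],
                      'Age': [19.0, np.nan, 26.0, -1.0]})

# Rows to overwrite: missing ages, plus ages failing the assumed validity rule.
bad = train['Age'].isna() | (train['Age'] < 0)

# Compute the median from the remaining valid rows only.
valid_median = train.loc[~bad, 'Age'].median()

# Assign the median to the 'Age' cell of every row selected by the mask.
train.loc[bad, 'Age'] = valid_median
print(train)
```

Computing the median before the assignment, and only from the rows not selected by the mask, keeps the invalid values from skewing the fill value.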