What is the difference between sklearn SimpleImputer and pandas.fillna?
I tried both functions from their respective libraries but didn't notice any apparent difference.
CodePudding user response:
These are similar implementations in different libraries. That said, SimpleImputer fills NAs using the mean, median, most_frequent, and constant strategies, whereas fillna takes either a fill value or a method (backfill, bfill, pad, ffill, or None).
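To make the comparison concrete, here is a minimal sketch on a made-up two-column frame (the data and variable names are illustrative only). It assumes a reasonably recent pandas, where `DataFrame.ffill()` is the preferred spelling of forward fill; as far as I know, the `method` argument of `fillna` is deprecated in newer releases.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data, invented for illustration
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 4.0, 6.0]})

# scikit-learn: column means are learned in fit and applied in transform.
# The result is a NumPy array, not a DataFrame.
imputed = SimpleImputer(strategy="mean").fit_transform(df)

# pandas: the same statistic-based fill, computed inline
filled_mean = df.fillna(df.mean())

# pandas: order-based fill, which SimpleImputer has no strategy for.
# A leading NaN stays NaN because there is nothing before it to carry forward.
filled_ffill = df.ffill()
```

With the mean strategy both libraries should give the same numbers; the practical differences are the return type and the fact that forward and backward fill exist only on the pandas side.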
For reference: [SimpleImputer] [pandas fillna]
If you are working with a bigger dataset, it may make sense to use SimpleImputer because of slight performance improvements, but you need to fit and transform the dataset, which means writing more code than a single fillna() call. If you don't otherwise need pandas, SimpleImputer also makes sense.
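The extra fit/transform step is easier to see in code. This is only a sketch with invented train/test frames: the imputer stores the statistic it learned in `fit` (in its `statistics_` attribute) and reuses it in `transform`, whereas with fillna you keep track of that value yourself.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical train/test split, invented for illustration
train = pd.DataFrame({"x": [1.0, 2.0, np.nan, 3.0]})
test = pd.DataFrame({"x": [np.nan, 10.0]})

# scikit-learn: fit learns the mean from train, transform reuses it on test
imputer = SimpleImputer(strategy="mean")
imputer.fit(train)
test_imputed = imputer.transform(test)

# pandas: compute the statistic on train and pass it along by hand
train_mean = train["x"].mean()
test_filled = test["x"].fillna(train_mean)
```

Both routes fill the test gap with the training mean; the imputer just carries that state for you.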
CodePudding user response:
SimpleImputer provides more flexibility. For example, it lets you choose which value counts as missing (the missing_values parameter), so instead of filling the classic np.nan
you could use it to fill all occurrences of the value 92942429 (for example) in one step.
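A small sketch of that idea, using the sentinel 92942429 from above on a made-up frame. The column names and the choice of the mean strategy are assumptions for illustration.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data where 92942429 stands in for "missing"
df = pd.DataFrame({
    "a": [1.0, 3.0, 92942429.0],
    "b": [92942429.0, 4.0, 8.0],
})

# Treat the sentinel as the missing marker; it is excluded from the mean
sentinel_imputer = SimpleImputer(missing_values=92942429, strategy="mean")
cleaned = sentinel_imputer.fit_transform(df)
```

In pandas you would typically need two steps for the same effect, for example replacing the sentinel with NaN first and then calling fillna.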
SimpleImputer can also be placed in sklearn pipelines, and in a ColumnTransformer for more granular control. This saves time in the end: once you've constructed a pipeline that includes the imputer and your chosen estimator, you simply pass in your dataframe, and the imputation is taken care of when the fit method on the pipeline object is invoked.
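Roughly, such a pipeline could look like the sketch below. The frame, the labels, the per-column strategies, and the choice of LogisticRegression are all placeholders picked for illustration, not a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data and labels, invented for illustration
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0, np.nan, 29.0],
    "income": [30.0, 45.0, np.nan, 80.0, 52.0, np.nan],
})
y = [0, 1, 0, 1, 1, 0]

# Per-column control: a different imputation strategy for each column
preprocess = ColumnTransformer([
    ("age", SimpleImputer(strategy="median"), ["age"]),
    ("income", SimpleImputer(strategy="mean"), ["income"]),
])

# Imputation runs automatically as part of fit and predict
model = Pipeline([("impute", preprocess), ("clf", LogisticRegression())])
model.fit(df, y)
preds = model.predict(df)
```

The dataframe still contains NaNs when it is passed in; the pipeline fills them before the estimator ever sees the data.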
But at the end of the day, they do very similar things.