KNN Imputer Implementation sklearn

Time:06-04

I want to use the class sklearn.impute.KNNImputer to impute missing values in my dataset.

I have 2 questions regarding this:

  1. I have seen multiple implementations on Medium and also the example on the official Sklearn website. None of them normalize the data. Shouldn’t one normalize the data before using KNN? Or does the KNNImputer normalize the data behind the scenes?

  2. The KNNImputer only accepts numerical input. So for categorical data, should I one-hot encode it and then run the imputer?

Thank you

CodePudding user response:

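  1. No, there is no implicit normalisation in the KNNImputer. You can see in the source that it just finds the nearest neighbours (by default with a NaN-aware Euclidean distance) and fills each missing value with the average, uniform or distance-weighted, of that feature across those neighbours. Because the distance is computed on the raw values, features with larger scales dominate it, so you should scale the data yourself before imputing.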

  2. Correct, you need to one-hot encode them. After imputing you will then need to take the argmax over each group of one-hot columns, because the imputer produces averaged values that are no longer one-hot (e.g. [0.2, 0.1, 0.4]).
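
A minimal sketch of both points together, assuming a made-up toy array whose columns are age, income and a one-hot encoded colour (red, green, blue). The data, the column layout and the choice of MinMaxScaler are assumptions for illustration only; any scaler that passes NaNs through would do. A row with an unknown category has to carry NaN in every one of its one-hot columns.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: [age, income, colour_red, colour_green, colour_blue].
# The last row has an unknown colour, so all three one-hot columns are NaN.
X = np.array([
    [25.0, 30000.0, 1.0, 0.0, 0.0],
    [27.0, 32000.0, 1.0, 0.0, 0.0],
    [38.0, 60000.0, 0.0, 1.0, 0.0],
    [50.0, 88000.0, 0.0, 0.0, 1.0],
    [52.0, 90000.0, 0.0, 0.0, 1.0],
    [26.0, 31000.0, np.nan, np.nan, np.nan],
])
onehot_cols = [2, 3, 4]

# 1. Scale first so that income does not dominate the distance.
#    MinMaxScaler ignores NaNs when fitting and keeps them when transforming.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# 2. Impute on the scaled data, then map back to the original units.
imputer = KNNImputer(n_neighbors=3)
X_soft = scaler.inverse_transform(imputer.fit_transform(X_scaled))

# The imputed colour is an average over neighbours, not a one-hot vector.
print(X_soft[5, onehot_cols])

# 3. Snap each row's colour group back to one-hot with argmax.
X_final = X_soft.copy()
winners = X_final[:, onehot_cols].argmax(axis=1)
X_final[:, onehot_cols] = np.eye(len(onehot_cols))[winners]
print(X_final[5, onehot_cols])
```

Here the three nearest neighbours of the incomplete row are two red rows and one green row, so the imputed colour is a mix of red and green and the argmax step resolves it to red. With several categorical variables, the argmax would be taken separately over each variable's group of one-hot columns.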
