Convert Categorical features to Numerical

Time:11-11

I have a lot of categorical columns and want to convert their values to numerical values so that I can apply an ML model.

My data looks something like this:

Column 1: Good/bad/poor/not reported
Column 2: Red/amber/green
Column 3: 1/2/3
Column 4: Yes/No

I have already assigned the numerical values 1, 2, 3, 4 to good, bad, poor, and not reported in column 1.

Can I reuse the same numerical values, for example 1, 2, 3 for red, green, amber in column 2, and do likewise for the other columns, or will that confuse the model when I implement it?

CodePudding user response:

The colour values you mention are nominal: they have no ranking or order. If you assign 1, 2, 3, etc., the model can misread them as points on a scale.
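A small sketch of why that matters, using only plain Python. The particular assignment red=1, amber=2, green=3 is an assumption made for illustration; any other permutation would be just as arbitrary. The point is that the integers carry arithmetic relationships that the colours themselves do not have.

```python
# Arbitrary integer codes for a nominal feature (assumed for illustration).
codes = {"red": 1, "amber": 2, "green": 3}


def gap(a, b):
    """Absolute difference between the integer codes of two categories."""
    return abs(codes[a] - codes[b])


# The codes claim that red is "closer" to amber than to green,
# although the colours have no such ordering.
print(gap("red", "amber"))   # 1
print(gap("red", "green"))   # 2

# Averaging the codes of red and green lands exactly on amber,
# which is meaningless for the underlying categories.
average = (codes["red"] + codes["green"]) / 2
print(average == codes["amber"])  # True
```

A model that uses distances or linear weights on this column would treat those gaps and averages as real information.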

To avoid this, you can transform them with one-hot encoding. This turns a multi-valued categorical field into one binary indicator per category:

red = 100
amber = 010
green = 001

You can use the OneHotEncoder class from sklearn.preprocessing: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
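A minimal sketch of how this might look in practice, assuming scikit-learn is installed. The sample values in colours are made up for illustration, and the category order is fixed explicitly so that the output columns line up with the red/amber/green table above.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample of column 2; the encoder expects 2-D input
# (one row per record, one column per feature).
colours = [["red"], ["amber"], ["green"], ["amber"]]

# Fix the category order so the output columns are red, amber, green.
encoder = OneHotEncoder(categories=[["red", "amber", "green"]])

# fit_transform returns a sparse matrix by default; toarray() makes it dense.
encoded = encoder.fit_transform(colours).toarray()

print(encoder.categories_)  # the category order used for the columns
print(encoded)              # one row per record, one 0/1 column per colour
```

Each row of encoded contains a single 1 in the column for its colour, so no colour is numerically "larger" or "closer" than another. Column 1 (good/bad/poor) does have a natural order, so keeping integer codes there can be reasonable as long as the codes follow that order.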
