Replace string for numeric labels for classification

Time:11-03

For a classification problem I want to replace country codes with numbers representing the class. The column below is a subset of the complete acq_cc column, which contains more than 70 unique countries. I want to replace each two-letter country code with a number.

acq_cc = {KY, US, CN, SG, US, SG, US, US, CZ, CN}

The above subset should be converted into the column below so I can use it as a predictor.

acq_cc = {1, 2, 3, 4, 2, 4, 2, 2, 5, 3}

CodePudding user response:

To do this you can use the pandas built-in factorize function:

acq_cc = pd.factorize(var, sort=True)[0]

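Here var can be the pandas column itself or your acq_cc values. Note that factorize returns codes starting at 0, and with sort=True the codes follow the alphabetical order of the country codes. To reproduce the numbering shown in the question (order of first appearance, starting at 1), use sort=False and add 1 to the result.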
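
As a minimal sketch of how this plays out on the sample from the question: the DataFrame name df and the new column name acq_cc_label are assumptions made for illustration, not names from the original post. It compares sort=True with the default order-of-appearance coding.

```python
import pandas as pd

# Sample values from the question; `df` is a hypothetical DataFrame name.
df = pd.DataFrame(
    {"acq_cc": ["KY", "US", "CN", "SG", "US", "SG", "US", "US", "CZ", "CN"]}
)

# sort=True: codes start at 0 and follow the alphabetical order of the
# unique values (CN, CZ, KY, SG, US).
codes_sorted, uniques_sorted = pd.factorize(df["acq_cc"], sort=True)

# Default (sort=False): codes start at 0 and follow the order of first
# appearance (KY, US, CN, SG, CZ). Adding 1 makes the labels start at 1.
codes_seen, uniques_seen = pd.factorize(df["acq_cc"])
df["acq_cc_label"] = codes_seen + 1  # hypothetical column name

print(df)
print(list(uniques_seen))  # maps label - 1 back to the country code
```

The second return value of factorize holds the unique country codes, so a label can be mapped back to its country later if needed.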
