For a classification problem, I want to replace different country codes with numbers representing the class. The column below is a subset of the complete acq_cc column, whose values cover more than 70 unique countries. I want to replace each two-letter country code with a number:
acq_cc = {KY, US, CN, SG, US, SG, US, US, CZ, CN}
The subset above should be converted into the column below so that I can use it as a predictor:
acq_cc = {1, 2, 3, 4, 2, 4, 2, 2, 5, 3}
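The expected output follows a simple rule: each distinct code gets the next number in order of first appearance, starting at 1. A plain-Python sketch of that rule (the helper name encode_by_first_appearance is hypothetical) reproduces the subset above:

```python
def encode_by_first_appearance(codes):
    """Hypothetical helper: map each distinct code to 1, 2, 3, ... in order of first appearance."""
    mapping = {}
    encoded = []
    for code in codes:
        if code not in mapping:
            mapping[code] = len(mapping) + 1  # next unused class number
        encoded.append(mapping[code])
    return encoded, mapping


acq_cc = ["KY", "US", "CN", "SG", "US", "SG", "US", "US", "CZ", "CN"]
encoded, mapping = encode_by_first_appearance(acq_cc)
print(encoded)  # [1, 2, 3, 4, 2, 4, 2, 2, 5, 3]
print(mapping)  # {'KY': 1, 'US': 2, 'CN': 3, 'SG': 4, 'CZ': 5}
```

Keeping the mapping dictionary lets you encode later data with the same numbers.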
To do this, you can use the pandas function pd.factorize:
acq_cc = pd.factorize(var, sort=True)[0]
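where var is the pandas column itself (or any list-like of codes, such as your acq_cc). pd.factorize returns a (codes, uniques) tuple, and [0] selects the integer codes. Note that the codes start at 0, and with sort=True they follow the alphabetical order of the country codes (CN=0, CZ=1, KY=2, SG=3, US=4 for the subset above). To reproduce the numbering from the question exactly (order of first appearance, starting at 1), use pd.factorize(var)[0] + 1 instead.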
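A minimal sketch on the subset from the question, assuming the column is held in a pandas Series (built here by hand as a stand-in for the real data). It shows both orderings and how the returned uniques map the numbers back to country codes:

```python
import pandas as pd

# Stand-in for the real acq_cc column (assumption: it is a pandas Series of strings).
acq_cc = pd.Series(["KY", "US", "CN", "SG", "US", "SG", "US", "US", "CZ", "CN"])

# Default sort=False: codes follow order of first appearance; +1 makes them 1-based.
codes, uniques = pd.factorize(acq_cc)
classes = codes + 1
print(classes.tolist())  # [1, 2, 3, 4, 2, 4, 2, 2, 5, 3]
print(uniques.tolist())  # ['KY', 'US', 'CN', 'SG', 'CZ']

# sort=True: codes follow the alphabetical order of the country codes, starting at 0.
sorted_codes, sorted_uniques = pd.factorize(acq_cc, sort=True)
print(sorted_codes.tolist())    # [2, 4, 0, 3, 4, 3, 4, 4, 1, 0]
print(sorted_uniques.tolist())  # ['CN', 'CZ', 'KY', 'SG', 'US']

# The uniques act as the lookup table: indexing them with the codes decodes the column.
print(uniques[codes].tolist())
```

By default, missing values (NaN) are given the code -1, so after adding 1 they would become 0. Check for them before training.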