What is the easiest way to convert a binned feature to a numeric categorical feature?

Converting a numeric feature into a categorical binned feature is pretty simple when using pandas.cut(). However, say you want to do the opposite by converting a binned object feature into a numeric categorical feature (1, 2, 3, 4... etc.), what would be the easiest way to do so?

Distinct binned categories: ["0-9%", "10-19%", "20-29%", "30-39%", "40-49%", "50-59%", etc...]

Several naïve methods spring to mind for solving this problem. E.g., running a for-loop with if-statements:

temp = []  
for i in list1:
    if i == "0-9%":
        temp.append(1)
    elif i == "10-19%":
        temp.append(2)
    elif i == "20-29%":
        temp.append(3)
    # ...and so on for the remaining bins

Or by creating a dictionary with each distinct binned category as keys and using their index values as values:

temp = {}
for v, k in enumerate(pd.unique(list1)):
    temp[k] = v + 1          # +1 so the codes start at 1 instead of 0

list1 = [temp[bin] for bin in list1]

Both methods feel a bit naïve, however, and I'm curious as to whether there is a simpler solution to this issue.

CodePudding user response：

A Categorical already carries numeric information: each value is stored as an integer code.

Use cat.codes to access it:

df = pd.DataFrame({'val': range(1,40,7)})
bins = [0,10,20,30,40]
labels = ["0-9%", "10-19%", "20-29%", "30-39%"]

df['cat'] = pd.cut(df['val'], bins=bins, labels=labels)

df['code'] = df['cat'].cat.codes.add(1)

print(df)

Output:

   val     cat  code
0    1    0-9%     1
1    8    0-9%     1
2   15  10-19%     2
3   22  20-29%     3
4   29  20-29%     3
5   36  30-39%     4
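
One thing worth checking with this approach is how missing categories behave. The sketch below uses hypothetical values, one of which falls outside the bin edges, to show that such rows get the code -1 and therefore end up as 0 after adding 1:

```python
import pandas as pd

# Hypothetical values; 55 lies outside the bin edges below.
df = pd.DataFrame({'val': [5, 15, 55]})
bins = [0, 10, 20, 30, 40]
labels = ["0-9%", "10-19%", "20-29%", "30-39%"]

df['cat'] = pd.cut(df['val'], bins=bins, labels=labels)

# Rows that fall in no bin are NaN in 'cat' and get code -1,
# so they become 0 once 1 is added.
df['code'] = df['cat'].cat.codes.add(1)

print(df)
```

If 0 should not be mistaken for a real bin, those rows probably need separate handling.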

If the input is not a Categorical, you need to use factorize.
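
A minimal sketch of the factorize route, assuming the bins are stored as plain strings (the Series below is made up for illustration). Note that sort=True orders the labels lexicographically; that happens to match the numeric order for these labels, but it is an assumption that should be checked for other label sets:

```python
import pandas as pd

# Hypothetical column of bin labels stored as plain strings.
s = pd.Series(["20-29%", "0-9%", "10-19%", "0-9%", "30-39%"])

# factorize returns (codes, uniques). With sort=True the uniques are
# sorted before codes are assigned; without it, codes follow the order
# of first appearance in the data.
codes, uniques = pd.factorize(s, sort=True)
codes = codes + 1  # shift so the codes start at 1 instead of 0

print(list(uniques))
print(list(codes))
```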

CodePudding user response：

Create a dictionary that maps each current bin to the number you want to convert it to, then use the replace function:

conversion = {"0-9%": 1, "10-19%": 2, "20-29%": 3}  # ...and so on for the remaining bins
df = df.replace(conversion)  # replace returns a new DataFrame
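
If the labels are already available as an ordered list, the dictionary does not have to be typed out by hand. The following is a sketch only: the `labels` list and the sample column are assumptions, and Series.map is used here in place of replace so that only the one column is touched:

```python
import pandas as pd

# Assumed: the bin labels, listed in the order the codes should follow.
labels = ["0-9%", "10-19%", "20-29%", "30-39%"]

# Hypothetical data with the bins stored as strings.
df = pd.DataFrame({'cat': ["10-19%", "0-9%", "30-39%", "20-29%"]})

# Build {label: code} with codes starting at 1.
conversion = {label: i for i, label in enumerate(labels, start=1)}

# map looks each value up in the dictionary; labels missing from it become NaN.
df['code'] = df['cat'].map(conversion)

print(df)
```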