Converting a numeric feature into a categorical binned feature is pretty simple when using pandas.cut(). However, say you want to do the opposite: convert a binned object (string) feature into a numeric categorical feature (1, 2, 3, 4, etc.). What would be the easiest way to do so?
Distinct binned categories: ["0-9%", "10-19%", "20-29%", "30-39%", "40-49%", "50-59%", etc...]
Many naïve methods spring to mind to solve this problem, e.g., running a for-loop with if-statements:
temp = []
for i in list1:  # list1 holds the binned string labels
    if i == "0-9%":
        temp.append(1)
    elif i == "10-19%":
        temp.append(2)
    elif i == "20-29%":
        temp.append(3)
    # ... and so on for the remaining bins
Or by creating a dictionary with each distinct binned category as a key and its index as the value:
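import pandas as pd

temp = {}
# pd.unique keeps the order of first appearance, so the numbering follows
# the order in which the bins occur in list1 (not necessarily bin order)
for v, k in enumerate(pd.unique(list1)):
    temp[k] = v + 1  # +1 so numbering starts at 1 instead of 0
list1 = [temp[b] for b in list1]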
These two methods, however, feel a bit naïve, and I'm curious whether there are simpler solutions to this issue.
Answer:
A Categorical already holds numeric information: each category has an integer code. Use cat.codes to access it (the codes start at 0, so add 1 to number the bins from 1):
import pandas as pd

df = pd.DataFrame({'val': range(1, 40, 7)})
bins = [0, 10, 20, 30, 40]
labels = ["0-9%", "10-19%", "20-29%", "30-39%"]
df['cat'] = pd.cut(df['val'], bins=bins, labels=labels)
# cat.codes is 0-based, so add 1 to number the bins from 1
df['code'] = df['cat'].cat.codes.add(1)
print(df)
Output:
   val     cat  code
0    1    0-9%     1
1    8    0-9%     1
2   15  10-19%     2
3   22  20-29%     3
4   29  20-29%     3
5   36  30-39%     4
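If the bins are stored as plain strings rather than as a Categorical, one option is to cast them to an ordered categorical dtype first and then read the codes. This is a minimal sketch, assuming the `labels` list below gives the intended bin order (extend it for the remaining bins); the sample series `s` is hypothetical.

```python
import pandas as pd

# Assumed: the full, ordered list of bin labels (extend as needed)
labels = ["0-9%", "10-19%", "20-29%", "30-39%"]

# Hypothetical column of bin labels stored as plain strings
s = pd.Series(["20-29%", "0-9%", "30-39%", "10-19%", "0-9%"])

# Explicit categories fix the order, so codes follow the bin order
# rather than the order in which the labels first appear
dtype = pd.CategoricalDtype(categories=labels, ordered=True)
codes = s.astype(dtype).cat.codes + 1  # +1 to number the bins from 1

print(codes.tolist())  # [3, 1, 4, 2, 1]
```

A label that is missing from `labels` gets code -1 in pandas, so after adding 1 it shows up as 0, which makes typos in the labels easy to spot.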
If the input is not a Categorical (e.g., a column of plain strings), use pd.factorize instead.
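A short sketch of pd.factorize on a hypothetical string series. By default the codes follow the order of first appearance; `sort=True` sorts the unique values first, but that is a plain string sort.

```python
import pandas as pd

# Hypothetical column of bin labels stored as plain strings
s = pd.Series(["20-29%", "0-9%", "30-39%", "10-19%", "0-9%"])

# Default: codes are assigned in order of first appearance
codes, uniques = pd.factorize(s)
print(codes + 1)        # [1 2 3 4 2]
print(list(uniques))    # ['20-29%', '0-9%', '30-39%', '10-19%']

# sort=True sorts the unique values (lexicographically, as strings)
codes_sorted, uniques_sorted = pd.factorize(s, sort=True)
print(codes_sorted + 1)  # [3 1 4 2 1]
```

The string sort happens to give the right bin order for these labels because their leading digits sort correctly; with labels such as "5-9%" and "10-14%" it would not ("10-14%" sorts before "5-9%"), so an explicit label order is safer in general.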
Answer:
Create a dictionary mapping each current bin to the number you want to convert it to, and then use the replace function:
conversion = {"0-9%": 1, "10-19%": 2, "20-29%": 3}  # ... extend for the remaining bins
df = df.replace(conversion)  # replace() returns a new DataFrame, so assign it back
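Typing the numbers by hand is error-prone, so one variation is to build the dictionary from an ordered list of labels and apply it with Series.map. This is a sketch, assuming the `labels` list below (extend it for the remaining bins) and a hypothetical `cat` column.

```python
import pandas as pd

# Assumed ordered list of bin labels; extend for the remaining bins
labels = ["0-9%", "10-19%", "20-29%", "30-39%"]

# Build the conversion dict from the list instead of typing each number,
# which avoids mistakes such as two bins mapping to the same value
conversion = {label: i for i, label in enumerate(labels, start=1)}

# Hypothetical DataFrame with a column of bin labels
df = pd.DataFrame({"cat": ["0-9%", "20-29%", "10-19%", "30-39%"]})

# Series.map turns labels missing from the dict into NaN, which makes
# unmapped bins easy to spot; replace() would leave them unchanged
df["code"] = df["cat"].map(conversion)
print(df["code"].tolist())  # [1, 3, 2, 4]
```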