I have 10 unique values in one column of my dataframe. For example, consider the following:
df['categories'].unique()
Output:
Electronic
Computers
Mobile Phone
Router
Food
I want to replace 'Electronic' with 1, 'Computers' with 2, 'Mobile Phone' with 3, 'Router' with 4 and 'Food' with 5. After the replacement, the result should be:
df['categories'].unique()
Expected output:
1
2
3
4
5
I tried looping over df['categories'].unique(), but I couldn't get it to work. Can anyone help me with this?
CodePudding user response:
This will work:
new_vals = {'Electronic': 1, 'Computers': 2, 'Mobile Phone': 3, 'Router': 4, 'Food': 5}
df = df.replace({'categories': new_vals})
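If you would rather not type the dictionary by hand, it can also be built from the unique values themselves. This is only a minimal sketch: the sample dataframe is made up for illustration, and it assumes that numbering the values in order of first appearance is what you want.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframe
df = pd.DataFrame({
    "categories": ["Electronic", "Computers", "Mobile Phone",
                   "Router", "Food", "Computers", "Food"]
})

# unique() keeps the order of first appearance, so enumerate() numbers
# the values 1, 2, 3, ... in that order
new_vals = {name: code
            for code, name in enumerate(df["categories"].unique(), start=1)}

# map() looks each value up in the dictionary; a value that is missing
# from the dictionary would become NaN
df["categories"] = df["categories"].map(new_vals)

print(new_vals)
print(df["categories"].tolist())
```

Unlike replace, map turns any value that is not in the dictionary into NaN, which makes a gap in the mapping easy to spot.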
CodePudding user response:
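You could try this. Note that the codes start at 0 and follow the sorted order of the labels, so they will not match the 1 to 5 numbering asked for in the question: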
df['categories'] = df['categories'].astype('category').cat.codes
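If the exact numbers from the question matter, one option is to pass the category order explicitly and add 1 to the resulting codes. The sketch below uses a made-up sample dataframe, and the `order` list is an assumption copied from the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframe
df = pd.DataFrame({
    "categories": ["Electronic", "Computers", "Mobile Phone",
                   "Router", "Food", "Router"]
})

# Desired order, taken from the question (code 1 = first entry)
order = ["Electronic", "Computers", "Mobile Phone", "Router", "Food"]

# Default behaviour: labels are sorted, and codes start at 0
default_codes = df["categories"].astype("category").cat.codes

# Explicit order: codes follow the position in `order`; adding 1 makes
# them start at 1. A value that is not in `order` gets code -1, so it
# would show up as 0 after the shift.
df["categories"] = pd.Categorical(df["categories"], categories=order).codes + 1

print(default_codes.tolist())
print(df["categories"].tolist())
```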
CodePudding user response:
scikit-learn provides similar functionality.
This approach is a good fit when you are building a predictive model and the specific codes do not matter. For example, you do not care whether the "Computers" category gets a code of 1, 2 or 5:
from sklearn.preprocessing import OrdinalEncoder
enc = OrdinalEncoder()
df['categories'] = enc.fit_transform(X=df[['categories']]).astype('int')