I need to encode the following two columns. When I use cat.codes, the two columns do not end up with the same codes. How can I make equal values get the same code in both columns?
Example:
The input is a DataFrame:
col1 col2
0 A E
1 B F
2 C A
3 D B
4 A B
5 E A
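The mismatch happens because astype('category') infers the categories of each column separately, so a letter that appears in both columns can sit at a different position in each category list. Below is a minimal sketch that keeps the cat.codes approach but gives both columns one shared CategoricalDtype built from the values of both columns. Sorting the shared categories is an assumption made here for readability; any fixed order works.

```python
import pandas as pd

# The example input from the question
df = pd.DataFrame({'col1': ['A', 'B', 'C', 'D', 'A', 'E'],
                   'col2': ['E', 'F', 'A', 'B', 'B', 'A']})

# Encoding each column on its own: categories are inferred per column,
# so 'E' gets one code in col1 and a different one in col2
separate1 = df['col1'].astype('category').cat.codes
separate2 = df['col2'].astype('category').cat.codes
print(separate1.tolist(), separate2.tolist())

# Fix: one sorted category list taken from both columns,
# applied to both columns before calling cat.codes
shared = pd.CategoricalDtype(sorted(pd.unique(df[['col1', 'col2']].to_numpy().ravel())))
df['col1_codes'] = df['col1'].astype(shared).cat.codes
df['col2_codes'] = df['col2'].astype(shared).cat.codes
print(df)
```

With the shared dtype, A is 0, B is 1, and so on through F in both columns, so equal letters always get equal codes.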
Answer:
You can also fit a single scikit-learn LabelEncoder on the values of both columns. For example, with this DataFrame:
a b
0 apple nokia
1 xiomi samsung
2 samsung apple
3 moto oneplus
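import pandas as pd
from sklearn import preprocessing
df = pd.DataFrame({'a': ['apple', 'xiomi', 'samsung', 'moto'],
                   'b': ['nokia', 'samsung', 'apple', 'oneplus']})
# Fit one encoder on the values of both columns so they share a vocabulary
cat_var = list(df.a.values) + list(df.b.values)
le = preprocessing.LabelEncoder()
le.fit(cat_var)
# Transform each column with the same fitted encoder
df['a'] = le.transform(df.a)
df['b'] = le.transform(df.b)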
This gives the following output:
a b
0 0 2
1 5 4
2 4 0
3 1 3
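LabelEncoder numbers its classes in sorted order, which is why apple is 0 and xiomi is 5 above. If you would rather not depend on scikit-learn, a minimal pandas-only sketch of the same idea is to flatten both columns and call pd.factorize once with sort=True, then reshape the codes back into columns. The names cols, encoded and decoded are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['apple', 'xiomi', 'samsung', 'moto'],
                   'b': ['nokia', 'samsung', 'apple', 'oneplus']})

cols = ['a', 'b']
# Flatten both columns into one 1-D array (row by row) and factorize it once;
# sort=True numbers the values alphabetically, matching LabelEncoder's codes
codes, uniques = pd.factorize(df[cols].to_numpy().ravel(), sort=True)

# Reshape the flat codes back to one column per original column
encoded = pd.DataFrame(codes.reshape(-1, len(cols)), columns=cols, index=df.index)
print(encoded)

# uniques[code] recovers the original label, so decoding is a single lookup
decoded = pd.DataFrame(uniques[codes].reshape(-1, len(cols)), columns=cols, index=df.index)
```

The uniques array plays the role of le.classes_: position i holds the label encoded as i.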
Answer:
Assuming this input as df:
col1 col2
0 A E
1 B F
2 C A
3 D B
4 A B
5 E A
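You can compute the unique values across both columns and use their positions as shared codes:
import pandas as pd
df = pd.DataFrame({'col1': list('ABCDAE'), 'col2': list('EFABBA')})
# Unique values of both columns, in order of first appearance (row by row)
vals = df[['col1', 'col2']].stack().unique()
# Map each value to its position, e.g. A gets 0, E gets 1, B gets 2
d = {k: v for v, k in enumerate(vals)}
df['col1_codes'] = df['col1'].map(d)
df['col2_codes'] = df['col2'].map(d)
Output: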
col1 col2 col1_codes col2_codes
0 A E 0 1
1 B F 2 3
2 C A 4 0
3 D B 5 2
4 A B 0 2
5 E A 1 0
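The same mapping idea extends to any number of columns. Below is a minimal sketch wrapped in a hypothetical helper, encode_shared (not a pandas function), that returns the codes together with the label list needed to decode them. It uses pd.unique on the flattened values, which for data without missing values yields the same row-by-row order as stack().unique().

```python
import pandas as pd


def encode_shared(df, cols):
    """Hypothetical helper: give equal values the same code across `cols`.

    Codes follow first appearance when the columns are read row by row.
    """
    vals = pd.unique(df[cols].to_numpy().ravel())
    mapping = {k: v for v, k in enumerate(vals)}
    codes = df[cols].apply(lambda s: s.map(mapping))
    return codes, list(vals)  # labels[code] gives back the original value


df = pd.DataFrame({'col1': ['A', 'B', 'C', 'D', 'A', 'E'],
                   'col2': ['E', 'F', 'A', 'B', 'B', 'A']})
codes, labels = encode_shared(df, ['col1', 'col2'])
print(codes.add_suffix('_codes'))
```

Returning the label list alongside the codes keeps the encoding reversible without rebuilding the dictionary.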