Encoding two columns in pandas with the same codes

Time:12-05

I need to encode the following two columns. When I use the cat.codes method, the two columns do not get the same codes: the same value can end up with a different code in each column. How can I make equal values get the same code in both columns?

Example:

The input is a DataFrame:

  col1 col2
0    A    E
1    B    F
2    C    A
3    D    B
4    A    B
5    E    A
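To make the problem concrete, here is a minimal sketch. It assumes each column is converted with its own astype('category') before cat.codes is taken, which is how the mismatch usually arises. Each column then gets its own sorted category list, so "E" is coded 4 in col1 but 2 in col2, and code 2 means "C" in col1 but "E" in col2.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'C', 'D', 'A', 'E'],
                   'col2': ['E', 'F', 'A', 'B', 'B', 'A']})

# Each column is categorized independently, so each gets its own sorted categories:
# col1 -> A, B, C, D, E    col2 -> A, B, E, F
codes1 = df['col1'].astype('category').cat.codes
codes2 = df['col2'].astype('category').cat.codes

print(codes1.tolist())  # [0, 1, 2, 3, 0, 4]
print(codes2.tolist())  # [2, 3, 0, 1, 1, 0]
```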

Answer:

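You can also fit scikit-learn's LabelEncoder on the values of both columns, so that both columns share one set of codes. For example, with this input: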

    a       b
0   apple   nokia
1   xiomi   samsung
2   samsung apple
3   moto    oneplus

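import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'a': ['apple', 'xiomi', 'samsung', 'moto'],
                   'b': ['nokia', 'samsung', 'apple', 'oneplus']})

# Collect the values of both columns so the encoder learns one shared vocabulary
cat_var = list(df.a.values) + list(df.b.values)

le = preprocessing.LabelEncoder()
le.fit(cat_var)

# Transform each column with the same fitted encoder
df['a'] = le.transform(df.a)
df['b'] = le.transform(df.b)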

This gives the following output:

    a   b
0   0   2
1   5   4
2   4   0
3   1   3
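LabelEncoder numbers its classes in sorted order, which is why "apple" becomes 0 and "xiomi" becomes 5. If scikit-learn is not available, the same sorted, shared encoding can be sketched with NumPy alone. This is an alternative technique (np.unique with return_inverse), not LabelEncoder itself, and it assumes the same two-column DataFrame as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['apple', 'xiomi', 'samsung', 'moto'],
                   'b': ['nokia', 'samsung', 'apple', 'oneplus']})

values = df[['a', 'b']].to_numpy()
# np.unique sorts the values of both columns together (like LabelEncoder's classes_);
# return_inverse gives each cell's position in that sorted array
classes, inverse = np.unique(values, return_inverse=True)
codes = inverse.reshape(values.shape)  # one row per DataFrame row, one column per column

df['a'] = codes[:, 0]
df['b'] = codes[:, 1]
print(df)
```

The reshape is there because, depending on the NumPy version, the inverse indices come back either flattened or in the input's shape.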

Answer:

Assuming this input as df:

  col1 col2
0    A    E
1    B    F
2    C    A
3    D    B
4    A    B
5    E    A

You can compute the unique values across both columns and use them to build a single value-to-code mapping:

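# Unique values of both columns, in order of first appearance (row by row)
vals = df[['col1', 'col2']].stack().unique()
d = {k: v for v, k in enumerate(vals)}  # value -> shared code
df['col1_codes'] = df['col1'].map(d)
df['col2_codes'] = df['col2'].map(d)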

output:

  col1 col2  col1_codes  col2_codes
0    A    E           0           1
1    B    F           2           3
2    C    A           4           0
3    D    B           5           2
4    A    B           0           2
5    E    A           1           0
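Since the question uses cat.codes, the same idea can also be expressed by giving both columns one shared categorical dtype. This is a minimal sketch. It assumes the shared categories are taken in order of first appearance, reading row by row, so the codes match the table above. Sorting `cats` first would give alphabetical codes instead.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'C', 'D', 'A', 'E'],
                   'col2': ['E', 'F', 'A', 'B', 'B', 'A']})

# One category list for both columns, in order of first appearance (row by row)
cats = pd.unique(df[['col1', 'col2']].to_numpy().ravel())
shared = pd.CategoricalDtype(categories=cats)

# Casting both columns to the same dtype makes cat.codes consistent across them
for col in ['col1', 'col2']:
    df[f'{col}_codes'] = df[col].astype(shared).cat.codes

print(df)
```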