Creating a dictionary of a cross-column label encoder

I have used sklearn's LabelEncoder to generate a unique encoding for each combination of two columns:

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("data.csv", sep=",")
df
#    A    B    
# 0  1  Yes 
# 1  2   No 
# 2  3  Yes 
# 3  4  Yes

as follows:

df['AB'] = df.apply(lambda row: hash((row['A'], row['B'])), axis=1)
le = LabelEncoder()
df['C'] = le.fit_transform(df['AB'])

    A   B   C
0   1   Yes 1
1   2   No  6
2   3   Yes 3
3   4   Yes 4

How can I generate a dictionary that maps each combination of the original column values to its LabelEncoder class? I can already do this for the hashes in AB:

keys = le.classes_
values = le.transform(le.classes_)
dic = dict(zip(keys, values))

What I am missing is the original (A, B) tuples behind the hashes in column AB, so that I can produce something like this:

{(1, 'Yes'): 1, (2, 'No'): 6, ...}

CodePudding user response:

One option is to set A and B as the index, select column C, and call to_dict:

out = df.set_index(['A','B'])['C'].to_dict()

Output:

{(1, 'Yes'): 3, (2, 'No'): 1, (3, 'Yes'): 0, (4, 'Yes'): 2}
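
For reference, here is a minimal end-to-end sketch of the approach above. It assumes the four sample rows from the question, built in memory instead of being read from data.csv; the names mapping and inverse are only illustrative. Because Python randomizes string hashes per process, which label each pair receives can differ between runs, so the exact numbers need not match the output shown above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Assumption: an in-memory stand-in for data.csv, using the sample rows from the question.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["Yes", "No", "Yes", "Yes"]})

# Same encoding steps as in the question: hash each (A, B) pair, then label-encode the hashes.
df["AB"] = df.apply(lambda row: hash((row["A"], row["B"])), axis=1)
le = LabelEncoder()
df["C"] = le.fit_transform(df["AB"])

# Map each (A, B) tuple to its encoded label.
mapping = df.set_index(["A", "B"])["C"].to_dict()

# Reverse lookup (illustrative): encoded label -> (A, B) tuple.
inverse = {label: pair for pair, label in mapping.items()}

print(mapping)
print(inverse)
```

If the same (A, B) pair appears in several rows, it gets the same hash and therefore the same label, so the dictionary simply keeps one entry per pair.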