from sklearn.preprocessing import LabelEncoder
l_labels = ['[PAD]'] + ['NN', 'ADJ', 'PRON']
le = LabelEncoder()
le.fit(l_labels)
le.transform(['[PAD]'])
>>> array([3])
I want the encoding of '[PAD]' to be 0. Is it possible to bind a label to a specific encoding with LabelEncoder?
CodePudding user response:
scikit-learn's LabelEncoder sorts the labels before assigning codes, so one way to make the padding label encode to 0 is to rename it to something that sorts first, for example by prefixing it with '0'.
l_labels = ['0PAD'] + ['NN', 'ADJ', 'PRON']
le = LabelEncoder()
le.fit(l_labels)
le.transform(['0PAD'])
>>> array([0])
CodePudding user response:
No, you cannot do that with LabelEncoder, because it first finds the unique elements and then sorts them to assign the numerical encoding. This is what happens internally in the fit method:
uniques_set = set(values)
uniques_set, missing_values = _extract_missing(uniques_set)
uniques = sorted(uniques_set)
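Since the codes follow the sorted order, '[PAD]' ends up last: in ASCII, '[' comes after the uppercase letters. If you need full control over which integer each label gets, one option is to skip LabelEncoder and keep an explicit mapping. The following is only a minimal sketch in plain Python; the names ordered_labels, label_to_id, id_to_label, encode and decode are made up for illustration and are not part of scikit-learn.

```python
# The position of each label in this list becomes its integer code,
# so '[PAD]' is guaranteed to map to 0.
ordered_labels = ['[PAD]', 'NN', 'ADJ', 'PRON']

# Forward and reverse lookup tables (hypothetical helper names).
label_to_id = {label: idx for idx, label in enumerate(ordered_labels)}
id_to_label = {idx: label for label, idx in label_to_id.items()}


def encode(labels):
    """Map a list of labels to their fixed integer codes."""
    return [label_to_id[label] for label in labels]


def decode(ids):
    """Map a list of integer codes back to their labels."""
    return [id_to_label[i] for i in ids]


# For comparison: the order LabelEncoder would use (sorted unique values).
print(sorted(set(ordered_labels)))    # ['ADJ', 'NN', 'PRON', '[PAD]']

print(encode(['[PAD]', 'NN', 'PRON']))  # [0, 1, 3]
print(decode([0, 2]))                   # ['[PAD]', 'ADJ']
```

An unknown label raises a KeyError here, much as LabelEncoder raises an error on labels it has not seen during fit.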