Bind a label to a given encoding with sklearn LabelEncoder

from sklearn.preprocessing import LabelEncoder
l_labels = ['[PAD]'] + ['NN', 'ADJ', 'PRON']
le = LabelEncoder()
le.fit(l_labels)
le.transform(['[PAD]'])

>>> array([3])

I want the encoding of '[PAD]' to be 0. Is it possible to bind a label to a specific encoding with LabelEncoder?

CodePudding user response：

The scikit-learn LabelEncoder sorts the labels before assigning codes. One way to get 'PAD' encoded as 0 is to rename it to something that sorts first, for example by prefixing it with '0'.

l_labels = ['0' + 'PAD'] + ['NN', 'ADJ', 'PRON']
le = LabelEncoder()
le.fit(l_labels)
le.transform(['0' + 'PAD'])
>> [0]
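
The rename works because strings are compared by code point: digits sort before uppercase letters, and uppercase letters sort before '['. Here is a minimal pure-Python sketch of that ordering. It uses `sorted` as a stand-in for the encoder's sort rather than calling sklearn, and `sorted_codes` is a hypothetical helper name, not part of any library.

```python
# Stand-in for the encoder's ordering: sort the unique labels, then number them.
# `sorted_codes` is a hypothetical helper, not a scikit-learn function.
original = ['[PAD]', 'NN', 'ADJ', 'PRON']
renamed = ['0PAD', 'NN', 'ADJ', 'PRON']


def sorted_codes(labels):
    """Map each unique label to its position in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


original_codes = sorted_codes(original)
renamed_codes = sorted_codes(renamed)

print(original_codes)  # '[PAD]' lands after the uppercase tags
print(renamed_codes)   # '0PAD' lands before them
```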

CodePudding user response：

No, you cannot do that in LabelEncoder because it first finds the unique elements and then sorts them to assign numerical encoding.

This is what happens internally in the `fit` method:

uniques_set = set(values)
uniques_set, missing_values = _extract_missing(uniques_set)

uniques = sorted(uniques_set)

Ref: https://github.com/scikit-learn/scikit-learn/blob/0d378913be6d7e485b792ea36e9268be31ed52d0/sklearn/utils/_encode.py#L135
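
Since the sort order is fixed, one possible workaround is to skip LabelEncoder for this step and build the mapping by hand, pinning '[PAD]' to 0 and sorting the rest. This is only a sketch with plain dictionaries; `build_label_mapping` and its `pinned` parameter are hypothetical names made up for illustration, not a scikit-learn API.

```python
# Hand-built label mapping with one label pinned to code 0.
# `build_label_mapping` is a hypothetical helper, not part of scikit-learn.
def build_label_mapping(labels, pinned='[PAD]'):
    """Return {label: code} with `pinned` at 0 and the other labels sorted after it."""
    rest = sorted(set(labels) - {pinned})
    ordered = [pinned] + rest
    return {label: code for code, label in enumerate(ordered)}


label_to_id = build_label_mapping(['[PAD]', 'NN', 'ADJ', 'PRON'])
# Inverse mapping, useful for decoding predictions back to tags.
id_to_label = {code: label for label, code in label_to_id.items()}

encoded = [label_to_id[tag] for tag in ['NN', '[PAD]', 'PRON']]
decoded = [id_to_label[code] for code in encoded]
print(label_to_id)
print(encoded, decoded)
```

The trade-off is that you lose `inverse_transform` and the other encoder conveniences, so the inverse dictionary has to be kept around yourself.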
