pandas: one-hot encoding a column containing a list of features, where each feature can be negative

Time:05-19

I have the following dataset:

import pandas as pd

d = {'user': ['a', 'a', 'b', 'b'], 'item': [1, 2, 1, 3], 'features': [[2], [-2, -1], [-137, -1, 2], [-137, 2, 1]]}
df = pd.DataFrame(data=d)

        user       item     features
0     a            1        [2]
1     a            2        [-2, -1]
2     b            1        [-137, -1, 2]
3     b            3        [-137, 2, 1]

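I'm trying to obtain the following dataset, with one column per feature (by absolute value) holding 1 if the feature is positive, -1 if it is negative, and 0 if it is absent: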

        user       item     '2'    '1'    '137'
0     a            1        1      0      0
1     a            2        -1     -1     0
2     b            1        1      -1     -1
3     b            3        1      1      -1

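I tried to use:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

dataset = load_dataset()
mlb = MultiLabelBinarizer()
dataset = dataset.join(pd.DataFrame(mlb.fit_transform(dataset.pop('features')),
                                    columns=mlb.classes_,
                                    index=dataset.index))

But I obtained this, where each signed value gets its own 0/1 column: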

        user       item     '-1' '-137'  '-2' '1' '2'
0     a            1        0    0       0    0   1
1     a            2        1    0       1    0   0
2     b            1        1    1       0    0   1
3     b            3        0    1       0    1   1

Can someone please help me?

CodePudding user response:

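In pandas this can be done by exploding the lists into one row per feature, splitting each value into its absolute value and its sign, and pivoting so that the absolute values become columns and the signs become cell values: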

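import numpy as np
df1 = df.explode('features')
df1['features'] = df1['features'].astype(int)  # explode leaves object dtype
df1['f1'] = df1['features'].abs()              # magnitude -> column label
df1['f2'] = np.sign(df1['features'])           # sign -> cell value
# pivot arguments are keyword-only in pandas 2.0+
df1.pivot(index=['user', 'item'], columns='f1', values='f2').fillna(0).astype(int).reset_index()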

f1 user  item  1  2  137
0     a     1  0  1    0
1     a     2 -1 -1    0
2     b     1 -1  1   -1
3     b     3  1  1   -1
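
For reuse, the same steps can be wrapped in a function. This is only a sketch: the helper name signed_one_hot and its parameters are made up for illustration, and it assumes that every list is non-empty, that no feature is 0, that each (user, item) pair appears once, and that no list contains the same feature twice (pivot raises an error on duplicates).

```python
import numpy as np
import pandas as pd


def signed_one_hot(df, list_col="features", keys=("user", "item")):
    """Hypothetical helper: one column per |feature|, holding its sign (0 if absent)."""
    # One row per (key, feature) pair; a fresh index avoids duplicate labels.
    long = df.explode(list_col).reset_index(drop=True)
    # explode leaves an object column, so cast before doing arithmetic.
    values = long[list_col].astype(int)
    long = long.assign(magnitude=values.abs(), sign=np.sign(values))
    # Magnitudes become column labels, signs become cell values.
    wide = long.pivot(index=list(keys), columns="magnitude", values="sign")
    # Features missing from a row come out as NaN; treat them as 0.
    wide = wide.fillna(0).astype(int)
    wide.columns.name = None  # drop the leftover "magnitude" axis label
    return wide.reset_index()


d = {'user': ['a', 'a', 'b', 'b'],
     'item': [1, 2, 1, 3],
     'features': [[2], [-2, -1], [-137, -1, 2], [-137, 2, 1]]}
df = pd.DataFrame(data=d)
result = signed_one_hot(df)
print(result)
```

The feature columns come out sorted by absolute value (1, 2, 137) rather than in the order shown in the question; reorder them afterwards if that matters.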