I have the following dataset:
import pandas as pd
d = {'user': ['a', 'a', 'b', 'b'], 'item': [1, 2, 1, 3], 'features': [[2], [-2, -1], [-137, -1, 2], [-137, 2, 1]]}
df = pd.DataFrame(data=d)
  user  item       features
0    a     1            [2]
1    a     2       [-2, -1]
2    b     1  [-137, -1, 2]
3    b     3   [-137, 2, 1]
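I'm trying to obtain the following dataset, with one column per absolute value holding that value's sign (1 or -1), or 0 when the value does not appear in the row: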
  user  item  '2'  '1'  '137'
0    a     1    1    0      0
1    a     2   -1   -1      0
2    b     1    1   -1     -1
3    b     3    1    1     -1
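The target encoding can be stated as a per-row rule: the absolute value of each feature becomes a column label, the sign becomes the cell value, and labels missing from a row are filled with 0. A minimal plain-Python sketch of that rule, assuming each row never contains both v and -v (if it did, the later one would win); the helpers encode_row and encode_all are hypothetical names used only for illustration:

```python
def encode_row(features):
    """Map one row's signed values to {abs(value): sign}.

    Hypothetical helper: (v > 0) - (v < 0) gives +1, -1 (or 0 for v == 0).
    """
    return {abs(v): (v > 0) - (v < 0) for v in features}


def encode_all(rows):
    """Encode every row, then fill labels a row lacks with 0."""
    encoded = [encode_row(r) for r in rows]
    labels = sorted({label for row in encoded for label in row})
    matrix = [[row.get(label, 0) for label in labels] for row in encoded]
    return labels, matrix


labels, matrix = encode_all([[2], [-2, -1], [-137, -1, 2], [-137, 2, 1]])
print(labels)  # [1, 2, 137]
print(matrix)  # [[0, 1, 0], [-1, -1, 0], [-1, 1, -1], [1, 1, -1]]
```

The columns here come out sorted numerically (1, 2, 137); the values match the desired table once the columns are reordered to '2', '1', '137'.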
I tried using MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
dataset = load_dataset()
mlb = MultiLabelBinarizer()
dataset = dataset.join(pd.DataFrame(mlb.fit_transform(dataset.pop('features')),
                                    columns=mlb.classes_,
                                    index=dataset.index))
but I got this instead:
  user  item  '-1'  '-137'  '-2'  '1'  '2'
0    a     1     0       0     0    0    1
1    a     2     1       0     1    0    0
2    b     1     1       1     0    0    1
3    b     3     0       1     0    1    1
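This is the behaviour MultiLabelBinarizer is built for: it treats every distinct value as a separate label and records only presence (1) or absence (0), so -1 and 1 become different columns and the sign ends up in the column name rather than in the cell. A minimal standard-library sketch on the question's data, showing the two label sets involved (the variable names are illustrative):

```python
features = [[2], [-2, -1], [-137, -1, 2], [-137, 2, 1]]

# Labels a presence-based binarizer works with: every distinct signed value.
signed_labels = sorted({v for row in features for v in row})

# Labels the desired output uses: distinct absolute values only.
abs_labels = sorted({abs(v) for row in features for v in row})

print(signed_labels)  # [-137, -2, -1, 1, 2]
print(abs_labels)     # [1, 2, 137]
```

Five signed labels collapse to three absolute ones, which is why the binarized table has five columns where the desired one has three.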
How can I get one column per absolute value that holds the sign instead of a 1/0 flag? Can someone please help me?
Answer:
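In pandas, this can be done by exploding each list into its own rows, splitting every value into its absolute value (the future column label) and its sign (the cell value), and pivoting back to one row per (user, item) pair. pivot is called with keyword arguments because recent pandas versions no longer accept them positionally: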
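import numpy as np
df1 = df.explode('features').astype({'features': int})  # one row per list element
df1['f1'] = df1['features'].abs()     # absolute value -> future column label
df1['f2'] = np.sign(df1['features'])  # sign (+1/-1) -> cell value
out = (df1.pivot(index=['user', 'item'], columns='f1', values='f2')
          .fillna(0).astype(int)      # values absent from a row become 0
          .rename_axis(columns=None).reset_index())
print(out)
  user  item   1   2  137
0    a     1   0   1    0
1    a     2  -1  -1    0
2    b     1  -1   1   -1
3    b     3   1   1   -1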
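For reference, a self-contained sketch of the same explode/abs/sign/pivot approach that also converts the labels to the strings '2', '1', '137' and puts them in the order shown in the question. The string labels and that column order are an assumption about the desired layout; pivot itself returns integer labels sorted numerically (1, 2, 137). The names long_df, wide and result are illustrative:

```python
import numpy as np
import pandas as pd

d = {'user': ['a', 'a', 'b', 'b'], 'item': [1, 2, 1, 3],
     'features': [[2], [-2, -1], [-137, -1, 2], [-137, 2, 1]]}
df = pd.DataFrame(data=d)

# One row per list element; explode leaves object dtype, so cast back to int.
long_df = df.explode('features').astype({'features': int})
long_df['label'] = long_df['features'].abs()    # future column label
long_df['sign'] = np.sign(long_df['features'])  # cell value (+1 / -1)

wide = (long_df.pivot(index=['user', 'item'], columns='label', values='sign')
               .fillna(0)    # label absent from this (user, item) row
               .astype(int))

# Match the question's layout: string labels in the order '2', '1', '137'.
wide.columns = wide.columns.astype(str)
result = wide[['2', '1', '137']].rename_axis(columns=None).reset_index()
print(result)
```

If the integer labels are fine, the last two statements can be dropped; keeping the labels numeric makes later arithmetic or sorting on them simpler.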