How can I binarize a dataset according to the index? For example, I want one row per idUser, like this:
        A  B  C
idUser
3       1  1  1
2       0  1  0
4       1  0  0
I have tried using pd.get_dummies, but the result is close to, not exactly, what I need: it returns one row per original row instead of one row per idUser.
import pandas as pd

dictio = {'idUser': [3, 3, 3, 2, 4], 'artist': ['A', 'B', 'C', 'B', 'A']}
df = pd.DataFrame(dictio)
df = df.set_index('idUser')
df_binary = pd.get_dummies(df, columns=['artist'])
print(df_binary)
        artist_A  artist_B  artist_C
idUser
3              1         0         0
3              0         1         0
3              0         0         1
2              0         1         0
4              1         0         0
CodePudding user response:
In [27]: df_binary.groupby(level=0).any().astype(int)
Out[27]:
        artist_A  artist_B  artist_C
idUser
2              0         1         0
3              1         1         1
4              1         0         0
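For reference, here is a self-contained sketch of the same groupby approach. It makes two assumptions that are not in the original code: `prefix=''` and `prefix_sep=''` are passed to `pd.get_dummies` so the columns come out as plain A, B, C, and `sort=False` is passed to `groupby` so the users keep their order of first appearance (3, 2, 4), as in the desired output in the question.

```python
import pandas as pd

# Data from the question, indexed by idUser.
df = pd.DataFrame(
    {'idUser': [3, 3, 3, 2, 4], 'artist': ['A', 'B', 'C', 'B', 'A']}
).set_index('idUser')

# Assumption: empty prefix and separator give columns A, B, C
# instead of artist_A, artist_B, artist_C.
dummies = pd.get_dummies(df, columns=['artist'], prefix='', prefix_sep='')

# Collapse rows that share an index value: a column is 1 if any row
# for that user has the artist. sort=False keeps first-appearance order.
result = dummies.groupby(level=0, sort=False).any().astype(int)
print(result)
```

Dropping `sort=False` gives the sorted index (2, 3, 4) shown in the output above.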
Alternatively, starting from your df as it was before the .set_index() call (with idUser still a regular column):
In [33]: df.pivot_table(index='idUser', columns='artist', aggfunc='size', fill_value=0).rename_axis(columns=None)
Out[33]:
        A  B  C
idUser
2       0  1  0
3       1  1  1
4       1  0  0
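One caveat: aggfunc='size' counts rows, so the pivot_table result is only 0/1 while each (idUser, artist) pair appears at most once. A possible sketch for data with repeats uses `pd.crosstab` and caps the counts at 1. The extra (3, 'A') row below is a made-up addition to the question's data, included only to show the effect.

```python
import pandas as pd

# Question's data plus one repeated (idUser, artist) pair.
# The repeated pair is an assumption for illustration only.
dictio = {'idUser': [3, 3, 3, 2, 4, 3],
          'artist': ['A', 'B', 'C', 'B', 'A', 'A']}
df = pd.DataFrame(dictio)

# Count occurrences of each user/artist combination.
counts = pd.crosstab(df['idUser'], df['artist'])

# Cap every count at 1 to get a 0/1 indicator, and drop the
# 'artist' label from the column axis.
binary = counts.clip(upper=1).rename_axis(columns=None)
print(binary)
```

Here `counts` holds 2 for user 3 and artist A, while `binary` holds 1.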