I am new to pandas. I am trying to turn the items of a list-valued column into columns of the DataFrame. I have been struggling with this for hours without success.
MWE
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [list('abd'), list(), list('ab'), list('abefc'), list('e')]
})
print(df)
    X                Y
0  10        [a, b, d]
1  20               []
2  30           [a, b]
3  40  [a, b, e, f, c]
4  50              [e]
How can I get a result like this?
    X  a  b  c  d  e
0  10  1  1  0  1  0
1  20  0  0  0  0  0
2  30  1  1  0  0  0
3  40  1  1  1  0  1
4  50  0  0  0  0  1
CodePudding user response:
You can try pandas.Series.str.get_dummies: join each list into a comma-separated string, then split it into indicator columns.
out = df[['X']].join(df['Y'].apply(','.join).str.get_dummies(sep=','))
print(out)
    X  a  b  c  d  e  f
0  10  1  1  0  1  0  0
1  20  0  0  0  0  0  0
2  30  1  1  0  0  0  0
3  40  1  1  1  0  1  1
4  50  0  0  0  0  1  0
CodePudding user response:
You can use scikit-learn's MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
df[mlb.classes_] = mlb.fit_transform(df['Y'])
A pure pandas alternative:
df.join(df['Y'].explode().str.get_dummies().groupby(level=0).max())
    X                Y  a  b  c  d  e  f
0  10        [a, b, d]  1  1  0  1  0  0
1  20               []  0  0  0  0  0  0
2  30           [a, b]  1  1  0  0  0  0
3  40  [a, b, e, f, c]  1  1  1  0  1  1
4  50              [e]  0  0  0  0  1  0
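To make the one-liner above easier to follow, here is a sketch that splits it into named steps and drops Y so that only X and the indicator columns remain, as the question asks. The intermediate names (exploded, dummies, indicators, out) are my own and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [list('abd'), list(), list('ab'), list('abefc'), list('e')],
})

# One row per list item; the empty list becomes a single NaN row.
exploded = df['Y'].explode()

# One indicator column per distinct label; the NaN row gets all zeros.
dummies = exploded.str.get_dummies()

# Collapse back to one row per original index label.
indicators = dummies.groupby(level=0).max()

# Drop the list column so only X and the indicator columns remain.
out = df.drop(columns='Y').join(indicators)
print(out)
```

The groupby(level=0).max() step is what merges the exploded rows back together: a label column ends up 1 for a row if any of that row's items matched it.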
CodePudding user response:
My straightforward solution: for each label, write 1 if it appears in that row's Y list and 0 otherwise:
for col in ['a', 'b', 'c', 'd', 'e']:
    df[col] = pd.Series([1 if col in df["Y"][x] else 0 for x in range(len(df.index))])
df = df.drop('Y', axis=1)
print(df)
Edit: Okay, the groupby approach is cleaner.
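If you still prefer the loop, a possible variant is to collect the labels from the data instead of hard-coding them, so that a label such as 'f' is not missed. This is only a sketch; the names labels and items are my own.

```python
import pandas as pd

df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [list('abd'), list(), list('ab'), list('abefc'), list('e')],
})

# Collect every distinct label that appears in any list, in sorted order.
labels = sorted(set().union(*df['Y']))

# Build one 0/1 column per label with a membership test on each row's list.
for label in labels:
    df[label] = [int(label in items) for items in df['Y']]

# Remove the original list column once the indicator columns exist.
df = df.drop(columns='Y')
print(df)
```

Iterating over df['Y'] directly also avoids the positional lookup df["Y"][x], which relies on the index being the default 0..n-1 range.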