I am new to pandas. I am trying to turn the items stored in a list column into separate indicator columns of the DataFrame. I have been struggling for hours but have not managed to do it.

MWE

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'X': [10,20,30,40,50],
    'Y': [list('abd'), list(), list('ab'),list('abefc'),list('e')]
})

print(df)
    X                Y
0  10        [a, b, d]
1  20               []
2  30           [a, b]
3  40  [a, b, e, f, c]
4  50              [e]

How can I get a result like this?

    X  a  b  c  d  e
0  10  1  1  0  1  0
1  20  0  0  0  0  0
2  30  1  1  0  0  0
3  40  1  1  1  0  1
4  50  0  0  0  0  1

CodePudding user response：

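You can try `pandas.Series.str.get_dummies`. Join each list into a single comma-separated string first, then split on the same separator (this assumes that no item contains a comma):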

out = df[['X']].join(df['Y'].apply(','.join).str.get_dummies(sep=','))

print(out)

    X  a  b  c  d  e  f
0  10  1  1  0  1  0  0
1  20  0  0  0  0  0  0
2  30  1  1  0  0  0  0
3  40  1  1  1  0  1  1
4  50  0  0  0  0  1  0
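The output above includes an `f` column that the expected result in the question does not have. A minimal sketch of one way to restrict the result to a fixed set of labels follows; the `wanted` list is an assumption taken from the question's expected output, and `'|'` is used as the separator because it is the default for `str.get_dummies`.

```python
import pandas as pd

df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [list('abd'), list(), list('ab'), list('abefc'), list('e')],
})

# Join each list into one delimited string; '|' is the default separator
# of Series.str.get_dummies, so no sep argument is needed below.
joined = df['Y'].apply('|'.join)

# One indicator column per distinct label found in the data.
dummies = joined.str.get_dummies()

# Assumption: only these labels are wanted (as in the question's expected output).
wanted = list('abcde')

# reindex keeps the wanted columns, drops the rest, and would add
# an all-zero column for any wanted label that never occurs.
out = df[['X']].join(dummies.reindex(columns=wanted, fill_value=0))
print(out)
```

Leaving out the `reindex` step gives the same result as the one-liner above, with every label present in the data.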

CodePudding user response：

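Use scikit-learn's `MultiLabelBinarizer`:

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# The right-hand side is evaluated first, so mlb.classes_ is already
# populated by fit_transform when it is read as the new column names.
df[mlb.classes_] = mlb.fit_transform(df['Y'])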

Pandas alternative

df.join(df['Y'].explode().str.get_dummies().groupby(level=0).max())

    X                Y  a  b  c  d  e  f
0  10        [a, b, d]  1  1  0  1  0  0
1  20               []  0  0  0  0  0  0
2  30           [a, b]  1  1  0  0  0  0
3  40  [a, b, e, f, c]  1  1  1  0  1  1
4  50              [e]  0  0  0  0  1  0
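To see why the pandas one-liner works, it can help to split it into its intermediate steps. This is only a sketch of the same chain; the variable names `exploded`, `one_hot`, and `indicators` are made up for illustration, and `Y` is dropped at the end to match the layout asked for in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'X': [10, 20, 30, 40, 50],
    'Y': [list('abd'), list(), list('ab'), list('abefc'), list('e')],
})

# explode() yields one row per list element and repeats the original index;
# an empty list becomes a single NaN row, so row 1 is not lost.
exploded = df['Y'].explode()

# str.get_dummies() one-hot encodes each element (a NaN row becomes all zeros).
one_hot = exploded.str.get_dummies()

# Collapse back to one row per original index:
# a label gets 1 if any element of that row's list matched it.
indicators = one_hot.groupby(level=0).max()

# Drop the list column and attach the indicator columns.
out = df.drop(columns='Y').join(indicators)
print(out)
```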

CodePudding user response：

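My straightforward solution: for each target column, write 1 if that label is in the row's Y list and 0 otherwise:

# 'f' is left out here, as in the question's expected output
for col in ['a', 'b', 'c', 'd', 'e']:
    # Iterate over the lists directly so this also works with a non-default index
    df[col] = [1 if col in items else 0 for items in df['Y']]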

df = df.drop('Y', axis=1)
print(df)

Edit: Okay, the groupby approach is cleaner.