Dataframe Multi-Label List Column to One-Hot

Time:03-15

How can I go from a string column holding a comma-separated list of labels to the one-hot format shown below?

This is what I have:

pd.DataFrame([["a",1],["b","1, 2"],["c","1,3,4"]], columns =['id', 'label'])

This is what I want:

pd.DataFrame([["a",1,0,0,0],["b",1,1,0,0],["c",1,0,1,1]], columns =['id', '1', '2', '3', '4'])

I can do this with a for loop but the execution time is horrendous.

CodePudding user response:

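You can also split the labels, explode them, and pivot:

# Cast to str first (row "a" holds the integer 1), strip spaces, then split on commas
df['label'] = df['label'].astype(str).str.replace(' ', '').str.split(',')
# One row per (id, label) pair, plus a helper column for pivot_table to aggregate
df = df.explode('label').assign(value=1)
df = df.pivot_table(index='id', columns='label', values='value', aggfunc='max', fill_value=0).astype(int)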
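
For anyone who wants to try the explode route end to end, here is a minimal self-contained sketch on the data from the question. It swaps pivot_table for pd.crosstab, which is my own substitution and not what the answer above uses, and it assumes the labels only need spaces stripped. The names long_df and one_hot are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([["a", 1], ["b", "1, 2"], ["c", "1,3,4"]], columns=["id", "label"])

# Cast to str first: the first row holds the integer 1, not the string "1".
labels = df["label"].astype(str).str.replace(" ", "").str.split(",")

# One row per (id, label) pair; reset the index so it has no duplicates.
long_df = df.assign(label=labels).explode("label").reset_index(drop=True)

# Count each (id, label) pair, then cap at 1 so repeated labels still give 0/1.
one_hot = pd.crosstab(long_df["id"], long_df["label"]).clip(upper=1)
one_hot.columns.name = None
one_hot = one_hot.reset_index()
print(one_hot)
```

The clip step only matters if the same label can appear twice in one row; on the sample data the counts are already 0 or 1.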

CodePudding user response:

Use .str.get_dummies():

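# Cast to str (row "a" holds the integer 1) and strip spaces so "1, 2" yields a column "2", not " 2"
labels = df['label'].astype(str).str.replace(' ', '')
df = pd.concat([df.drop('label', axis=1), labels.str.get_dummies(',')], axis=1)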

Output:

>>> df
  id  1  2  3  4
0  a  1  0  0  0
1  b  1  1  0  0
2  c  1  0  1  1
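
If you need this in more than one place, the get_dummies approach can be wrapped in a small function. This is only a sketch: one_hot_labels and its parameters are hypothetical names, and it assumes every value in the column can be cast to a string and that spaces inside labels are not meaningful.

```python
import pandas as pd


def one_hot_labels(df, column="label", sep=","):
    """Hypothetical helper: expand a delimited label column into 0/1 columns."""
    # Cast to str so non-string cells (e.g. the integer 1) are handled,
    # and drop spaces so "1, 2" and "1,2" produce the same labels.
    cleaned = df[column].astype(str).str.replace(" ", "", regex=False)
    dummies = cleaned.str.get_dummies(sep)
    # Keep every other column and append the indicator columns.
    return pd.concat([df.drop(columns=column), dummies], axis=1)


df = pd.DataFrame([["a", 1], ["b", "1, 2"], ["c", "1,3,4"]], columns=["id", "label"])
result = one_hot_labels(df)
print(result)
```

One caveat: astype(str) turns missing values into the string "nan", so fill or drop them beforehand if the column can contain NaN.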