How can I go from a string column containing a comma-separated list of labels to the format shown below?
This is what I have:
pd.DataFrame([["a",1],["b","1, 2"],["c","1,3,4"]], columns =['id', 'label'])
This is what I want:
pd.DataFrame([["a",1,0,0,0],["b",1,1,0,0],["c",1,0,1,1]], columns =['id', '1', '2', '3', '4'])
I can do this with a for loop but the execution time is horrendous.
CodePudding user response:
You can also use:
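# Cast to str first: the sample has a bare integer 1, which the .str methods would turn into NaN
df['label'] = df['label'].astype(str).str.replace(' ', '').str.split(',')
df = df.explode('label')
# pivot_table needs a value column to aggregate, so add a constant marker column
df = df.assign(present=1).pivot_table(index='id', columns='label', values='present', aggfunc='max', fill_value=0).astype(int)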
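The same explode step can also feed pd.crosstab, which counts id/label pairs directly. Below is a minimal self-contained sketch on the sample data from the question. The helper name explode_to_indicators is made up for illustration, and it assumes labels are comma-separated with optional spaces.

```python
import pandas as pd

def explode_to_indicators(df, id_col="id", label_col="label"):
    """Hypothetical helper: one 0/1 column per label via explode + crosstab."""
    out = df.copy()
    # astype(str) covers rows that hold a bare number instead of a string
    out[label_col] = out[label_col].astype(str).str.replace(" ", "").str.split(",")
    # ignore_index=True gives a fresh unique index after the rows are duplicated
    out = out.explode(label_col, ignore_index=True)
    # crosstab counts occurrences; clip turns repeated labels into a plain 1
    table = pd.crosstab(out[id_col], out[label_col]).clip(upper=1)
    table.columns.name = None
    return table.reset_index()

df = pd.DataFrame([["a", 1], ["b", "1, 2"], ["c", "1,3,4"]], columns=["id", "label"])
result = explode_to_indicators(df)
```

Unlike pivot_table, this returns id as a regular column again, which matches the layout asked for in the question.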
CodePudding user response:
Use .str.get_dummies():
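# astype(str) handles the bare integer 1; removing spaces keeps "1, 2" from producing a " 2" column
df = pd.concat([df.drop('label', axis=1), df['label'].astype(str).str.replace(' ', '').str.get_dummies(',')], axis=1)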
Output:
>>> df
  id  1  2  3  4
0  a  1  0  0  0
1  b  1  1  0  0
2  c  1  0  1  1
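To check the result against the frame from the question, something along these lines should work. It is only a sketch: the names have, want, and got are made up for illustration, and it assumes labels are comma-separated with optional spaces.

```python
import pandas as pd

have = pd.DataFrame([["a", 1], ["b", "1, 2"], ["c", "1,3,4"]], columns=["id", "label"])
want = pd.DataFrame(
    [["a", 1, 0, 0, 0], ["b", 1, 1, 0, 0], ["c", 1, 0, 1, 1]],
    columns=["id", "1", "2", "3", "4"],
)

# Normalise to strings and drop spaces so "1, 2" yields "2" rather than " 2"
labels = have["label"].astype(str).str.replace(" ", "")
got = pd.concat([have.drop("label", axis=1), labels.str.get_dummies(",")], axis=1)

# Raises AssertionError on mismatch; dtypes are ignored since only the 0/1 layout matters
pd.testing.assert_frame_equal(got, want, check_dtype=False)
```

Since get_dummies is vectorised, this avoids the Python-level loop mentioned in the question.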