How to create new columns according to a label in another column in pandas

I have a dataframe with 2 columns like:

import pandas as pd

df = pd.DataFrame({"action": ['asdasd', 'fgdgddg', 'dfgdgdfg', 'dfgdgdfg', 'nmwaws'], "classification": ['positive', 'negative', 'neutral', 'positive', 'mixed']})

df:

action    classification
asdasd        positive
fgdgddg       negative
dfgdgdfg      neutral
dfgdgdfg      positive
nmwaws        mixed

What I want to do is create four new columns, one for each unique label in the classification column, and assign 1 or 0 depending on whether the row has that label. This is the output I need:

action    classification    positive   negative   neutral  mixed
asdasd        positive         1          0          0       0
fgdgddg       negative         0          1          0       0
dfgdgdfg      neutral          0          0          1       0
dfgdgdfg      positive         1          0          0       0
nmwaws        mixed            0          0          0       1

I tried MultiLabelBinarizer from sklearn, but it split each word into individual letters instead of treating the whole word as a label.
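The letter-by-letter result is most likely because MultiLabelBinarizer expects each sample to be an iterable of labels, so a bare string is iterated character by character. A minimal sketch, assuming scikit-learn is installed, that wraps each label in a one-element list (the names `wrapped` and `encoded` are introduced here for illustration only):

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({
    "action": ["asdasd", "fgdgddg", "dfgdgdfg", "dfgdgdfg", "nmwaws"],
    "classification": ["positive", "negative", "neutral", "positive", "mixed"],
})

mlb = MultiLabelBinarizer()

# Each sample must be a collection of labels; a one-element list keeps
# the whole word together instead of splitting it into characters.
wrapped = [[label] for label in df["classification"]]

# classes_ holds the learned labels and serves as the column names.
encoded = pd.DataFrame(
    mlb.fit_transform(wrapped),
    columns=mlb.classes_,
    index=df.index,
)
print(encoded)
```

For a single label per row this is more machinery than needed, but it shows why the original attempt produced one column per letter.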

Can anyone help me?

CodePudding user response:

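You can use pandas.get_dummies. In pandas 2.0 and later the dummy columns default to bool, so pass dtype=int to get 1/0 values:

pd.get_dummies(df["classification"], dtype=int)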

Output:

   mixed  negative  neutral  positive
0      0         0        0         1
1      0         1        0         0
2      0         0        1         0
3      0         0        0         1
4      1         0        0         0

To append these columns to the original DataFrame, concatenate along the column axis:

pd.concat([df, pd.get_dummies(df["classification"], dtype=int)], axis=1)

Output:

     action classification  mixed  negative  neutral  positive
0    asdasd       positive      0         0        0         1
1   fgdgddg       negative      0         1        0         0
2  dfgdgdfg        neutral      0         0        1         0
3  dfgdgdfg       positive      0         0        0         1
4    nmwaws          mixed      1         0        0         0
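The dummy columns above come out as mixed, negative, neutral, positive, while the question lists them as positive, negative, neutral, mixed. A minimal sketch, assuming that column order matters; the `labels` list and the `result` name are introduced here for illustration, and `dtype=int` keeps 1/0 values on pandas 2.0 and later:

```python
import pandas as pd

df = pd.DataFrame({
    "action": ["asdasd", "fgdgddg", "dfgdgdfg", "dfgdgdfg", "nmwaws"],
    "classification": ["positive", "negative", "neutral", "positive", "mixed"],
})

# Desired column order, taken from the expected output in the question.
labels = ["positive", "negative", "neutral", "mixed"]

# One indicator column per label; reindex puts them in the requested order
# and fills with 0 any label that does not occur in the data.
dummies = (
    pd.get_dummies(df["classification"], dtype=int)
    .reindex(columns=labels, fill_value=0)
)

# join aligns on the index, which is equivalent to concat with axis=1 here.
result = df.join(dummies)
print(result)
```

Listing the labels explicitly also keeps the set of columns stable if a later batch of data happens to be missing one of the classes.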