I have a dataframe with 2 columns like:
df = pd.DataFrame({"action":['asdasd','fgdgddg','dfgdgdfg','dfgdgdfg','nmwaws'],"classification":['positive','negative','neutral','positive','mixed']})
df:
action classification
asdasd positive
fgdgddg negative
dfgdgdfg neutral
sdfsdff positive
nmwaws mixed
What I want to do is create 4 new columns, one for each unique label in the classification column, and assign 1 or 0 depending on whether the row has that label.
I need this as output:
action classification positive negative neutral mixed
asdasd positive 1 0 0 0
fgdgddg negative 0 1 0 0
dfgdgdfg neutral 0 0 1 0
sdfsdff positive 1 0 0 0
nmwaws mixed 0 0 0 1
I tried MultiLabelBinarizer from sklearn, but it parsed every letter of each word instead of the whole word.
Can anyone help me?
CodePudding user response:
You can use pandas.get_dummies.
pd.get_dummies(df["classification"])
Output:
mixed negative neutral positive
0 0 0 0 1
1 0 1 0 0
2 0 0 1 0
3 0 0 0 1
4 1 0 0 0
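One caveat: on recent pandas releases (2.0 and later, as far as I know) get_dummies returns boolean columns by default, so you may see True/False instead of the 1/0 shown above. A minimal sketch, assuming the df from the question, that asks for integers explicitly:

```python
import pandas as pd

# Same data as the constructor in the question
df = pd.DataFrame({
    "action": ["asdasd", "fgdgddg", "dfgdgdfg", "dfgdgdfg", "nmwaws"],
    "classification": ["positive", "negative", "neutral", "positive", "mixed"],
})

# dtype=int requests 0/1 integer indicators instead of the version's default dtype
dummies = pd.get_dummies(df["classification"], dtype=int)
print(dummies)
```

The columns still come out in sorted label order (mixed, negative, neutral, positive).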
If you want to concat it to the DataFrame:
pd.concat([df, pd.get_dummies(df["classification"])], axis=1)
Output:
action classification mixed negative neutral positive
0 asdasd positive 0 0 0 1
1 fgdgddg negative 0 1 0 0
2 dfgdgdfg neutral 0 0 1 0
3 dfgdgdfg positive 0 0 0 1
4 nmwaws mixed 1 0 0 0
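If you need the new columns in the order from the question (positive, negative, neutral, mixed) rather than alphabetical, one option is to reorder the dummy columns by first appearance before concatenating. This is only a sketch: the names order and result are mine, and it assumes first-appearance order is the order you want.

```python
import pandas as pd

# Same data as the constructor in the question
df = pd.DataFrame({
    "action": ["asdasd", "fgdgddg", "dfgdgdfg", "dfgdgdfg", "nmwaws"],
    "classification": ["positive", "negative", "neutral", "positive", "mixed"],
})

# Labels in order of first appearance: positive, negative, neutral, mixed
order = list(df["classification"].unique())

# Build 0/1 indicator columns, reorder them, then attach them to the original frame
dummies = pd.get_dummies(df["classification"], dtype=int)[order]
result = pd.concat([df, dummies], axis=1)
print(result)
```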