Create Binarized Rows in Python Pandas Dataframe

Time:02-22

Suppose I have a DataFrame df built as follows:

import pandas as pd
d = { 'Title': ['Elden Ring', 'Starcraft 2', 'Terraforming Mars'], 'Genre': ['Fantasy;Videogame', 'Videogame', 'Fantasy;Boardgame'] }
df = pd.DataFrame(data=d, index=None)

It looks like this:

Title               Genre
Elden Ring          Fantasy;Videogame
Starcraft 2         Videogame
Terraforming Mars   Fantasy;Boardgame

My goal is to end with a dataframe that looks like this:

Title               Genres                 Fantasy   Videogame   Boardgame
Elden Ring          [Fantasy, Videogame]   1         1           0
Starcraft 2         [Videogame]            0         1           0
Terraforming Mars   [Fantasy, Boardgame]   1         0           1

What is the best way to go about this? I tried:

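import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame(data=d, index=None)
# Split 'Fantasy;Videogame' into ['Fantasy', 'Videogame']
df.Genre = df.Genre.str.split(';')
binar = MultiLabelBinarizer()
# 2-D array of 0/1 indicators, one column per genre in binar.classes_
genre_labels = binar.fit_transform(df.Genre)
# Add one new column per genre
df[binar.classes_] = genre_labels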

This gives me a dataframe:

Title             Genre                 Boardgame   Fantasy     Videogame
Elden Ring        [Fantasy, Videogame]  0             1             1
Starcraft 2       [Videogame]           0             0             1
Terraforming Mars [Fantasy, Boardgame]  1             1             0

This gives me what I want, but it feels convoluted. Is there a cleaner way to do this?

CodePudding user response:

Use Series.str.get_dummies. This assumes Genre holds strings formatted like [Fantasy, Videogame]:

df.Genre.str.strip('[]').str.get_dummies(sep=', ')

   Boardgame  Fantasy  Videogame
0          0        1          1
1          0        0          1
2          1        1          0

To append the indicator columns to the dataframe:

pd.concat([df, df.Genre.str.strip('[]').str.get_dummies(sep=', ')], axis=1)

               Title                 Genre  Boardgame  Fantasy  Videogame
0         Elden Ring  [Fantasy, Videogame]          0        1          1
1        Starcraft 2           [Videogame]          0        0          1
2  Terraforming Mars  [Fantasy, Boardgame]          1        1          0

If Genre holds actual lists instead, join each list into a delimited string first:

df.Genre = df.Genre.str.join(';')
pd.concat([df, df.Genre.str.get_dummies(sep=';')], axis=1)

               Title              Genre  Boardgame  Fantasy  Videogame
0         Elden Ring  Fantasy;Videogame          0        1          1
1        Starcraft 2          Videogame          0        0          1
2  Terraforming Mars  Fantasy;Boardgame          1        1          0
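
A minimal self-contained sketch of the list-typed case, assuming no genre name contains the ';' character. Unlike the snippet above, it joins into a temporary Series instead of overwriting df.Genre, so the Genre column keeps its lists. The names joined, dummies, and out are illustrative only:

```python
import pandas as pd

# Genre already holds Python lists, e.g. after df.Genre.str.split(';')
df = pd.DataFrame({
    "Title": ["Elden Ring", "Starcraft 2", "Terraforming Mars"],
    "Genre": [["Fantasy", "Videogame"], ["Videogame"], ["Fantasy", "Boardgame"]],
})

# Join each list into one ';'-delimited string without modifying df.Genre
joined = df["Genre"].str.join(";")

# One 0/1 indicator column per distinct genre
dummies = joined.str.get_dummies(sep=";")

# Place the indicator columns next to the original ones
out = pd.concat([df, dummies], axis=1)
print(out)
```

If a genre name could contain ';', another separator that never occurs in the data would be needed for both the join and get_dummies.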

CodePudding user response:

Series.str.get_dummies is designed for exactly this, and it works directly on the original semicolon-delimited strings:

df = pd.concat([df, df['Genre'].str.get_dummies(';')], axis=1)

Output:

>>> df
               Title              Genre  Boardgame  Fantasy  Videogame
0         Elden Ring  Fantasy;Videogame          0        1          1
1        Starcraft 2          Videogame          0        0          1
2  Terraforming Mars  Fantasy;Boardgame          1        1          0
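
The outputs above list the genre columns alphabetically, whereas the table in the question shows them in order of first appearance and keeps the genres as lists. One possible sketch for that layout, assuming pandas 0.25 or later (for Series.explode). The names genres, dummies, order, and result are illustrative only:

```python
import pandas as pd

d = {
    "Title": ["Elden Ring", "Starcraft 2", "Terraforming Mars"],
    "Genre": ["Fantasy;Videogame", "Videogame", "Fantasy;Boardgame"],
}
df = pd.DataFrame(d)

# Indicator columns built from the raw ';'-delimited strings
dummies = df["Genre"].str.get_dummies(sep=";")

# List version of the Genre column, as in the desired output
genres = df["Genre"].str.split(";")

# Distinct genres in order of first appearance across the rows
order = list(genres.explode().unique())

# Title, list-valued Genre, then the reordered indicator columns
result = pd.concat([df["Title"], genres, dummies[order]], axis=1)
print(result)
```

Building the indicators before splitting keeps get_dummies working on plain strings, which is the input it expects.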