I would like to create new columns from the genres column. The genres column contains one or more genres per row, and I would like one column for each genre name, filled with 1 or 0 depending on whether the row has that genre.
The DataFrame should look like the one in the image below.
I have no idea how to approach this.
Using a one-hot encoder or the pandas get_dummies function directly didn't work; I got something like this:
That is not what I need.
CodePudding user response:
It looks like the values in the Genre column were one-hot encoded. One-hot encoding is also known as creating dummy variables. Pandas has a function, pd.get_dummies(), that should let you one-hot encode the Genre column. Pass in your data frame and use the columns parameter to select the Genre column.
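A minimal sketch of that call, assuming a hypothetical data frame with exactly one genre per row (the Title values and genre names below are made up for illustration):

```python
import pandas as pd

# Hypothetical data: one genre per row.
df = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genre": ["Drama", "Comedy", "Drama"],
})

# Encode only the Genre column; other columns pass through unchanged.
# dtype=int gives 1/0 instead of True/False.
encoded = pd.get_dummies(df, columns=["Genre"], dtype=int)
print(encoded)
```

Note that pd.get_dummies() treats each distinct cell value as one category, so a cell holding several genres in a single string would produce one column for that whole string rather than one column per genre.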
See the function documentation and other options here: https://pandas.pydata.org/docs/reference/api/pandas.get_dummies.html
CodePudding user response:
You can use CategoricalDtype as below:
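import pandas as pd
from pandas.api.types import CategoricalDtype
df = pd.DataFrame({'country': ['Brazil', 'Australia', 'Canada', 'Brazil', 'Germany']})
# Declare the categories up front so the dummy columns and their order are fixed
country_type = CategoricalDtype(categories=['Australia', 'Brazil', 'Canada', 'Germany'])
df['country'] = df['country'].astype(country_type)
# One indicator column per category, named country_<value>
pd.get_dummies(df, prefix=['country'])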
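The example above has a single value per row. For a column where one cell can hold several genres, a possible approach is Series.str.get_dummies(), which splits each string on a separator before encoding. This is only a sketch: the column names, the sample data, and the "|" separator are assumptions, since the actual format of the genres column is not shown in the question.

```python
import pandas as pd

# Hypothetical data: several genres per cell, joined by "|" (assumed separator).
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": ["Action|Comedy", "Drama", "Comedy|Drama"],
})

# Split each string on the separator and build one 0/1 column per distinct genre.
genre_flags = df["genres"].str.get_dummies(sep="|")

# Attach the indicator columns to the original frame.
result = df.join(genre_flags)
print(result)
```

If the cells hold Python lists of strings instead, joining them first with df["genres"].str.join("|") produces the same string form.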