I am trying to iterate through the content_with_genres1 data frame and append the genres as columns of 1s and 0s. Oddly, the genre values are being treated as strings, as shown in the image.
Here is my code:
content_with_genres = content_refined.copy(deep=True)
content_with_genres1 = content_with_genres.drop(['content_type','language','rating'], axis=1)
x = []
for index, row in content_with_genres1.iterrows():
    x.append(index)
    for genre in row['genre']:
        content_with_genres1.at[index, genre] = 1
print(len(x) == len(content_with_genres1))
content_with_genres1.head(5)
This is what I am getting - Data frame
I want the data frame to be something like this:
content_id | genre | drama | comedy | action | sports
-------------------------------------------------------
cont_123 | drama | 1 | 0 | 0 | 0
cont_234 | comedy | 0 | 1 | 0 | 0
Please help me with this. Thanks in advance.
CodePudding user response:
IIUC, you are looking for pd.get_dummies:
# dtype=int keeps the indicators as 0/1; pandas 2.0+ returns booleans by default
out = pd.concat([df, pd.get_dummies(df['genre'], dtype=int)], axis=1)
print(out)
# Output
  content_id   genre  comedy  drama
0   cont_123   drama       0      1
1   cont_456  comedy       1      0
Setup:
>>> df
  content_id   genre
0   cont_123   drama
1   cont_456  comedy
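For completeness, here is a self-contained sketch that puts the setup and the one-hot step together, and also shows a likely reason the loop in the question misbehaves: iterating over a string yields its characters, so `for genre in row['genre']` would create one column per letter rather than one per genre. The sample data below is hypothetical and only mirrors the two rows above. Passing `dtype=int` is my assumption about the desired output (0/1 rather than booleans).

```python
import pandas as pd

# Hypothetical sample data mirroring the setup above.
df = pd.DataFrame({
    "content_id": ["cont_123", "cont_456"],
    "genre": ["drama", "comedy"],
})

# Iterating over a string yields single characters, which is why a
# loop like `for genre in row['genre']` does not see whole genre names.
chars = [ch for ch in df.loc[0, "genre"]]
print(chars)

# One indicator column per distinct genre value.
# dtype=int requests 0/1 instead of boolean columns.
dummies = pd.get_dummies(df["genre"], dtype=int)

# Attach the indicator columns next to the original columns.
out = pd.concat([df, dummies], axis=1)
print(out)
```

If a single row can hold several genres in one string, this sketch would need adapting, since `pd.get_dummies` treats each whole cell value as one category.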