I don't want the column names as a string of different letters

Time:03-24

I am trying to iterate through the content_with_genres1 data frame and append the genres as columns of 1s and 0s. Oddly, each genre is being treated as a string and split into separate letters, as shown in the image.

Here is my code:

content_with_genres = content_refined.copy(deep=True)
content_with_genres1 = content_with_genres.drop(['content_type','language','rating'], axis=1)
x = []
for index, row in content_with_genres1.iterrows():
    x.append(index)
    for genre in row['genre']:
        content_with_genres1.at[index, genre] = 1

print(len(x) == len(content_with_genres1))
content_with_genres1.head(5)

This is what I am getting: (image of the data frame)

I want the data frame to be something like this:

content_id | genre  | drama | comedy | action | sports 
-------------------------------------------------------
cont_123   | drama  |   1   |   0    |   0    |   0
cont_234   | comedy |   0   |   1    |   0    |   0

Please help me with this. Thanks in advance.

CodePudding user response:

IIUC, you are looking for pd.get_dummies:

# dtype=int keeps the indicator columns as 1/0 (newer pandas defaults to True/False)
out = pd.concat([df, pd.get_dummies(df['genre'], dtype=int)], axis=1)
print(out)

# Output
  content_id   genre  comedy  drama
0   cont_123   drama       0      1
1   cont_456  comedy       1      0
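
As for why the original loop produced single-letter columns: if each genre cell holds a plain string, `for genre in row['genre']` walks over its characters, so `.at[index, genre]` creates one column per letter. A minimal sketch of that behavior, assuming string-valued cells (the sample frame here is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; assumes each genre cell is a plain string.
df = pd.DataFrame({"content_id": ["cont_123", "cont_456"],
                   "genre": ["drama", "comedy"]})

# Iterating over a str yields its characters, not the whole word.
letters = [ch for ch in df.loc[0, "genre"]]
print(letters)  # ['d', 'r', 'a', 'm', 'a']

# Wrapping the value in a list gives one item per genre instead.
genres = [df.loc[0, "genre"]]
print(genres)  # ['drama']
```

If you want to keep the loop, wrapping the cell value in a list (or splitting it on whatever delimiter your data uses) would avoid the per-letter columns.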

Setup:

>>> df
  content_id   genre
0   cont_123   drama
1   cont_456  comedy
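
Put together as a self-contained sketch, using the same two-row frame as the setup above. Note that pandas 2.0 and later return boolean columns from `get_dummies` by default, so `dtype=int` is passed here to get 1s and 0s:

```python
import pandas as pd

# Same sample frame as in the setup above.
df = pd.DataFrame({"content_id": ["cont_123", "cont_456"],
                   "genre": ["drama", "comedy"]})

# One indicator column per distinct genre; dtype=int gives 1/0 instead of True/False.
dummies = pd.get_dummies(df["genre"], dtype=int)

# Attach the indicator columns next to the original columns.
out = pd.concat([df, dummies], axis=1)
print(out)
```

If a cell can hold several genres in one delimited string, `Series.str.get_dummies(sep=...)` may be a better fit, assuming you know the delimiter.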