Can someone help me with breaking genres into different categories

I'm working with a dataset about anime, and I'm not sure how to describe what I want. I have a pd.Series of all the genres in the dataset (see the images below), and I want to compare every genre against the rating to look for correlations. That is, group all titles with the genre 'Comedy', for example, find that group's mean rating, then compare it to the next group, and so on. The problem is that every title in the dataset has more than one genre (that's why I converted the column via get_dummies). I don't know what to do next to reach my goal. Can you suggest something?

dataset

genre series

CodePudding user response：

A first step is to explode the Genre column so that each row holds a single genre:

df = df.assign(Genre=df['Genre'].str.split(',')).explode('Genre')
print(df)

# Output
                Anime         Genre
0          Death Note       Mystery
0          Death Note  Supernatural
0          Death Note      Suspense
1  Shingeki no Kyojin        Action
1  Shingeki no Kyojin         Drama
1  Shingeki no Kyojin       Fantasy
1  Shingeki no Kyojin       Mystery
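Once each row holds one genre, the per-genre mean the question asks for is a groupby away. The following is only a sketch: the column name 'Rating' and the rating values are assumptions made up for illustration (they are not real scores and not taken from the asker's dataset), so substitute your own column.

```python
import pandas as pd

# 'Rating' and its values are placeholders invented for this sketch.
data = {'Anime': ['Death Note', 'Shingeki no Kyojin'],
        'Genre': ['Mystery,Supernatural,Suspense', 'Action,Drama,Fantasy,Mystery'],
        'Rating': [8.0, 9.0]}
df = pd.DataFrame(data)

# One row per (title, genre) pair; strip stray spaces around genre names.
exploded = df.assign(Genre=df['Genre'].str.split(',')).explode('Genre')
exploded['Genre'] = exploded['Genre'].str.strip()

# Mean rating and number of titles per genre, best-rated genres first.
genre_stats = (exploded.groupby('Genre')['Rating']
               .agg(['mean', 'count'])
               .sort_values('mean', ascending=False))
print(genre_stats)
```

Keeping the count next to the mean helps to spot genres whose average rests on only a handful of titles.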

The MRE (minimal reproducible example) used above can be set up like this:

import pandas as pd

data = {'Anime': ['Death Note', 'Shingeki no Kyojin'],
        'Genre': ['Mystery,Supernatural,Suspense', 'Action,Drama,Fantasy,Mystery']}
df = pd.DataFrame(data)
print(df)

# Output
                Anime                          Genre
0          Death Note  Mystery,Supernatural,Suspense
1  Shingeki no Kyojin   Action,Drama,Fantasy,Mystery
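Since the question mentions get_dummies, the same per-genre mean can also be computed from the indicator columns without exploding. This is a sketch under the same assumptions as before: the 'Rating' column and its values are hypothetical placeholders, not real data.

```python
import pandas as pd

# 'Rating' and its values are placeholders invented for this sketch.
df = pd.DataFrame({
    'Anime': ['Death Note', 'Shingeki no Kyojin'],
    'Genre': ['Mystery,Supernatural,Suspense', 'Action,Drama,Fantasy,Mystery'],
    'Rating': [8.0, 9.0],
})

# One 0/1 column per genre, one row per title.
dummies = df['Genre'].str.get_dummies(sep=',')

# Sum of ratings of the titles in each genre, divided by the number of titles.
mean_by_genre = dummies.mul(df['Rating'], axis=0).sum() / dummies.sum()
print(mean_by_genre.sort_values(ascending=False))
```

The indicator frame is also a convenient starting point for correlations, for example with dummies.corrwith(df['Rating']) on the full dataset; with only two rows, as here, such a correlation is not meaningful.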