I'm working with a dataset about anime, and I'm not sure how to describe what I want. I have a pd.Series of all genres from the dataset, and now I want to compare every genre against the rating and maybe find some correlations. That is, group all titles with the genre 'Comedy', for example, find that group's mean rating, then compare it to the next group, and so on. The problem is that every title in the dataset has more than one genre (that's why I converted the column via get_dummies). Now I don't know what to do to reach my goal. Can you suggest something?
CodePudding user response:
A first step is to explode your Genre column so that each row holds a single genre, like this:
df = df.assign(Genre=df['Genre'].str.split(',')).explode('Genre')
print(df)
# Output
                Anime         Genre
0          Death Note       Mystery
0          Death Note  Supernatural
0          Death Note      Suspense
1  Shingeki no Kyojin        Action
1  Shingeki no Kyojin         Drama
1  Shingeki no Kyojin       Fantasy
1  Shingeki no Kyojin       Mystery
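Once the genres are exploded, the per-genre mean rating is a plain groupby. Here is a minimal sketch, assuming the dataset has a numeric column called 'Rating'. That column name and the values below are placeholders I made up for illustration, not real data:

```python
import pandas as pd

# Hypothetical data: the 'Rating' column name and its values are placeholders,
# not taken from the real dataset.
data = {'Anime': ['Death Note', 'Shingeki no Kyojin'],
        'Genre': ['Mystery,Supernatural,Suspense', 'Action,Drama,Fantasy,Mystery'],
        'Rating': [8.0, 9.0]}
df = pd.DataFrame(data)

# One row per (title, genre) pair; the title's rating is repeated on each row.
exploded = df.assign(Genre=df['Genre'].str.split(',')).explode('Genre')

# Mean rating and number of titles per genre, highest mean first.
genre_stats = (exploded.groupby('Genre')['Rating']
               .agg(['mean', 'count'])
               .sort_values('mean', ascending=False))
print(genre_stats)
```

Keeping the count next to the mean helps when comparing groups, since a genre with only a handful of titles can have a misleading average.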
Setup for the MRE (minimal reproducible example) used above:
import pandas as pd

data = {'Anime': ['Death Note', 'Shingeki no Kyojin'],
        'Genre': ['Mystery,Supernatural,Suspense', 'Action,Drama,Fantasy,Mystery']}
df = pd.DataFrame(data)
print(df)
# Output
                Anime                          Genre
0          Death Note  Mystery,Supernatural,Suspense
1  Shingeki no Kyojin   Action,Drama,Fantasy,Mystery
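Since you already went the get_dummies route, the same per-genre means can probably be computed without exploding, keeping one row per title. This is only a sketch under the same assumption as before: 'Rating' is a hypothetical column name and the values are made-up placeholders:

```python
import pandas as pd

# Hypothetical data: the 'Rating' column name and its values are placeholders.
data = {'Anime': ['Death Note', 'Shingeki no Kyojin'],
        'Genre': ['Mystery,Supernatural,Suspense', 'Action,Drama,Fantasy,Mystery'],
        'Rating': [8.0, 9.0]}
df = pd.DataFrame(data)

# One 0/1 indicator column per genre, one row per title.
dummies = df['Genre'].str.get_dummies(sep=',')

# For each genre: sum of the ratings of titles flagged with it,
# divided by the number of flagged titles.
mean_by_genre = dummies.mul(df['Rating'], axis=0).sum() / dummies.sum()
print(mean_by_genre.sort_values(ascending=False))
```

On the full dataset, dummies.corrwith(df['Rating']) should give the Pearson correlation between each genre indicator and the rating, which may be closer to the "find some correlations" part of the question. With only two titles, as in this MRE, that number is not meaningful.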