Home > OS >  How to put element into multiple categories in python/pandas?
How to put element into multiple categories in python/pandas?

Time:09-24

For example,

input.csv is as follows:

Song Name Genre
7 Rings 'dance pop', 'pop', 'post-teen pop'
Run 'dance pop', 'piano rock', 'pop', 'pop rock'
Dance Monkey 'australian pop', 'pop'
All Of Me 'neo soul', 'pop', 'pop soul', 'r&b', 'urban contemporary'

I want to group it in a way where I can get something like:

pop: ['7 Rings', 'Run', 'Dance Monkey', 'All Of Me']
dance pop : ['7 Rings','Run']
r&b: ['All Of Me']

And even put this into another table/dataframe/csv like:

pop dance pop r&b neo soul pop rock
7 Rings 7 Rings All Of Me All Of Me Run
Run Run
Dance Monkey
All Of Me

Is there anyway to do this?

Edit:

trying out mozway's suggestion, I got a table that would look like this instead:

genreExplode=df.explode('Genre').assign(index=lambda d: d.groupby('Genre').cumcount()).pivot(index='index', columns='Genre', values='Song Name').fillna('')
genreExplode.head()
Genre 'dance pop', 'pop', 'post-teen pop' 'dance pop', 'piano rock', 'pop', 'pop rock' 'australian pop', 'pop' 'neo soul', 'pop', 'pop soul', 'r&b', 'urban contemporary'
index
0 7 Rings Run Dance Monkey All Of Me

Edit 2:

Figured out the problem, the objects in the Genre column looked like lists, but were actually strings.

genrelist=df['Genre'].tolist() ##first make a list of the Genre column
genrelist_new=[] ## new list to hold lists

import ast ## found this online
for x in genrelist:
    x=ast.literal_eval(x) ##this loop takes the string objects that look like list in genrelist and converts them into list
    genrelist_new.append(x) ##then add the converted list and put into a list

df['Genre']=genrelist_new ##replace old Genre column of strings that look like lists to new column of real lists
genreExplode=spotData.explode('Genre').assign(index=lambda d: d.groupby('Genre').cumcount()).pivot(index='index', columns='Genre', values='Song Name').fillna('')
genreExplode.head() ## this result is what I was looking for!

Solution, convert the strings into real lists, so that Genre column is a list of lists.

Then @mozway's code works flawlessly.

CodePudding user response:

Assuming "Genre" contains lists (e.g., ['dance pop', 'pop', 'post-teen pop']). You can explode and pivot:

(df.explode('Genre')
   .assign(index=lambda d: d.groupby('Genre').cumcount())
   .pivot(index='index', columns='Genre', values='Song')
   .fillna('')
)

output:

Genre australian pop dance pop   neo soul piano rock           pop pop rock   pop soul post-teen pop        r&b urban contemporary
index                                                                                                                             
0       Dance Monkey   7 Rings  All Of Me        Run       7 Rings      Run  All Of Me       7 Rings  All Of Me          All Of Me
1                          Run                                 Run                                                                
2                                                     Dance Monkey                                                                
3                                                        All Of Me                                                                
  • Related