How to put element into multiple categories in python/pandas?-CodePudding

For example,

input.csv is as follows:

Song Name	Genre
7 Rings	'dance pop', 'pop', 'post-teen pop'
Run	'dance pop', 'piano rock', 'pop', 'pop rock'
Dance Monkey	'australian pop', 'pop'
All Of Me	'neo soul', 'pop', 'pop soul', 'r&b', 'urban contemporary'

I want to group it in a way where I can get something like:

pop: ['7 Rings', 'Run', 'Dance Monkey', 'All Of Me']
dance pop : ['7 Rings','Run']
r&b: ['All Of Me']

And even put this into another table/dataframe/csv like:

pop	dance pop	r&b	neo soul	pop rock
7 Rings	7 Rings	All Of Me	All Of Me	Run
Run	Run
Dance Monkey
All Of Me

Is there anyway to do this?

Edit:

trying out mozway's suggestion, I got a table that would look like this instead:

genreExplode=df.explode('Genre').assign(index=lambda d: d.groupby('Genre').cumcount()).pivot(index='index', columns='Genre', values='Song Name').fillna('')
genreExplode.head()

Genre	'dance pop', 'pop', 'post-teen pop'	'dance pop', 'piano rock', 'pop', 'pop rock'	'australian pop', 'pop'	'neo soul', 'pop', 'pop soul', 'r&b', 'urban contemporary'
index
0	7 Rings	Run	Dance Monkey	All Of Me

Edit 2:

Figured out the problem, the objects in the Genre column looked like lists, but were actually strings.

genrelist=df['Genre'].tolist() ##first make a list of the Genre column
genrelist_new=[] ## new list to hold lists

import ast ## found this online
for x in genrelist:
    x=ast.literal_eval(x) ##this loop takes the string objects that look like list in genrelist and converts them into list
    genrelist_new.append(x) ##then add the converted list and put into a list

df['Genre']=genrelist_new ##replace old Genre column of strings that look like lists to new column of real lists
genreExplode=spotData.explode('Genre').assign(index=lambda d: d.groupby('Genre').cumcount()).pivot(index='index', columns='Genre', values='Song Name').fillna('')
genreExplode.head() ## this result is what I was looking for!

Solution, convert the strings into real lists, so that Genre column is a list of lists.

Then @mozway's code works flawlessly.

CodePudding user response：

Assuming "Genre" contains lists (e.g., ['dance pop', 'pop', 'post-teen pop']). You can explode and pivot:

(df.explode('Genre')
   .assign(index=lambda d: d.groupby('Genre').cumcount())
   .pivot(index='index', columns='Genre', values='Song')
   .fillna('')
)

output:

Genre australian pop dance pop   neo soul piano rock           pop pop rock   pop soul post-teen pop        r&b urban contemporary
index                                                                                                                             
0       Dance Monkey   7 Rings  All Of Me        Run       7 Rings      Run  All Of Me       7 Rings  All Of Me          All Of Me
1                          Run                                 Run                                                                
2                                                     Dance Monkey                                                                
3                                                        All Of Me