For example,
input.csv is as follows:
Song Name | Genre |
---|---|
7 Rings | 'dance pop', 'pop', 'post-teen pop' |
Run | 'dance pop', 'piano rock', 'pop', 'pop rock' |
Dance Monkey | 'australian pop', 'pop' |
All Of Me | 'neo soul', 'pop', 'pop soul', 'r&b', 'urban contemporary' |
I want to group it in a way where I can get something like:
pop: ['7 Rings', 'Run', 'Dance Monkey', 'All Of Me']
dance pop : ['7 Rings','Run']
r&b: ['All Of Me']
And even put this into another table/dataframe/csv like:
pop | dance pop | r&b | neo soul | pop rock |
---|---|---|---|---|
7 Rings | 7 Rings | All Of Me | All Of Me | Run |
Run | Run | |||
Dance Monkey | ||||
All Of Me |
Is there anyway to do this?
Edit:
trying out mozway's suggestion, I got a table that would look like this instead:
genreExplode=df.explode('Genre').assign(index=lambda d: d.groupby('Genre').cumcount()).pivot(index='index', columns='Genre', values='Song Name').fillna('')
genreExplode.head()
Genre | 'dance pop', 'pop', 'post-teen pop' | 'dance pop', 'piano rock', 'pop', 'pop rock' | 'australian pop', 'pop' | 'neo soul', 'pop', 'pop soul', 'r&b', 'urban contemporary' |
---|---|---|---|---|
index | ||||
0 | 7 Rings | Run | Dance Monkey | All Of Me |
Edit 2:
Figured out the problem, the objects in the Genre column looked like lists, but were actually strings.
genrelist=df['Genre'].tolist() ##first make a list of the Genre column
genrelist_new=[] ## new list to hold lists
import ast ## found this online
for x in genrelist:
x=ast.literal_eval(x) ##this loop takes the string objects that look like list in genrelist and converts them into list
genrelist_new.append(x) ##then add the converted list and put into a list
df['Genre']=genrelist_new ##replace old Genre column of strings that look like lists to new column of real lists
genreExplode=spotData.explode('Genre').assign(index=lambda d: d.groupby('Genre').cumcount()).pivot(index='index', columns='Genre', values='Song Name').fillna('')
genreExplode.head() ## this result is what I was looking for!
Solution, convert the strings into real lists, so that Genre column is a list of lists.
Then @mozway's code works flawlessly.
CodePudding user response:
Assuming "Genre" contains lists (e.g., ['dance pop', 'pop', 'post-teen pop']
). You can explode
and pivot
:
(df.explode('Genre')
.assign(index=lambda d: d.groupby('Genre').cumcount())
.pivot(index='index', columns='Genre', values='Song')
.fillna('')
)
output:
Genre australian pop dance pop neo soul piano rock pop pop rock pop soul post-teen pop r&b urban contemporary
index
0 Dance Monkey 7 Rings All Of Me Run 7 Rings Run All Of Me 7 Rings All Of Me All Of Me
1 Run Run
2 Dance Monkey
3 All Of Me