I have a dataframe of some information about some artists, their albums, and their tracks.
df = pd.DataFrame({'Artist': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'D', 'E'], 'AlbumId': [201, 201, 451, 390, 390, 272, 272, 698, 698, 235, 312], 'TrackId': [1022, 3472, 9866, 6078, 2634, 3411, 8673, 2543, 5837, 9874, 1089]})
Artist A has 2 albums(201 and 451), with one having 2 tracks(1022 and 3472) and 1 having 1 track(9866).
Artist B has 1 album(390) with 2 tracks(6078 and 2634).
Artist C has 2 albums(272 and 698), with each album having 2 tracks.
Artist D has 1 album(235) with 1 track(9874).
Artist E has 1 album(312) with 1 track(1089).
I want to find the artists who have more than 1 album, and get the rows of these artists accordingly. My desired output looks like this:
I have tried:
groupedArtists = data.groupby(['ArtistId', 'AlbumId']).filter(lambda group: (group.AlbumId.nunique() > 1))
But it seems not to work as expected.
Could someone please help me out? I appreciate it!
CodePudding user response:
You want to group by only ArtistId
, and not AlbumId
:
groupedArtists = data.groupby(['Artist']).filter(lambda x: x['AlbumId'].nunique() > 1)
Output:
>>> groupedArtists
Artist AlbumId TrackId
0 A 201 1022
1 A 201 3472
2 A 451 9866
5 C 375 1022
6 C 412 9866
7 C 375 3472
...
CodePudding user response:
Grouping should be solely by Artist.
Then, for each group, check how many (different) albums it contains and take only groups having more than 1 album.
So the proper solution is:
data.groupby('Artist').filter(lambda grp: grp.AlbumId.nunique() > 1)
CodePudding user response:
This is the solution I found, which is a little more verbose, but perhaps more easily understandable:
counted = df.groupby(['Artist']).size().reset_index(name='counts')
df[df['Artist'].isin(counted[counted.counts > 1].Artist)]
CodePudding user response:
you can create a aggregated DataFrame based on counts of Albums and then filter on the number of albums you want
stats = df.groupby(['Artist'])['AlbumId'].count().reset_index()
morethan1 = stats.loc[stats['AlbumId'] >1]