I am using a Billboard charts dataset, loaded into a pandas DataFrame df, that has (among others) the columns "performer" and "song".
I want to write a function that receives an arbitrary number of artists as parameters. From these artists, I want to determine the one whose songs have been in the charts the longest. I already managed to write the function I wanted, but there is one thing I can't figure out:
How can I get the name of the song that was in the charts the longest? I just can't figure out how to access the group name after using the .size() function.
def determine_most_popular_performer(*performers):
    results = []
    for performer in performers:
        results.append((performer, max(df.loc[df["performer"]==performer].groupby("song").size())))
    return max(results)
print(determine_most_popular_performer("Queen", "Prince", "Michael Jackson"))
>> ('Queen', 44)
As output I would like to get ('Queen', 'Bohemian Rhapsody', 44)
CodePudding user response:
You can get the index label of the row with the largest value using .idxmax().
You can then select that row and read its values with the following changes. Note that I used .reset_index() to turn the groupby index into a regular column.
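def determine_most_popular_performer(*performers):
    results = []
    for performer in performers:
        # Count chart entries per song; reset_index turns "song" into a column.
        df2 = df.loc[df["performer"] == performer].groupby("song").size().reset_index(name="value")
        max_id = df2["value"].idxmax()
        results.append((performer, df2.loc[max_id, "song"], int(df2.loc[max_id, "value"])))
    # Compare by count; plain max(results) would compare performer names first.
    return max(results, key=lambda r: r[2])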
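As a rough, self-contained sketch of the same idea: .idxmax() can also be called directly on the Series returned by .size(), in which case it returns the group name (the song) without needing .reset_index(). The small DataFrame and the function name longest_charting_song below are made up for illustration and only assume one row per song per week on the chart; the real dataset may look different.

```python
import pandas as pd

# Hypothetical stand-in for the Billboard data: one row per song per charted week.
df = pd.DataFrame({
    "performer": ["Queen", "Queen", "Queen", "Prince", "Prince", "Prince", "Prince"],
    "song": ["Bohemian Rhapsody", "Bohemian Rhapsody", "Under Pressure",
             "Kiss", "Kiss", "Kiss", "Purple Rain"],
})

def longest_charting_song(df, *performers):
    """Return (performer, song, row count) for the song with the most rows."""
    results = []
    for performer in performers:
        # Series indexed by song name, values are the number of rows per song.
        weeks = df.loc[df["performer"] == performer].groupby("song").size()
        if weeks.empty:
            continue  # performer not present in the data
        song = weeks.idxmax()  # index label (song name) of the largest count
        results.append((performer, song, int(weeks.loc[song])))
    # Compare by count, not by performer name.
    return max(results, key=lambda r: r[2])

print(longest_charting_song(df, "Queen", "Prince"))
```

With this toy data the result is the Prince entry, whereas a plain max(results) would pick the Queen entry because tuples are compared element by element and 'Queen' sorts after 'Prince'. If none of the given performers appear in the data, max() raises a ValueError on the empty list, so a caller may want to handle that case.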