Home > OS >  how to groupby multiple columns
how to groupby multiple columns

Time:04-24

So basically I have this dataframe called df_genre

userid|genre|total_streams
123   |Pop  |4252
124   |Pop  |220
125   |k-Pop|111
126   |jaz  |1092
127   |Hip Hop |1102
60k more lines

and I have another dataframe called: df_info

userid|age
123   |21  
124   |19  
125   |30
126   |32
127   |25
60k more lines

I basically have 5234 genre type and each line contains the userid with the total number of streams for each genre he played. what I want is to get the top 20 genre with the most streams while showing the total count of the people who streamed them and then getting the average age of these people. what I did is using group by like this

  df_new = (df_genre.groupby('genre')
       .agg({'userid':'count', 'total_streams': 'sum'})
       .reset_index()
       .rename(columns={'userid':'User count'})
 )

the result was: df_new

genre   |User count   |total streams
Pop     |40292        |10274929
jazz    |29104        |920374
....

so I got what I wanted but now I can't get the average age because I just counted the userid with the highest total stream of every genre. what is the best way and the less consuming now after this step to get the age of each user for each genre and count the average age?

what I tried is to loop over df_new getting the top 20 genre( so I looped over Pop first) then I went to df_genre and I looped through people who streamed Pop then matched their userid with the userid in df_info and got the age. But this took waaaay too long and didnt give me proper answer as I have 20 different genres to loop from and more than 60k lines of data.

How can I fix this what to do?

CodePudding user response:

You can merge your dataframes first:

df_new = (
    df_genre
    .merge(df_info, on='userid')
    .groupby('genre')
    .agg({'userid': 'count', 'total_streams': sum, 'age': 'mean'})
)

>>> df_new
         userid  total_streams   age
genre                               
Hip Hop       1           1102  25.0
Pop           2           4472  20.0
jaz           1           1092  32.0
k-Pop         1            111  30.0
  • Related