List of top ten values in column no duplicates based on another column-CodePudding

Imagine I have a dataset that is like so:

         ID           prod_title            total_sales
0        619040       All Veggie Yummies    72.99
1        619041       Ball and String       18.95
2        619042       Cat Cave              28.45     
3        619043       Chewie Dental         24.95
4        619044       Chomp-a Plush         60.99
5        619045       Feline Fix Mix        65.99     
6        619046       Fetch Blaster         9.95
7        619047       Foozy Mouse           45.99
8        619048       Kitty Climber         35.99     
9        619049       Purr Mix              32.99
10       619050       Fetch Blaster         19.90
11       619051       Purr Mix              98.97    
12       619052       Cat Cave              56.90
13       619053       Purrfect Puree        54.95
14       619054       Foozy Mouse           91.98
15       619055       Reddy Beddy           21.95     
16       619056       Cat Cave              85.83
17       619057       Scratchy Post         48.95
18       619058       Snack-em Fish         15.99     
19       619059       Snoozer Essentails    99.95
20       619060       Scratchy Post         48.95
21       619061       Purrfect Puree        219.80     
22       619062       Chewie Dental         49.90
23       619063       Reddy Beddy           65.85
24       619064       The New Bone          71.96     
25       619065       Reddy Beddy           109.75

What are the top ten product titles by total dollar amount made? Display in descending order. Store in variable top_tot_sales

The answer should be something like this; though this isn't the correct answer just an example: ['Purrfect Puree' 'Ball and String ' 'Fetch Blaster ' 'Reddy Beddy' 'Chomp-a Plush' 'Foozy Mouse ' 'Kitty Climber' 'Snack-em Fish' 'Snoozer Essentails' 'Cat Cave']

I have tried groupby's, nlargest, apply, unique, combos of .loc groupby idxmax and many more. I'm just struggling trying to figure out how to isolate these columns in my question and get a list of the top ten prod_title.

I'll add the code i've tried below

top_tot_sales = df_cleaned.loc[df_cleaned.groupby('prod_title')['total_sales'].idxmax()]
df_cleaned.nlargest(10, 'total_sales')

df_cleaned['prod_title'].drop_duplicates()
df_cleaned['prod_title'].unique()
top_tot_sales = df_cleaned.groupby(['prod_title'])['total_sales'].transform(max) == df_cleaned['total_sales']
print(top_tot_sales)

df_cleaned['prod_title'].drop_duplicates()
df_cleaned['prod_title'].unique()
top_tot_sales = df_cleaned.groupby(['prod_title'])df_cleaned.nlargest(n=10, columns=['total_sales'])
print(top_tot_sales)

top_tot_sales = df_cleaned.groupby('prod_title')['total_sales'].nlargest(n=10)
print(top_tot_sales)

top_num_sales = df_cleaned.loc[df_cleaned.groupby('prod_title')['trans_quantity'].idxmax()]

CodePudding user response：

df.sort_values('total_sales', ascending=False).drop_duplicates(subset=['prod_title']).iloc[:10]['prod_title'].values

sort_values() sorts your dataframe, make sure to set ascending to false.

drop_duplicates will get rid of duplicate products

iloc will select your first 10 items, since you already sorted, they will be the top 10.

['prod_title'].values will return an array of prod_titles from the resulting dataframe.

CodePudding user response：

Is this what you are looking for?

df.groupby('prod_title').sum().sort_values('total_sales', ascending=False).index[:10].values

Result

['Purrfect Puree' 'Reddy Beddy' 'Cat Cave' 'Foozy Mouse' 'Purr Mix'
 'Snoozer Essentails' 'Scratchy Post' 'Chewie Dental' 'All Veggie Yummies'
 'The New Bone']