Imagine I have a dataset that is like so:
ID prod_title total_sales
0 619040 All Veggie Yummies 72.99
1 619041 Ball and String 18.95
2 619042 Cat Cave 28.45
3 619043 Chewie Dental 24.95
4 619044 Chomp-a Plush 60.99
5 619045 Feline Fix Mix 65.99
6 619046 Fetch Blaster 9.95
7 619047 Foozy Mouse 45.99
8 619048 Kitty Climber 35.99
9 619049 Purr Mix 32.99
10 619050 Fetch Blaster 19.90
11 619051 Purr Mix 98.97
12 619052 Cat Cave 56.90
13 619053 Purrfect Puree 54.95
14 619054 Foozy Mouse 91.98
15 619055 Reddy Beddy 21.95
16 619056 Cat Cave 85.83
17 619057 Scratchy Post 48.95
18 619058 Snack-em Fish 15.99
19 619059 Snoozer Essentails 99.95
20 619060 Scratchy Post 48.95
21 619061 Purrfect Puree 219.80
22 619062 Chewie Dental 49.90
23 619063 Reddy Beddy 65.85
24 619064 The New Bone 71.96
25 619065 Reddy Beddy 109.75
What are the top ten product titles by total dollar amount made? Display in descending order. Store in variable top_tot_sales
The answer should be something like this; though this isn't the correct answer just an example: ['Purrfect Puree' 'Ball and String ' 'Fetch Blaster ' 'Reddy Beddy' 'Chomp-a Plush' 'Foozy Mouse ' 'Kitty Climber' 'Snack-em Fish' 'Snoozer Essentails' 'Cat Cave']
I have tried groupby's, nlargest, apply, unique, combos of .loc groupby idxmax and many more. I'm just struggling trying to figure out how to isolate these columns in my question and get a list of the top ten prod_title
.
I'll add the code i've tried below
top_tot_sales = df_cleaned.loc[df_cleaned.groupby('prod_title')['total_sales'].idxmax()]
df_cleaned.nlargest(10, 'total_sales')
df_cleaned['prod_title'].drop_duplicates()
df_cleaned['prod_title'].unique()
top_tot_sales = df_cleaned.groupby(['prod_title'])['total_sales'].transform(max) == df_cleaned['total_sales']
print(top_tot_sales)
df_cleaned['prod_title'].drop_duplicates()
df_cleaned['prod_title'].unique()
top_tot_sales = df_cleaned.groupby(['prod_title'])df_cleaned.nlargest(n=10, columns=['total_sales'])
print(top_tot_sales)
top_tot_sales = df_cleaned.groupby('prod_title')['total_sales'].nlargest(n=10)
print(top_tot_sales)
top_num_sales = df_cleaned.loc[df_cleaned.groupby('prod_title')['trans_quantity'].idxmax()]
CodePudding user response:
df.sort_values('total_sales', ascending=False).drop_duplicates(subset=['prod_title']).iloc[:10]['prod_title'].values
sort_values() sorts your dataframe, make sure to set ascending to false.
drop_duplicates will get rid of duplicate products
iloc will select your first 10 items, since you already sorted, they will be the top 10.
['prod_title'].values will return an array of prod_titles from the resulting dataframe.
CodePudding user response:
Is this what you are looking for?
df.groupby('prod_title').sum().sort_values('total_sales', ascending=False).index[:10].values
Result
['Purrfect Puree' 'Reddy Beddy' 'Cat Cave' 'Foozy Mouse' 'Purr Mix'
'Snoozer Essentails' 'Scratchy Post' 'Chewie Dental' 'All Veggie Yummies'
'The New Bone']