I have a dataframe that looks like this
Company Company Code Product Code Rating
Monster MNTR MNTR/Headphone1 3.2
Monster MNTR MNTR/Headphone2 3.9
Monster MNTR MNTR/Headphone3 NaN
Monster MNTR MNTR/Earbuds1 3.5
Bose BOSE BOSE/Headphone1 4.0
Bose BOSE BOSE/Earbuds1 NaN
Bose BOSE BOSE/Earbuds2 2.8
Apple APLE APLE/Headphone1 4.5
Sony SONY SONY/Headphone1 3.5
Sony SONY SONY/Headphone2 4.8
Sony SONY SONY/Earbuds1 3.0
Beats BEAT BEAT/Headphone1 3.5
Beats BEAT BEAT/Headphone2 3.7
If the Rating
is >= 4.0, I want to group by the Company Code
and bring all the products of the same company to the top, then sort by their Rating
but keeping the original order of the Product Code
and the company together. Like Sony, Apple and Bose.
If no ratings of any company products is above 4.0, I would group by the Company Code
and sort the Company Code
in alphabetical order. Like Beats and Monster.
Company Company Code Product Code Rating
Sony SONY SONY/Headphone1 3.5
Sony SONY SONY/Headphone2 4.8
Sony SONY SONY/Earbuds1 3.0
Apple APLE APLE/Headphone1 4.5
Bose BOSE BOSE/Headphone1 4.0
Bose BOSE BOSE/Earbuds1 NaN
Bose BOSE BOSE/Earbuds2 2.8
Beats BEAT BEAT/Headphone1 3.5
Beats BEAT BEAT/Headphone2 3.7
Monster MNTR MNTR/Headphone1 3.2
Monster MNTR MNTR/Headphone2 3.9
Monster MNTR MNTR/Headphone3 NaN
Monster MNTR MNTR/Earbuds1 3.5
I thought about dividing the dataframe into two parts - upper and lower, then use concat to join them back. For example,
condition = df['Rating'] >= 4.0
df_upper = df.loc[condition]
df_lower = df.loc[~condition]
.
.
.
df_merge = pd.concat([df_upper, df_lower], ignore_index=True)
But I have no idea where to apply groupby and sort. Thank you for helping out.
CodePudding user response:
For sorting is used ordered categoricals by Categorical
with filter Company Code
of filtered rows and last sorting by DataFrame.sort_values
:
condition = df['Rating'] >= 4.0
cats1 = df.loc[condition].sort_values('Rating', ascending=False)['Company Code'].unique()
cats2 = df.loc[~condition, 'Company Code'].sort_values().unique()
cats = pd.Index(cats1).union(pd.Index(cats2), sort=False)
print (cats)
Index(['SONY', 'APLE', 'BOSE', 'BEAT', 'MNTR'], dtype='object')
df['Company Code'] = pd.Categorical(df['Company Code'], ordered=True, categories=cats)
df = df.sort_values('Company Code')
print (df)
Company Company Code Product Code Rating
8 Sony SONY SONY/Headphone1 3.5
9 Sony SONY SONY/Headphone2 4.8
10 Sony SONY SONY/Earbuds1 3.0
7 Apple APLE APLE/Headphone1 4.5
4 Bose BOSE BOSE/Headphone1 4.0
5 Bose BOSE BOSE/Earbuds1 NaN
6 Bose BOSE BOSE/Earbuds2 2.8
11 Beats BEAT BEAT/Headphone1 3.5
12 Beats BEAT BEAT/Headphone2 3.7
0 Monster MNTR MNTR/Headphone1 3.2
1 Monster MNTR MNTR/Headphone2 3.9
2 Monster MNTR MNTR/Headphone3 NaN
3 Monster MNTR MNTR/Earbuds1 3.5