Home > Mobile >  Sort value in a column based on condition of another column in a dataframe
Sort value in a column based on condition of another column in a dataframe

Time:01-21

I have a dataframe that looks like this

Company     Company Code     Product Code        Rating
Monster     MNTR             MNTR/Headphone1     3.2
Monster     MNTR             MNTR/Headphone2     3.9
Monster     MNTR             MNTR/Headphone3     NaN
Monster     MNTR             MNTR/Earbuds1       3.5
Bose        BOSE             BOSE/Headphone1     4.0
Bose        BOSE             BOSE/Earbuds1       NaN
Bose        BOSE             BOSE/Earbuds2       2.8
Apple       APLE             APLE/Headphone1     4.5
Sony        SONY             SONY/Headphone1     3.5
Sony        SONY             SONY/Headphone2     4.8
Sony        SONY             SONY/Earbuds1       3.0
Beats       BEAT             BEAT/Headphone1     3.5
Beats       BEAT             BEAT/Headphone2     3.7

If the Rating is >= 4.0, I want to group by the Company Code and bring all the products of the same company to the top, then sort by their Rating but keeping the original order of the Product Code and the company together. Like Sony, Apple and Bose.

If no ratings of any company products is above 4.0, I would group by the Company Code and sort the Company Code in alphabetical order. Like Beats and Monster.

Company     Company Code     Product Code        Rating
Sony        SONY             SONY/Headphone1     3.5
Sony        SONY             SONY/Headphone2     4.8
Sony        SONY             SONY/Earbuds1       3.0
Apple       APLE             APLE/Headphone1     4.5
Bose        BOSE             BOSE/Headphone1     4.0
Bose        BOSE             BOSE/Earbuds1       NaN
Bose        BOSE             BOSE/Earbuds2       2.8
Beats       BEAT             BEAT/Headphone1     3.5
Beats       BEAT             BEAT/Headphone2     3.7
Monster     MNTR             MNTR/Headphone1     3.2
Monster     MNTR             MNTR/Headphone2     3.9
Monster     MNTR             MNTR/Headphone3     NaN
Monster     MNTR             MNTR/Earbuds1       3.5

I thought about dividing the dataframe into two parts - upper and lower, then use concat to join them back. For example,

condition = df['Rating'] >= 4.0
df_upper = df.loc[condition]
df_lower = df.loc[~condition]
.
.
.
df_merge = pd.concat([df_upper, df_lower], ignore_index=True)

But I have no idea where to apply groupby and sort. Thank you for helping out.

CodePudding user response:

For sorting is used ordered categoricals by Categorical with filter Company Code of filtered rows and last sorting by DataFrame.sort_values:

condition = df['Rating'] >= 4.0

cats1 = df.loc[condition].sort_values('Rating', ascending=False)['Company Code'].unique()
cats2 = df.loc[~condition, 'Company Code'].sort_values().unique()

cats = pd.Index(cats1).union(pd.Index(cats2), sort=False)
print (cats)
Index(['SONY', 'APLE', 'BOSE', 'BEAT', 'MNTR'], dtype='object')

df['Company Code'] = pd.Categorical(df['Company Code'], ordered=True, categories=cats)

df = df.sort_values('Company Code')
print (df)
    Company Company Code     Product Code  Rating
8      Sony         SONY  SONY/Headphone1     3.5
9      Sony         SONY  SONY/Headphone2     4.8
10     Sony         SONY    SONY/Earbuds1     3.0
7     Apple         APLE  APLE/Headphone1     4.5
4      Bose         BOSE  BOSE/Headphone1     4.0
5      Bose         BOSE    BOSE/Earbuds1     NaN
6      Bose         BOSE    BOSE/Earbuds2     2.8
11    Beats         BEAT  BEAT/Headphone1     3.5
12    Beats         BEAT  BEAT/Headphone2     3.7
0   Monster         MNTR  MNTR/Headphone1     3.2
1   Monster         MNTR  MNTR/Headphone2     3.9
2   Monster         MNTR  MNTR/Headphone3     NaN
3   Monster         MNTR    MNTR/Earbuds1     3.5
    
  • Related