How to include a sort criteria into a pivot

Below is my code where I used pivot_table function on df dataframe.

df = pd.DataFrame({'State' : ['B','B','A','A','C', 'C'],
           'Age' : ['1 to 5', '6 to 10', '1 to 5', '6 to 10', '1 to 5', '6 to 10'],
           'Vaccinated' : [80, 20, 30, 60, 10, 15],
           'Population': [100, 100, 100, 100, 100, 100],
           'Percentage' : [0.80, 0.20, 0.30, 0.60, 0.10,0.15]})

df1 = pd.pivot_table(df,values=["Vaccinated", "Population","Percentage"],index=["State", "Age"], aggfunc=np.sum)

The results from the earlier code:

                   Percentage  Population  Vaccinated
State Age                                        
A     1 to 5         0.30         100          30
      6 to 10        0.60         100          60
B     1 to 5         0.80         100          80
      6 to 10        0.20         100          20
C     1 to 5         0.10         100          10
      6 to 10        0.15         100          15

However, I want to sort my records so that State B is at the top followed by A then C. Rational is because State B has 100% vaccinated (60% 40%), State A has 90% (60% 30%) & State C has 25%. Try to add in sort a few times and i encountered errors.

May I seek advice how to add in a sort criteria during or after pivot_table so that i can achieve the results below?

               Percentage  Population  Vaccinated
State Age                                        
B     1 to 5         0.80         100          80
      6 to 10        0.20         100          20
A     1 to 5         0.30         100          30
      6 to 10        0.60         100          60
C     1 to 5         0.10         100          10
      6 to 10        0.15         100          15

CodePudding user response：

We can use groupby sum to get the total Vaccinated per State, then sort_values to determine the order that the states should be in, then we can reindex on the State level to reorder based on the group total:

df1 = df1.reindex(
    index=df1.groupby(level='State')['Vaccinated'].sum()
        .sort_values(ascending=False).index,
    level='State'
)

df:

               Percentage  Population  Vaccinated
State Age                                        
B     1 to 5         0.80         100          80
      6 to 10        0.20         100          20
A     1 to 5         0.30         100          30
      6 to 10        0.60         100          60
C     1 to 5         0.10         100          10
      6 to 10        0.15         100          15

CodePudding user response：

One way is to make helper column with group sum, sort the df by it and then delete it:

df1 = df1.assign(Sum=df1.groupby(level=0).Vaccinated.transform('sum')).\
    sort_values(by='Sum', ascending=False).drop(columns=['Sum'])
print(df1)

Prints:

               Percentage  Population  Vaccinated
State Age                                        
B     1 to 5         0.80         100          80
      6 to 10        0.20         100          20
A     1 to 5         0.30         100          30
      6 to 10        0.60         100          60
C     1 to 5         0.10         100          10
      6 to 10        0.15         100          15