How can I sort multiple levels of a pandas MultiIndex with custom key functions?-CodePudding

Say I have a dataframe with a MultiIndex like this:

import pandas as pd
import numpy as np

my_index = pd.MultiIndex.from_product(
    [(3,1,2), ("small", "tall", "medium"), ("B", "A", "C")],
    names=["number", "size", "letter"]
)

df_0 = pd.DataFrame(np.random.rand(27, 2), columns=["x", "y"], index=my_index)

                             x         y
number size   letter
3      small  B       0.950073  0.599918
              A       0.014450  0.472736
              C       0.208064  0.778538
       tall   B       0.979631  0.367234
              A       0.832459  0.449875
              C       0.761929  0.053144
       medium B       0.460764  0.800131
              A       0.355746  0.573813
              C       0.078924  0.058865
1      small  B       0.405209  0.354636
              A       0.536242  0.012904
              C       0.458910  0.723627
       tall   B       0.859898  0.442954
              A       0.109729  0.885598
              C       0.378363  0.220695
       medium B       0.652191  0.685181
              A       0.503525  0.400973
              C       0.454671  0.188798
2      small  B       0.407654  0.168782
              A       0.393451  0.083023
              C       0.073432  0.165209
       tall   B       0.678226  0.108497
              A       0.718348  0.077935
              C       0.595500  0.146271
       medium B       0.719985  0.422167
              A       0.950950  0.532390
              C       0.687721  0.920229

Now I want to sort the index by the different levels, first number, then size, and finally letter.

If I do this...

df_1 = df_0.sort_index(level=["number", "size", "letter"], inplace=False)

... the size of course gets sorted in alphabetical order.

                             x         y
number size   letter
1      medium A       0.503525  0.400973
              B       0.652191  0.685181
              C       0.454671  0.188798
       small  A       0.536242  0.012904
              B       0.405209  0.354636
              C       0.458910  0.723627
       tall   A       0.109729  0.885598
              B       0.859898  0.442954
              C       0.378363  0.220695
2      medium A       0.950950  0.532390
              B       0.719985  0.422167
              C       0.687721  0.920229
       small  A       0.393451  0.083023
              B       0.407654  0.168782
              C       0.073432  0.165209
       tall   A       0.718348  0.077935
              B       0.678226  0.108497
              C       0.595500  0.146271
3      medium A       0.355746  0.573813
              B       0.460764  0.800131
              C       0.078924  0.058865
       small  A       0.014450  0.472736
              B       0.950073  0.599918
              C       0.208064  0.778538
       tall   A       0.832459  0.449875
              B       0.979631  0.367234
              C       0.761929  0.053144

But I want it to be sorted by a custom key. I know I can sort the size level with a custom sort function like this:

custom_key = np.vectorize(lambda x: {"small": 0, "medium": 1, "tall": 2}[x])

df_2 = df_0.sort_index(level=1, key=custom_key, inplace=False)

                             x         y
number size   letter
1      small  A       0.536242  0.012904
              B       0.405209  0.354636
              C       0.458910  0.723627
2      small  A       0.393451  0.083023
              B       0.407654  0.168782
              C       0.073432  0.165209
3      small  A       0.014450  0.472736
              B       0.950073  0.599918
              C       0.208064  0.778538
1      medium A       0.503525  0.400973
              B       0.652191  0.685181
              C       0.454671  0.188798
2      medium A       0.950950  0.532390
              B       0.719985  0.422167
              C       0.687721  0.920229
3      medium A       0.355746  0.573813
              B       0.460764  0.800131
              C       0.078924  0.058865
1      tall   A       0.109729  0.885598
              B       0.859898  0.442954
              C       0.378363  0.220695
2      tall   A       0.718348  0.077935
              B       0.678226  0.108497
              C       0.595500  0.146271
3      tall   A       0.832459  0.449875
              B       0.979631  0.367234
              C       0.761929  0.053144

But how can I sort by all levels like for df_1 and use the custom key on the second level? Expected output:

                             x         y
number size   letter
1      small  A       0.536242  0.012904
              B       0.405209  0.354636
              C       0.458910  0.723627
       medium A       0.503525  0.400973
              B       0.652191  0.685181
              C       0.454671  0.188798
       tall   A       0.109729  0.885598
              B       0.859898  0.442954
              C       0.378363  0.220695
2      small  A       0.393451  0.083023
              B       0.407654  0.168782
              C       0.073432  0.165209
       medium A       0.950950  0.532390
              B       0.719985  0.422167
              C       0.687721  0.920229
       tall   A       0.718348  0.077935
              B       0.678226  0.108497
              C       0.595500  0.146271
3      small  A       0.014450  0.472736
              B       0.950073  0.599918
              C       0.208064  0.778538
       medium A       0.355746  0.573813
              B       0.460764  0.800131
              C       0.078924  0.058865
       tall   A       0.832459  0.449875
              B       0.979631  0.367234
              C       0.761929  0.053144

And how should I define the custom key function, so that I also can access the level in sort_index by name like this?

df_3 = df_0.sort_index(level="size", key=custom_key, inplace=False)

Here, it gives a KeyError: 'Level size not found'

CodePudding user response：

The ideal would be to use ordered Categorical data.

Else, you can use a custom mapper based on the level name:

# define here custom sorters
# all other levels will be sorted by default order
order = {'size': ['small', 'medium', 'tall']}

def sorter(s):
    if s.name in order:
        return s.map({k:v for v,k in enumerate(order[s.name])})
    return s

out = df_0.sort_index(level=["number", "size", "letter"], key=sorter)

Output:

                             x         y
number size   letter                    
1      small  A       0.530753  0.687982
              B       0.722848  0.974920
              C       0.174058  0.695016
       medium A       0.397016  0.550404
              B       0.426989  0.843007
              C       0.929218  0.497728
       tall   A       0.159078  0.005675
              B       0.917871  0.384265
              C       0.685435  0.585242
2      small  A       0.423254  0.838356
              B       0.342158  0.209632
...

CodePudding user response：

A solution with pandas.CategoricalIndex :

categories_order = ['small', 'medium', 'tall']
categories = pd.CategoricalIndex(df_0.index.levels[1].values,
                                 categories=categories_order,
                                 ordered=True)

df_0.index = df_0.index.set_levels(categories, level=1)

out = df_0.sort_index()

# Output :

print(out)

                             x         y
number size   letter                    
1      small  A       0.994537  0.984409
              B       0.153568  0.529762
              C       0.025857  0.750508
       medium A       0.977558  0.488506
              B       0.727121  0.189481
              C       0.465571  0.606248
       tall   A       0.638590  0.530700
              B       0.181290  0.963911
              C       0.068123  0.911834
2      small  A       0.183484  0.459770
              B       0.587073  0.935522
              C       0.068233  0.135137
       medium A       0.315199  0.047945
              B       0.568500  0.756994
              C       0.581868  0.725463
       tall   A       0.881304  0.739188
              B       0.566962  0.855521
              C       0.677635  0.851877
3      small  A       0.497159  0.804575
              B       0.290587  0.104613
              C       0.343545  0.468724
       medium A       0.868754  0.526125
              B       0.759151  0.439089
              C       0.515183  0.266913
       tall   A       0.486962  0.178156
              B       0.902421  0.133138
              C       0.613683  0.827321