Say I have a dataframe with a MultiIndex like this:
import pandas as pd
import numpy as np
my_index = pd.MultiIndex.from_product(
[(3,1,2), ("small", "tall", "medium"), ("B", "A", "C")],
names=["number", "size", "letter"]
)
df_0 = pd.DataFrame(np.random.rand(27, 2), columns=["x", "y"], index=my_index)
x y
number size letter
3 small B 0.950073 0.599918
A 0.014450 0.472736
C 0.208064 0.778538
tall B 0.979631 0.367234
A 0.832459 0.449875
C 0.761929 0.053144
medium B 0.460764 0.800131
A 0.355746 0.573813
C 0.078924 0.058865
1 small B 0.405209 0.354636
A 0.536242 0.012904
C 0.458910 0.723627
tall B 0.859898 0.442954
A 0.109729 0.885598
C 0.378363 0.220695
medium B 0.652191 0.685181
A 0.503525 0.400973
C 0.454671 0.188798
2 small B 0.407654 0.168782
A 0.393451 0.083023
C 0.073432 0.165209
tall B 0.678226 0.108497
A 0.718348 0.077935
C 0.595500 0.146271
medium B 0.719985 0.422167
A 0.950950 0.532390
C 0.687721 0.920229
Now I want to sort the index by the different levels, first number, then size, and finally letter.
If I do this...
df_1 = df_0.sort_index(level=["number", "size", "letter"], inplace=False)
... the size of course gets sorted in alphabetical order.
x y
number size letter
1 medium A 0.503525 0.400973
B 0.652191 0.685181
C 0.454671 0.188798
small A 0.536242 0.012904
B 0.405209 0.354636
C 0.458910 0.723627
tall A 0.109729 0.885598
B 0.859898 0.442954
C 0.378363 0.220695
2 medium A 0.950950 0.532390
B 0.719985 0.422167
C 0.687721 0.920229
small A 0.393451 0.083023
B 0.407654 0.168782
C 0.073432 0.165209
tall A 0.718348 0.077935
B 0.678226 0.108497
C 0.595500 0.146271
3 medium A 0.355746 0.573813
B 0.460764 0.800131
C 0.078924 0.058865
small A 0.014450 0.472736
B 0.950073 0.599918
C 0.208064 0.778538
tall A 0.832459 0.449875
B 0.979631 0.367234
C 0.761929 0.053144
But I want it to be sorted by a custom key. I know I can sort the size level with a custom sort function like this:
custom_key = np.vectorize(lambda x: {"small": 0, "medium": 1, "tall": 2}[x])
df_2 = df_0.sort_index(level=1, key=custom_key, inplace=False)
x y
number size letter
1 small A 0.536242 0.012904
B 0.405209 0.354636
C 0.458910 0.723627
2 small A 0.393451 0.083023
B 0.407654 0.168782
C 0.073432 0.165209
3 small A 0.014450 0.472736
B 0.950073 0.599918
C 0.208064 0.778538
1 medium A 0.503525 0.400973
B 0.652191 0.685181
C 0.454671 0.188798
2 medium A 0.950950 0.532390
B 0.719985 0.422167
C 0.687721 0.920229
3 medium A 0.355746 0.573813
B 0.460764 0.800131
C 0.078924 0.058865
1 tall A 0.109729 0.885598
B 0.859898 0.442954
C 0.378363 0.220695
2 tall A 0.718348 0.077935
B 0.678226 0.108497
C 0.595500 0.146271
3 tall A 0.832459 0.449875
B 0.979631 0.367234
C 0.761929 0.053144
But how can I sort by all levels like for df_1 and use the custom key on the second level? Expected output:
x y
number size letter
1 small A 0.536242 0.012904
B 0.405209 0.354636
C 0.458910 0.723627
medium A 0.503525 0.400973
B 0.652191 0.685181
C 0.454671 0.188798
tall A 0.109729 0.885598
B 0.859898 0.442954
C 0.378363 0.220695
2 small A 0.393451 0.083023
B 0.407654 0.168782
C 0.073432 0.165209
medium A 0.950950 0.532390
B 0.719985 0.422167
C 0.687721 0.920229
tall A 0.718348 0.077935
B 0.678226 0.108497
C 0.595500 0.146271
3 small A 0.014450 0.472736
B 0.950073 0.599918
C 0.208064 0.778538
medium A 0.355746 0.573813
B 0.460764 0.800131
C 0.078924 0.058865
tall A 0.832459 0.449875
B 0.979631 0.367234
C 0.761929 0.053144
And how should I define the custom key function, so that I also can access the level in sort_index by name like this?
df_3 = df_0.sort_index(level="size", key=custom_key, inplace=False)
Here, it gives a KeyError: 'Level size not found'
CodePudding user response:
The ideal would be to use ordered Categorical
data.
Else, you can use a custom mapper based on the level name:
# define here custom sorters
# all other levels will be sorted by default order
order = {'size': ['small', 'medium', 'tall']}
def sorter(s):
if s.name in order:
return s.map({k:v for v,k in enumerate(order[s.name])})
return s
out = df_0.sort_index(level=["number", "size", "letter"], key=sorter)
Output:
x y
number size letter
1 small A 0.530753 0.687982
B 0.722848 0.974920
C 0.174058 0.695016
medium A 0.397016 0.550404
B 0.426989 0.843007
C 0.929218 0.497728
tall A 0.159078 0.005675
B 0.917871 0.384265
C 0.685435 0.585242
2 small A 0.423254 0.838356
B 0.342158 0.209632
...
CodePudding user response:
A solution with pandas.CategoricalIndex
:
categories_order = ['small', 'medium', 'tall']
categories = pd.CategoricalIndex(df_0.index.levels[1].values,
categories=categories_order,
ordered=True)
df_0.index = df_0.index.set_levels(categories, level=1)
out = df_0.sort_index()
# Output :
print(out)
x y
number size letter
1 small A 0.994537 0.984409
B 0.153568 0.529762
C 0.025857 0.750508
medium A 0.977558 0.488506
B 0.727121 0.189481
C 0.465571 0.606248
tall A 0.638590 0.530700
B 0.181290 0.963911
C 0.068123 0.911834
2 small A 0.183484 0.459770
B 0.587073 0.935522
C 0.068233 0.135137
medium A 0.315199 0.047945
B 0.568500 0.756994
C 0.581868 0.725463
tall A 0.881304 0.739188
B 0.566962 0.855521
C 0.677635 0.851877
3 small A 0.497159 0.804575
B 0.290587 0.104613
C 0.343545 0.468724
medium A 0.868754 0.526125
B 0.759151 0.439089
C 0.515183 0.266913
tall A 0.486962 0.178156
B 0.902421 0.133138
C 0.613683 0.827321