What's a pythonic way to move a certain column level to the top in a pandas multi column index?
Toy example:
import numpy as np
import pandas as pd
cols = pd.MultiIndex.from_arrays(
[
["a1", "a1", "a1", "a1"],
["b1", "b1", "b2", "b2"],
["x1", "x1", "x1", "x1"],
["c1", "c1", "c1", "c1"],
],
names=(
"unknown_level_name_0",
"unknown_level_name_1",
"known_level_name",
"unknown_level_name_last",
),
)
df = pd.DataFrame(np.random.randint(0, 100, [5, 4]), columns=cols)
print(df)
unknown_level_name_0 a1
unknown_level_name_1 b1 b2
known_level_name x1 x1
unknown_level_name_last c1 c1 c1 c1
0 37 34 97 19
1 54 47 53 46
2 63 94 14 85
3 16 51 27 96
4 32 64 76 26
I am looking for the following result:
known_level_name x1 x1
unknown_level_name_0 a1
unknown_level_name_1 b1 b2
unknown_level_name_last c1 c1 c1 c1
0 37 34 97 19
1 54 47 53 46
2 63 94 14 85
3 16 51 27 96
4 32 64 76 26
EDIT:
There can be a variable number of levels. Most level names are unknown. However, there will always be one familiar level name (here: "known_level_name").
Using reorder_levels
or swaplevel
might become tricky if I don't know the exact position of "known_level_name".
CodePudding user response:
Use reorder_levels
(also available as DataFrame method):
# by index
df.columns = df.columns.reorder_levels(order=[2,0,1,3])
or:
# by name
df.columns = df.columns.reorder_levels(order=['unknown_level_name_1',
'known_level_name',
'unknown_level_name_0',
'unknown_level_name_last'
])
Alternative, multiple swaplevel
:
# move level 2 up # move again up
df.swaplevel(2,1,axis=1).swaplevel(1,0,axis=1)
output:
known_level_name x1
unknown_level_name_0 a1
unknown_level_name_1 b1 b2
unknown_level_name_last c1 c1 c1 c1
0 17 68 61 88
1 6 62 81 7
2 82 16 85 92
3 40 22 48 0
4 35 46 68 60
CodePudding user response:
Here is a generic function to move a column level (by label or index) to the top:
def move_top(df, col, inplace=False):
if col in df.columns.names:
idx = df.columns.names.index(col)
elif isinstance(col, int) and 0 < col < len(df.columns):
idx = col
else:
raise IndexError(f'invalid index "{col}"')
order = list(range(len(df.columns)))
order.pop(idx)
order = [idx] order
if inplace:
df.columns = df.columns.reorder_levels(order=order)
else:
return df.reorder_levels(order, axis=1)
move_top(df, 'known_level_name')
output:
known_level_name x1
unknown_level_name_0 a1
unknown_level_name_1 b1 b2
unknown_level_name_last c1 c1 c1 c1
0 33 30 23 77
1 10 73 80 33
2 7 54 52 9
3 71 99 22 22
4 83 15 86 40
CodePudding user response:
You can just try reorder_levels
out = df.reorder_levels([2,0,1,3], axis=1)
Out[184]:
known_level_name x1
unknown_level_name_0 a1
unknown_level_name_1 b1 b2
unknown_level_name_last c1 c1 c1 c1
0 98 32 72 94
1 22 71 15 2
2 25 41 42 38
3 87 74 41 82
4 87 31 18 8