Consider the following DataFrame:
import numpy as np
import pandas as pd
arrays1 = [
["A", "A", "A", "B", "B", "B"],
["foo", "bar", "baz", "foo", "bar", "baz"],
]
tuples1 = list(zip(*arrays1))
index_values1 = pd.MultiIndex.from_tuples(tuples1)
df1 = pd.DataFrame(np.ones((6, 6)), index=index_values1, columns=index_values1)
print(df1)
A B
foo bar baz foo bar baz
A foo 1.0 1.0 1.0 1.0 1.0 1.0
bar 1.0 1.0 1.0 1.0 1.0 1.0
baz 1.0 1.0 1.0 1.0 1.0 1.0
B foo 1.0 1.0 1.0 1.0 1.0 1.0
bar 1.0 1.0 1.0 1.0 1.0 1.0
baz 1.0 1.0 1.0 1.0 1.0 1.0
Say I want to replace the deepest level of either indices, columns, or both; as well as add a level, according to the following mapping:
d_idx = {
'foo': ('qux', 'one'),
'bar': ('quux', 'two'),
'baz': ('corge', 'three'),
}
Such that I get either:
A B
foo bar baz foo bar baz
A qux one 1.0 1.0 1.0 1.0 1.0 1.0
quux two 1.0 1.0 1.0 1.0 1.0 1.0
corge three 1.0 1.0 1.0 1.0 1.0 1.0
B qux one 1.0 1.0 1.0 1.0 1.0 1.0
quux two 1.0 1.0 1.0 1.0 1.0 1.0
corge three 1.0 1.0 1.0 1.0 1.0 1.0
or
A B
qux quux corge qux quux corge
one two three one two three
A foo 1.0 1.0 1.0 1.0 1.0 1.0
bar 1.0 1.0 1.0 1.0 1.0 1.0
baz 1.0 1.0 1.0 1.0 1.0 1.0
B foo 1.0 1.0 1.0 1.0 1.0 1.0
bar 1.0 1.0 1.0 1.0 1.0 1.0
baz 1.0 1.0 1.0 1.0 1.0 1.0
or
A B
qux quux corge qux quux corge
one two three one two three
A qux one 1.0 1.0 1.0 1.0 1.0 1.0
quux two 1.0 1.0 1.0 1.0 1.0 1.0
corge three 1.0 1.0 1.0 1.0 1.0 1.0
B qux one 1.0 1.0 1.0 1.0 1.0 1.0
quux two 1.0 1.0 1.0 1.0 1.0 1.0
corge three 1.0 1.0 1.0 1.0 1.0 1.0
I have tried a number of ways, which seem to work but don't seem elegant to me. In particular, I want to make sure that this mapping is robust (e.g. no reordering due to unstacking indices etc.).
My first approach was by constructing a correspondance matrix:
qux quux corge
one two three
foo 1.0 0.0 0.0
bar 0.0 1.0 0.0
baz 0.0 0.0 1.0
The advantage of this approach is that the mapping from one set of indices to the other is robust, by taking the dot product with this matrix and an unstacked df1
. However by unstacking, the indices are implicitly sorted. This means I need to somehow retain the original order of the index before being able to take the dot product. While probably doable, this doesn't seem elegant to me.
df1_u = df1.unstack(level=0)
A A A A A A B B B B B B
foo foo bar bar baz baz foo foo bar bar baz baz
A B A B A B A B A B A B
bar 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
baz 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
foo 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Next, I tried changing the indices in a for-loop:
l_idx_old = list(df1.index)
l_idx_new = []
for t_idx_old in l_idx_old:
idx0, idx1_old = t_idx_old
idx1_new, idx2_new = d_idx[idx1_old]
print(idx1_old, idx1_new, idx2_new)
t_idx_new = (idx0, idx1_new, idx2_new)
l_idx_new.append(t_idx_new)
df1.index = pd.MultiIndex.from_tuples(l_idx_new)
This works, however the last line is not robust, as there is no check whether the indices are assigned correctly.
In short, is there a robust and elegant way to carry out this intended mutation? Any help is much appreciated.
CodePudding user response:
If mapping one level of MultiIndex
get new MultiIndex
, so possible append to existing by DataFrame.set_index
with remove mapped level by DataFrame.droplevel
:
d_idx = {
'foo': ('qux', 'one'),
'bar': ('quux', 'two'),
'baz': ('corge', 'three'),
}
df1 = df1.set_index(df1.index.get_level_values(1).map(d_idx), append=True).droplevel(1)
print (df1)
A B
foo bar baz foo bar baz
A qux one 1.0 1.0 1.0 1.0 1.0 1.0
quux two 1.0 1.0 1.0 1.0 1.0 1.0
corge three 1.0 1.0 1.0 1.0 1.0 1.0
B qux one 1.0 1.0 1.0 1.0 1.0 1.0
quux two 1.0 1.0 1.0 1.0 1.0 1.0
corge three 1.0 1.0 1.0 1.0 1.0 1.0
For columns is it possible with helper DataFrame
and same principe:
new = df1.columns.get_level_values(1).map(d_idx)
df1.columns = df1.columns.to_frame().set_index(new, append=True).droplevel(1).index
print (df1)
A B
qux quux corge qux quux corge
one two three one two three
A foo 1.0 1.0 1.0 1.0 1.0 1.0
bar 1.0 1.0 1.0 1.0 1.0 1.0
baz 1.0 1.0 1.0 1.0 1.0 1.0
B foo 1.0 1.0 1.0 1.0 1.0 1.0
bar 1.0 1.0 1.0 1.0 1.0 1.0
baz 1.0 1.0 1.0 1.0 1.0 1.0