Context
Say I have a multi-indexed dataframe as follows:
import numpy as np
import pandas as pd
arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
data = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10],
    [11, 12],
    [13, 14],
    [15, 16],
])
df = pd.DataFrame(data, index=index, columns=('a', 'b'))
which looks something like this:
               a   b
first second
bar   one      1   2
      two      3   4
baz   one      5   6
      two      7   8
foo   one      9  10
      two     11  12
qux   one     13  14
      two     15  16
I would like to copy the values of column a for the first index level bar into the same column for the first index level qux, aligned on the second level of the index (here called second). In other words, I would like to obtain the following dataframe from the one above:
               a   b
first second
bar   one      1   2
      two      3   4
baz   one      5   6
      two      7   8
foo   one      9  10
      two     11  12
qux   one      1  14  # <-- column a changed to match first = bar for second = one
      two      3  16  # <-- column a changed to match first = bar for second = two
I understand, based on the answer given to this question, that I can accomplish this by using pd.IndexSlice in conjunction with .loc and .values as follows:
df.loc[pd.IndexSlice['qux', :], 'a'] = df.loc[pd.IndexSlice['bar', :], 'a'].values
I don't intuitively like this (perhaps unjustifiably), as it's not immediately clear to me whether the values will always be aligned on the second index level.
Question:
Can I guarantee that the above assignment (accessing using .values) will always be aligned on the second level of the multi-index?
If not, is there a way of accomplishing what I'm trying to achieve?
Answer:
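No, it will not be aligned. Using .values (for which the pandas documentation recommends .to_numpy() instead) returns the underlying NumPy array, which drops all index and column information, so alignment is not possible. The assignment is purely positional: it gives the intended result only when the bar rows and the qux rows list their second-level labels in the same order.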
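To make the positional behaviour concrete, here is a minimal sketch. The small frame demo, with its bar rows deliberately stored in the order two, one, is made up for illustration, and boolean masks are used in place of pd.IndexSlice only so the sketch does not depend on the index being sorted.

```python
import pandas as pd

# Hypothetical frame: the bar rows are stored as (two, one),
# the qux rows as (one, two).
index = pd.MultiIndex.from_tuples(
    [("bar", "two"), ("bar", "one"), ("qux", "one"), ("qux", "two")],
    names=["first", "second"],
)
demo = pd.DataFrame({"a": [1, 3, 13, 15], "b": [2, 4, 14, 16]}, index=index)

# Boolean row masks for the two first-level groups.
first = demo.index.get_level_values("first")
is_bar = first == "bar"
is_qux = first == "qux"

# to_numpy() strips the labels, so values are copied by position:
# the first qux row receives the first bar value, and so on.
demo.loc[is_qux, "a"] = demo.loc[is_bar, "a"].to_numpy()

print(demo)
```

After this assignment the row (qux, one) holds the value that belonged to (bar, two), so the two groups no longer agree label for label.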
Here's one solution to preserve the alignment:
df.loc['qux', 'a'] = df.loc['qux', 'a'].index.map(df.loc['bar', 'a'].to_dict())
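Output (for a frame whose bar rows are stored in the order two, one, which shows the values following the labels rather than the positions):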
>>> df
                 a   b
first second
bar   two      1.0   2
      one      3.0   4
baz   one      5.0   6
      two      7.0   8
foo   one      9.0  10
      two     11.0  12
qux   one      3.0  14
      two      1.0  16
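As a possible alternative to the index.map approach above, the same label-based copy can be sketched with Series.reindex. This is only one way to do it, not the answer's method; the names source, is_qux and target_labels are illustrative, and the frame is rebuilt here so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question so the snippet is self-contained.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(1, 17).reshape(8, 2), index=index, columns=["a", "b"])

# Column a of the bar group, indexed by the second level only.
source = df.xs("bar", level="first")["a"]

# Second-level labels of the qux rows, in the order they are stored.
is_qux = df.index.get_level_values("first") == "qux"
target_labels = df.index[is_qux].get_level_values("second")

# reindex looks each label up in source (a label missing from bar would
# give NaN), so the array handed to .loc is already in qux's row order.
df.loc[is_qux, "a"] = source.reindex(target_labels).to_numpy()

print(df)
```

Because the lookup is done by label before the values are converted to an array, the result does not depend on the bar and qux rows being stored in the same order.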