I have a MultiIndex column Pandas DataFrame A:
foo
bar baz
s1_a 1 2
s1_b 3 4
s2_a 5 6
s2_b 7 8
I want to group the data on a key from another DataFrame B:
key
s1_a 1
s1_b 1
s2_a 2
s2_b 2
For a DataFrame without a MultiIndex, I would do:
pd.merge(A, B, left_index=True, right_index=True).groupby('key').sum()
but this does not work with the MultiIndex. The desired result is
foo
bar baz
1 4 6
2 12 14
How can I achieve this?
CodePudding user response:
You can subset the level and restore is after the merge
using pandas.concat
:
C = pd.concat({'foo': (pd.merge(A['foo'], B, left_index=True, right_index=True)
.groupby('key')
.sum()
)
}, axis=1)
output:
>>> C
foo
bar baz
key
1 4 6
2 12 14
NB. technically, the code you used should work with a FutureWarning
, however you lose the MultiIndex an get tuples instead
>>> pd.merge(A, B, left_index=True, right_index=True).groupby('key').sum()
(foo, bar) (foo, baz)
key
1 4 6
2 12 14
FutureWarning: merging between different levels is deprecated and will be removed in a future version. (2 levels on the left,1 on the right)
CodePudding user response:
Series can be passed to groupby
directly and the grouper will do appropriate index alignment, so it is possible to do:
A.groupby(B['key']).sum()
foo
bar baz
key
1 4 6
2 12 14
Setup:
import numpy as np
import pandas as pd
idx = ['s1_a', 's1_b', 's2_a', 's2_b']
A = pd.DataFrame(
np.arange(1, 9).reshape((-1, 2)),
index=idx,
columns=pd.MultiIndex.from_product([['foo'], ['bar', 'baz']])
)
B = pd.DataFrame({'key': [1, 1, 2, 2]}, index=idx)
Note this will work in many cases but is less durable than merging:
B
variant 1:
B = pd.DataFrame({'key': [1, 2, 2]}, index=['s1_a', 's1_b', 's2_b'])
key
s1_a 1 # No s2_a
s1_b 2
s2_b 2
A.groupby(B['key']).sum()
foo
bar baz
key
1.0 1 2
2.0 10 12
B
variant 2:
B = pd.DataFrame({'key': [1, 1, 2, 2]}, index=['s1_a', 's2_a', 's1_b', 's2_b'])
key
s1_a 1
s2_a 1 # s1_a/s2_a together
s1_b 2
s2_b 2 # s1_b/s2_b together
A.groupby(B['key']).sum()
foo
bar baz
key
1 6 8
2 10 12