TL;DR: Merge two dataframes based on their multi-indices and a column that they share.
The two multi-index dataframes (call them dfA
and dfB
) do not have unique indices and are of different shapes. However level 0 of the index specifies group, level 1 specifies material for both. In dataframe dfA
and dfB
, there is a column called SR
.
The correct identification would therefore involve both indices and the value of the SR
column.
import pandas as pd
import numpy as np
tupA = [('G1','M1')]*3 [('G1','M2')] [('G2','M3')]*2
indA = pd.MultiIndex.from_tuples(tupA, names = ['Group', 'Material'])
dfA = pd.DataFrame({'SR': [3,5,10, 3,5,15],
'ValA': [1,2,1,4,5,6]},
index = ind
)
tupB = [('G1','M1')]*2 [('G1','M2')] [('G2','M3')]
indB = pd.MultiIndex.from_tuples(tupB, names = ['Group', 'Material'])
dfB = pd.DataFrame({'SR': [3,5,3,5],
'ValB': [2,4,5,6]},
index = indB
)
print(dfA,'\n', dfB)
Yields:
SR ValA
Group Material
G1 M1 3 1
M1 5 2
M1 10 1
M2 3 4
G2 M3 5 5
M3 15 6
SR ValB
Group Material
G1 M1 3 2
M1 5 4
M2 3 5
G2 M3 5 6
The Task:
To merge the two dataframes based on the multi-index and the column SR
. The SR
entries in dfB
which are not in dfA
should be replaced by np.nan
.
Desired Output:
The merged dataframe should be something like this:
SR ValA ValB
Group Material
G1 M1 3 1 2.0
M1 5 2 4.0
M1 10 1 NaN
M2 3 4 5.0
G2 M3 5 5 6.0
M3 15 6 NaN
It has all the rows of dfA
but for those SR
values which are not in dfB
it has NaN
.
Attempt at Solution:
I tried a number of left and outer joins but I can't get the NaN
. The documentation does have an example which gives NaN
but it's not using multi-indices.
Would appreciate some help.
CodePudding user response:
You can use the merge function with your three keys passed to the on argument:
dfA.merge(dfB,on=["Group","Material","SR"], how="left")
Output :
SR ValA ValB
Group Material
G1 M1 3 1 2.0
M1 5 2 4.0
M1 10 1 NaN
M2 3 4 5.0
G2 M3 5 5 6.0
M3 15 6 NaN