Join dataframes on a mix of multiindex columns and data columns-CodePudding

segments = pd.DataFrame({
    'shipment_id': [1, 1, 1, 2, 2],
    'segment_order': [1, 2, 3, 1, 2],
    'container_id': [None, 'aa', 'bb', 'cc', None]
}).set_index(['shipment_id', 'segment_order'])
print(segments)

containers = pd.DataFrame({
    'shipment_id': [1, 1, 2],
    'container_id': ['aa', 'bb', 'cc'],
    'temperature_max': [24.6, 25.3, 31.8],
}).set_index(['shipment_id', 'container_id'])
print(containers)

                          container_id
shipment_id segment_order             
1           1                     None
            2                       aa
            3                       bb
2           1                       cc
            2                     None

                          temperature_max
shipment_id container_id                 
1           aa                       24.6
            bb                       25.3
2           cc                       31.8

I would like to join the containers['temperature_max'] column to the segments data frame. Note that the join condition involves a mix of regular data columns and multi-index columns.

What is the preferred way to perform this join?

The expected output is:

                          container_id temperature_max
shipment_id segment_order             
1           1                     None             NaN
            2                       aa            24.6
            3                       bb            25.3
2           1                       cc            31.8
            2                     None             NaN

CodePudding user response：

While merge can work with index name, it will happily erase old index and create new ones (makes sense). For your data, you can temporary unset segment_order index level, merge, then set segment_order back as index:

(segments.reset_index(level='segment_order')
    .merge(containers, on=['shipment_id', 'container_id'], how='left')
    .set_index('segment_order', append=True)
)

Output:

                          container_id  temperature_max
shipment_id segment_order                              
1           1                     None              NaN
            2                       aa             24.6
            3                       bb             25.3
2           1                       cc             31.8
            2                     None              NaN