segments = pd.DataFrame({
'shipment_id': [1, 1, 1, 2, 2],
'segment_order': [1, 2, 3, 1, 2],
'container_id': [None, 'aa', 'bb', 'cc', None]
}).set_index(['shipment_id', 'segment_order'])
print(segments)
containers = pd.DataFrame({
'shipment_id': [1, 1, 2],
'container_id': ['aa', 'bb', 'cc'],
'temperature_max': [24.6, 25.3, 31.8],
}).set_index(['shipment_id', 'container_id'])
print(containers)
container_id
shipment_id segment_order
1 1 None
2 aa
3 bb
2 1 cc
2 None
temperature_max
shipment_id container_id
1 aa 24.6
bb 25.3
2 cc 31.8
I would like to join the containers['temperature_max']
column to the segments
data frame. Note that the join condition involves a mix of regular data columns and multi-index columns.
What is the preferred way to perform this join?
The expected output is:
container_id temperature_max
shipment_id segment_order
1 1 None NaN
2 aa 24.6
3 bb 25.3
2 1 cc 31.8
2 None NaN
CodePudding user response:
While merge
can work with index name, it will happily erase old index and create new ones (makes sense). For your data, you can temporary unset segment_order
index level, merge
, then set segment_order
back as index:
(segments.reset_index(level='segment_order')
.merge(containers, on=['shipment_id', 'container_id'], how='left')
.set_index('segment_order', append=True)
)
Output:
container_id temperature_max
shipment_id segment_order
1 1 None NaN
2 aa 24.6
3 bb 25.3
2 1 cc 31.8
2 None NaN