Suppose I have two DataFrames with partially overlapping indices and partially overlapping columns. I want to merge them on the index, keeping all columns from both, but taking the values from the second DataFrame wherever they overlap. For example:
import numpy as np
import pandas as pd

old_df = pd.DataFrame({'fruit': ['apples', 'apples', 'bananas', 'bananas'],
                       'entree': ['steak', 'chicken', 'chicken', 'fish'],
                       'side': ['fries', 'salad', 'salad', 'soup']},
                      index=[0, 1, 2, 3])
new_df = pd.DataFrame({'entree': ['chicken breast', 'salmon', 'ribeye', 'cheeseburger'],
                       'side': ['greek salad', 'clam chowder', 'fries', 'fries'],
                       'desert': ['key lime pie', 'chocolate mousse', 'tiramasu', np.nan]},
                      index=[2, 3, 4, 5])
merged_df = some_merge_op(old_df, new_df)  # placeholder for the operation I'm looking for
merged_df
Desired output:
|   | fruit   | entree         | side         | desert           |
| - | ------- | -------------- | ------------ | ---------------- |
| 0 | apples  | steak          | fries        | NaN              |
| 1 | apples  | chicken        | salad        | NaN              |
| 2 | bananas | chicken breast | greek salad  | key lime pie     |
| 3 | bananas | salmon         | clam chowder | chocolate mousse |
| 4 | NaN     | ribeye         | fries        | tiramasu         |
| 5 | NaN     | cheeseburger   | fries        | NaN              |
Answer:
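You can use `DataFrame.combine_first`. It keeps the values of the DataFrame it is called on and fills only the missing entries (including rows and columns that frame lacks) from the argument. Since you want `new_df` to win where the two overlap, call it on `new_df`:
out = new_df.combine_first(old_df)
print(out)
             desert          entree    fruit          side
0               NaN           steak   apples         fries
1               NaN         chicken   apples         salad
2      key lime pie  chicken breast  bananas   greek salad
3  chocolate mousse          salmon  bananas  clam chowder
4          tiramasu          ribeye      NaN         fries
5               NaN    cheeseburger      NaN         fries
Calling it the other way round, `old_df.combine_first(new_df)`, gives `old_df` priority instead, so rows 2 and 3 would keep `chicken`/`salad` and `fish`/`soup`. Note also that a NaN in `new_df` counts as missing, so it is filled from `old_df` rather than overwriting an existing value.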
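As the printed output shows, `combine_first` returns the columns in sorted order rather than the `fruit, entree, side, desert` order of the desired table. Below is a minimal sketch, assuming pandas and the `old_df`/`new_df` from the question, that restores that order by selecting with `Index.union(..., sort=False)`. It also shows an alternative that should give the same values: stack both frames with `pd.concat` (with `new_df` last) and keep the last non-null value per index label with `groupby(level=0).last()`. The names `cols` and `alt` are just illustrative.

```python
import numpy as np
import pandas as pd

old_df = pd.DataFrame({'fruit': ['apples', 'apples', 'bananas', 'bananas'],
                       'entree': ['steak', 'chicken', 'chicken', 'fish'],
                       'side': ['fries', 'salad', 'salad', 'soup']},
                      index=[0, 1, 2, 3])
new_df = pd.DataFrame({'entree': ['chicken breast', 'salmon', 'ribeye', 'cheeseburger'],
                       'side': ['greek salad', 'clam chowder', 'fries', 'fries'],
                       'desert': ['key lime pie', 'chocolate mousse', 'tiramasu', np.nan]},
                      index=[2, 3, 4, 5])

# Column order as in the desired table: old_df's columns first,
# then any columns that only new_df has (sort=False keeps this order).
cols = old_df.columns.union(new_df.columns, sort=False)

# Option 1: new_df wins where it has a value; gaps are filled from old_df.
out = new_df.combine_first(old_df)[cols]

# Option 2: stack both frames with new_df last, then take the last
# non-null value in each column for every index label.
alt = pd.concat([old_df, new_df]).groupby(level=0).last()[cols]

print(out)
```

`GroupBy.last` skips nulls by default, which is what makes the second variant behave like `combine_first` here; the `combine_first` version is the more direct way to say "prefer `new_df`, fall back to `old_df`".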