Merge right dataframe into left dataframe, preferring values from right dataframe and keeping new rows

How can I write the pandas equivalent of the following pure Python?

left: dict[str, dict] = ...  # some rows keyed by KEY
right: dict[str, dict] = ...  # more rows keyed by KEY
merge_cols: list[str] = ...  # the columns that should be written into left from right

for key, row in right.items():
    if key not in left:
        left[key] = row
    else:
        for col in merge_cols:
            left[key][col] = row[col]

Such that, given:

merge_cols = ['col']
ldf = pd.DataFrame({'col': [ 3, 4, 5], 'no':['foo', 'foo', 'bar']}, index=[1,2,3])

   col   no
1    3  foo
2    4  foo
3    5  bar
    
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])

   col
3   -2
4   -4
5   -7

there is a resulting dataframe:

   col   no
1  3.0  foo
2  4.0  foo
3 -2.0  bar
4 -4.0  NaN
5 -7.0  NaN

CodePudding user response：

There are likely other ways to do this, but I found one that seems to work nicely.

First, copy the merge columns for rows present in both dataframes into the left dataframe with update, which modifies ldf in place and skips NaN values in rdf:

ldf.update(rdf[merge_cols])  # a mutating operation

Then, find the difference between the indices so that you can append the remaining rows:

new_row_indices = rdf.index.difference(ldf.index)  # index labels found only in rdf
ldf = pd.concat([ldf, rdf.loc[new_row_indices]])  # DataFrame.append was removed in pandas 2.0
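
Putting the two steps together, here is a minimal sketch using the data from the question. The helper name merge_prefer_right is hypothetical (it is not part of pandas), and it works on a copy so the caller's left dataframe is not mutated:

```python
import pandas as pd


def merge_prefer_right(left, right, merge_cols):
    """Hypothetical helper: overwrite merge_cols from right, then add right-only rows."""
    out = left.copy()
    # Overwrite matching rows; update() ignores NaN values in right.
    out.update(right[merge_cols])
    # Index labels that exist only in right.
    new_idx = right.index.difference(out.index)
    return pd.concat([out, right.loc[new_idx]])


merge_cols = ['col']
ldf = pd.DataFrame({'col': [3, 4, 5], 'no': ['foo', 'foo', 'bar']}, index=[1, 2, 3])
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])

result = merge_prefer_right(ldf, rdf, merge_cols)
print(result)
```

Whether col comes back as int or float can depend on the pandas version, so compare values rather than dtypes if that matters to you.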

CodePudding user response：

Another option is combine_first: set the merge columns to NaN in the rows of ldf whose index also appears in rdf, then combine so that those gaps are filled from rdf:

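import numpy as np

ldf[merge_cols] = ldf[merge_cols].astype(float)  # make room for NaN without implicit upcasting
ldf.loc[ldf.index.intersection(rdf.index), merge_cols] = np.nan

ldf.combine_first(rdf)  # returns a new dataframe; ldf is only changed by the lines above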

   col   no
1  3.0  foo
2  4.0  foo
3 -2.0  bar
4 -4.0  NaN
5 -7.0  NaN

The update option does the same thing, though, so this is just an alternative.
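
If you would rather not write NaN into ldf at all, a possible variant is to call combine_first in the other direction, so that rdf's merge columns take priority and ldf fills in everything else. This is only a sketch and it rests on two assumptions: rdf has no missing values in merge_cols (a NaN there would fall back to ldf's value), and right-only rows only need the merge columns (other columns of rdf are not carried over):

```python
import pandas as pd

merge_cols = ['col']
ldf = pd.DataFrame({'col': [3, 4, 5], 'no': ['foo', 'foo', 'bar']}, index=[1, 2, 3])
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])

# Values from rdf[merge_cols] win wherever they are present; all other
# rows and columns fall back to ldf.
combined = rdf[merge_cols].combine_first(ldf)
print(combined)
```

Neither input dataframe is modified here, which can be handy if ldf is still needed afterwards.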