Merge right dataframe into left dataframe, preferring values from right dataframe and keeping new ro

Time:04-13

How can I write the pandas equivalent of the following pure Python?

left: dict[str, dict] = ...  # some rows keyed by KEY
right: dict[str, dict] = ...  # more rows keyed by KEY
merge_cols: list[str] = ...  # the columns that should be written into left from right

for key, row in right.items():
    if key not in left:
        left[key] = row
    else:
        for col in merge_cols:
            left[key][col] = row[col]

Such that, given:

merge_cols = ['col']
ldf = pd.DataFrame({'col': [ 3, 4, 5], 'no':['foo', 'foo', 'bar']}, index=[1,2,3])

   col   no
1    3  foo
2    4  foo
3    5  bar
    
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])

   col
3   -2
4   -4
5   -7

there is a resulting dataframe:

   col   no
1  3.0  foo
2  4.0  foo
3 -2.0  bar
4 -4.0  NaN
5 -7.0  NaN

CodePudding user response:

There are likely other ways to do this, but I found one that seems to work nicely.

First, copy the columns for matching rows into the left dataframe with update:

ldf.update(rdf[merge_cols])  # mutates ldf in place; only rows already in ldf are touched

Then, find the difference between the indices so that you can append the remaining rows:

new_row_indices = rdf.index.difference(ldf.index)  # labels that exist only in rdf
ldf = pd.concat([ldf, rdf.loc[new_row_indices]])  # DataFrame.append was removed in pandas 2.0
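
Put together, the two steps above could be wrapped as in the sketch below, run against the data from the question. The helper name merge_prefer_right is made up for illustration, and the copy is only there so that the caller's dataframe is not mutated. One assumed difference from the pure Python loop: update skips NaN values in rdf, so a NaN on the right does not overwrite a value on the left.

```python
import pandas as pd


def merge_prefer_right(ldf, rdf, merge_cols):
    """Hypothetical helper: overwrite merge_cols in ldf from rdf, then add rdf-only rows."""
    out = ldf.copy()                             # leave the caller's dataframe untouched
    out.update(rdf[merge_cols])                  # overwrite values on shared index labels
    new_rows = rdf.index.difference(out.index)   # labels present only in rdf
    return pd.concat([out, rdf.loc[new_rows]])   # append the rdf-only rows


merge_cols = ['col']
ldf = pd.DataFrame({'col': [3, 4, 5], 'no': ['foo', 'foo', 'bar']}, index=[1, 2, 3])
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])

result = merge_prefer_right(ldf, rdf, merge_cols)
print(result)
```

The dtype of col in the result (int or float) may differ between pandas versions, so it is safer to compare values rather than dtypes.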

CodePudding user response:

Another option is combine_first: first set the merge_cols values in ldf to NaN at the index labels it shares with rdf, then combine:

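# cast to float first: recent pandas versions warn about, or reject, NaN in an integer column
ldf[merge_cols] = ldf[merge_cols].astype(float)
ldf.loc[ldf.index.intersection(rdf.index), merge_cols] = np.nan

ldf.combine_first(rdf)  # returns a new dataframe; NaN cells in ldf are filled from rdf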

   col   no
1  3.0  foo
2  4.0  foo
3 -2.0  bar
4 -4.0  NaN
5 -7.0  NaN

The update option gives the same result, though, so this is just an alternative.
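
For reference, here is a self-contained sketch of the combine_first idea that does not modify ldf. The helper name combine_prefer_right is made up for illustration, and it swaps the .loc assignment for Series.mask, which returns a new column and upcasts integers to float as needed. It assumes every column in merge_cols exists in ldf.

```python
import pandas as pd


def combine_prefer_right(ldf, rdf, merge_cols):
    """Hypothetical helper: blank out shared cells in merge_cols, then fill from rdf."""
    out = ldf.copy()                       # leave the caller's dataframe untouched
    shared = out.index.isin(rdf.index)     # boolean array: True where rdf has the same label
    for col in merge_cols:
        out[col] = out[col].mask(shared)   # NaN where rdf should win
    return out.combine_first(rdf)          # NaN cells and missing rows come from rdf


merge_cols = ['col']
ldf = pd.DataFrame({'col': [3, 4, 5], 'no': ['foo', 'foo', 'bar']}, index=[1, 2, 3])
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])

combined = combine_prefer_right(ldf, rdf, merge_cols)
print(combined)
```

Note that combine_first returns the union of both indexes in sorted order, whereas the concat-based approach keeps the rows of ldf first and appends the new ones after them.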
