How can I write the pandas equivalent of this pure Python code:
left: dict[str, dict] = ... # some rows keyed by KEY
right: dict[str, dict] = ... # more rows keyed by KEY
merge_cols: list[str] = ... # the columns that should be written into left from right
for key, row in right.items():
    if key not in left:
        left[key] = row
    else:
        for col in merge_cols:
            left[key][col] = row[col]
so that, given:
merge_cols = ['col']
ldf = pd.DataFrame({'col': [3, 4, 5], 'no': ['foo', 'foo', 'bar']}, index=[1, 2, 3])
   col   no
1    3  foo
2    4  foo
3    5  bar
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])
   col
3   -2
4   -4
5   -7
there is a resulting dataframe:
   col   no
1  3.0  foo
2  4.0  foo
3 -2.0  bar
4 -4.0  NaN
5 -7.0  NaN
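For reference, the loop from the question can be run against the same sample rows to confirm what the target result looks like. This is only a sketch: `merge_into` is a hypothetical helper name, the integer keys mirror the sample index rather than the `str` keys in the type hints, and new rows are copied with `dict(row)` (an addition, not in the original loop) so that `left` and `right` do not share row objects.

```python
def merge_into(left: dict, right: dict, merge_cols: list) -> dict:
    """Write rows from right into left, overwriting only merge_cols on shared keys."""
    for key, row in right.items():
        if key not in left:
            # Key only exists in right: take the whole row (copied, an assumption).
            left[key] = dict(row)
        else:
            # Key exists in both: overwrite only the merge columns.
            for col in merge_cols:
                left[key][col] = row[col]
    return left


left = {
    1: {'col': 3, 'no': 'foo'},
    2: {'col': 4, 'no': 'foo'},
    3: {'col': 5, 'no': 'bar'},
}
right = {3: {'col': -2}, 4: {'col': -4}, 5: {'col': -7}}

merged = merge_into(left, right, ['col'])
for key, row in merged.items():
    print(key, row)
```

Rows 4 and 5 end up without a 'no' entry, which corresponds to the NaN cells in the expected dataframe.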
CodePudding user response:
There are likely other ways to do this, but I found one that seems to work nicely.
First, copy the merge columns for matching rows into the left dataframe with DataFrame.update:
ldf.update(rdf[merge_cols])  # mutates ldf in place
Then, find the index labels that are in rdf but not in ldf, so that you can append the remaining rows:
new_row_indices = rdf.index.difference(ldf.index)
ldf = pd.concat([ldf, rdf.loc[new_row_indices]])  # DataFrame.append was removed in pandas 2.0
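Putting the two steps together, here is a minimal runnable sketch against the question's sample frames. `merge_rows` is a hypothetical helper name, not part of pandas. It works on a copy so the caller's frame is left untouched, and it uses `pd.concat` because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd

merge_cols = ['col']
ldf = pd.DataFrame({'col': [3, 4, 5], 'no': ['foo', 'foo', 'bar']}, index=[1, 2, 3])
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])


def merge_rows(left: pd.DataFrame, right: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Overwrite cols on shared index labels, then append rows only found in right."""
    result = left.copy()
    # Step 1: overwrite the merge columns where the index labels match.
    result.update(right[cols])
    # Step 2: append the rows whose labels exist only in right.
    new_labels = right.index.difference(result.index)
    return pd.concat([result, right.loc[new_labels]])


result = merge_rows(ldf, rdf, merge_cols)
print(result)
```

Whether 'col' comes back as integer or float depends on the pandas version, so it is safer not to rely on the dtype of the result.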
CodePudding user response:
Another option is combine_first: set the merge_cols values to NaN at the index positions that ldf shares with rdf, then combine:
ldf.loc[ldf.index.intersection(rdf.index), merge_cols] = np.nan
ldf.combine_first(rdf)
   col   no
1  3.0  foo
2  4.0  foo
3 -2.0  bar
4 -4.0  NaN
5 -7.0  NaN
The update option does the same thing though, so this is just an alternative.
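A self-contained sketch of the combine_first route, using the question's sample frames, might look like the following. The cast to float before masking is an addition of mine, not part of the answer: recent pandas versions may warn or raise when NaN is assigned into an integer column, so casting first sidesteps that. The names `masked`, `overlap`, and `combined` are illustrative.

```python
import numpy as np
import pandas as pd

merge_cols = ['col']
ldf = pd.DataFrame({'col': [3, 4, 5], 'no': ['foo', 'foo', 'bar']}, index=[1, 2, 3])
rdf = pd.DataFrame({'col': [-2, -4, -7]}, index=[3, 4, 5])

# Work on a copy and make the merge columns float so they can hold NaN.
masked = ldf.copy()
masked[merge_cols] = masked[merge_cols].astype(float)

# Blank out the merge columns on index labels shared with rdf ...
overlap = masked.index.intersection(rdf.index)
masked.loc[overlap, merge_cols] = np.nan

# ... so that combine_first fills those holes, and the new rows, from rdf.
combined = masked.combine_first(rdf)
print(combined)
```

Only the columns in merge_cols are masked, which is why row 3 keeps its original 'no' value.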