I wanted to know if there is a way to merge (re-join) the rows that are missing from df2 back in from df1, simply by index.
My original approach was to split df1 into df1_cleaned and df1_untouched and then join them back together. But since I didn't change the index, I thought there was probably an easier way to re-join the two dataframes. I tried an outer merge with left_index and right_index, but that left me with duplicate columns with suffixes that I then had to clean up.
df1
index | colA | colB | colC |
---|---|---|---|
0 | California | 123 | abc |
1 | New York | 456 | def |
2 | Texas | 789 | ghi |
df2 (a cleaned subset of df1)
index | colA | colB | colC |
---|---|---|---|
0 | California | 321 | abc |
2 | Texas | 789 | ihg |
Desired end result
index | colA | colB | colC |
---|---|---|---|
0 | California | 321 | abc |
1 | New York | 456 | def |
2 | Texas | 789 | ihg |
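For reference, the split-and-rejoin approach described above can be sketched with pd.concat, assuming the sample data from the tables: drop df2's index labels from df1 to get the untouched rows, stack them with df2, and sort the index to restore the original row order. The variable names untouched and result are illustrative only.

```python
import pandas as pd

# Sample data reproducing the tables above
df1 = pd.DataFrame(
    {"colA": ["California", "New York", "Texas"],
     "colB": [123, 456, 789],
     "colC": ["abc", "def", "ghi"]},
    index=[0, 1, 2],
)
df2 = pd.DataFrame(
    {"colA": ["California", "Texas"],
     "colB": [321, 789],
     "colC": ["abc", "ihg"]},
    index=[0, 2],
)

# Rows of df1 whose index labels do not appear in df2 (the "untouched" rows)
untouched = df1.drop(index=df2.index)

# Stack the cleaned rows with the untouched ones, then restore the original order
result = pd.concat([df2, untouched]).sort_index()
print(result)
```

Because no row is ever missing a value here, colB keeps its integer dtype, but it still takes two steps where the answers below need one.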
Answer:
You can use combine_first or update:
df_out = df2.combine_first(df1)
Or use pd.DataFrame.update, which operates in place and overwrites the values in df1 with the non-NA values from df2:
df1.update(df2)
Output:
colA colB colC
index
0 California 321.0 abc
1 New York 456.0 def
2 Texas 789.0 ihg
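A self-contained sketch of both one-liners, assuming the sample data from the question. combine_first returns a new frame in which df2's values take precedence and any label df2 lacks (row 1) is filled from df1. update returns None and mutates its caller, so this sketch runs it on a copy (df_updated is a hypothetical name) to keep df1 intact.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"colA": ["California", "New York", "Texas"],
     "colB": [123, 456, 789],
     "colC": ["abc", "def", "ghi"]},
)
df2 = pd.DataFrame(
    {"colA": ["California", "Texas"],
     "colB": [321, 789],
     "colC": ["abc", "ihg"]},
    index=[0, 2],
)

# combine_first: df2 wins where it has a value; gaps (row 1) come from df1.
# Returns a new DataFrame and leaves both inputs unchanged.
df_out = df2.combine_first(df1)

# update: writes df2's non-NA values into the caller, aligned on index and
# columns. It works in place and returns None, so operate on a copy.
df_updated = df1.copy()
df_updated.update(df2)

print(df_out)
print(df_updated)
```

Both produce the same rows; the choice is mainly whether you want a new object (combine_first) or to modify an existing one (update).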
Answer:
Alternatively, reindex df2 to df1's index, find the index labels missing from df2 with Index.difference, and copy those rows from df1 into the result:
df_result = df2.reindex(df1.index)
missing_index = df1.index.difference(df2.index)
df_result.loc[missing_index] = df1.loc[missing_index]
print(df_result)
colA colB colC
0 California 321.0 abc
1 New York 456.0 def
2 Texas 789.0 ihg
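The reindex answer as a self-contained sketch, assuming the same sample data. Reindexing creates an all-NaN placeholder for row 1, which is why colB is shown as float (321.0) in the output above. The final astype step is an optional addition not in the original answer: it casts back to df1's dtypes and assumes no NaN remains after the fill.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"colA": ["California", "New York", "Texas"],
     "colB": [123, 456, 789],
     "colC": ["abc", "def", "ghi"]},
)
df2 = pd.DataFrame(
    {"colA": ["California", "Texas"],
     "colB": [321, 789],
     "colC": ["abc", "ihg"]},
    index=[0, 2],
)

# Lay df2 out on df1's full index; the missing row is all-NaN, so colB becomes float
df_result = df2.reindex(df1.index)

# Index labels present in df1 but absent from df2
missing_index = df1.index.difference(df2.index)

# Fill the placeholder rows from df1 (assignment aligns on index and columns)
df_result.loc[missing_index] = df1.loc[missing_index]

# Optional: restore df1's dtypes (e.g. colB back to integer); assumes no NaN left
df_result = df_result.astype(df1.dtypes)
print(df_result)
```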