I need to merge two dataframes replacing some rows based on string indexes and adding the new rows (new indexes). I'm curious about some kind of standard or efficient way to do it.
# Example: "A" and "B" are columns and "idx" is the index
# Input
# df1
idx A B
a 1 1
b 2 2
c 3 3
# df2
idx A B
a -1 -1
b -2 -2
d -10 -10
f -20 -20
# process
merged_df = replace_and_add(old_df=df_1, new_df=df_2)
# Output
# new_df
idx A B
a -1 -1
b -2 -2
c 3 3
d -10 -10
f -20 -20
CodePudding user response:
You're looking for a concatenation of dataframes, without replication (union distinct). In pandas, you could:
df = pd.concat( [old_df, new_df] ).drop_duplicates()
CodePudding user response:
You can reindex
your first dataframe with the second one then update
values. It's an efficient way to avoid to create a temporary dataframe:
df1 = df1.reindex(df1.index.union(df2.index))
df1.update(df2)
print(df1)
# Output:
A B
idx
a -1.0 -1.0
b -2.0 -2.0
c 3.0 3.0
d -10.0 -10.0
f -20.0 -20.0
The downside is the dtype which is now float64 instead of int64.