I have an original dataframe as:
import pandas as pd
df = pd.read_excel("Weights.xlsx", sheet_name='Old')
df:
Name S_Name Height Weight
0 John Wright 5.3 52
1 Seven Taylor 6.4 75
2 Ramsay Sen 7.2 77
I get a new file with updated / new information every time. example:
df1 = pd.read_excel("Weights.xlsx", sheet_name='New')
df1:
Name S_Name Height Weight
0 John Wright 5.1 57
1 Seven Taylor 6.3 76
2 Ramsay Sen 7.0 71
3 Radio Mast 5.3 62
4 Randy Stenson 6.4 65
You can find the file used here
I want to preserve the values of df
while I add the new rows from df1
for new Name
and S_Name
such that the resulting dataframe is as:
df_final
Name S_Name Height Weight
0 John Wright 5.3 52
1 Seven Taylor 6.4 75
2 Ramsay Sen 7.2 77
3 Radio Mast 5.3 62
4 Randy Stenson 6.4 65
I would like to add, that the indexes of both files are not the same every time. There is a sort function in the workflow that changes the index position of these values every time.
CodePudding user response:
If you cannot guarantee that the indices are aligned, you need to use both merge
and update
(or combine_first
of you do not want to modify df1
in place)
You can align the DataFrames with merge
, then update
:
ids = ['Name', 'S_Name']
df1.update(df1[ids].merge(df, how='left'))
print(df1)
Output:
Name S_Name Height Weight
0 John Wright 5.3 52.0
1 Seven Taylor 6.4 75.0
2 Ramsay Sen 7.2 77.0
3 Radio Mast 5.3 62.0
4 Randy Stenson 6.4 65.0
If you rather want a new DataFrame:
ids = ['Name', 'S_Name']
df_final = df.combine_first(df1[ids].merge(df, how='left'))
print(df_final)
NB. In both cases, you need to ensure that you have non-duplicated merging keys