How do I merge or update dataframes?-CodePudding

I have an original dataframe as:

import pandas as pd

df = pd.read_excel("Weights.xlsx", sheet_name='Old')
df: 
     Name  S_Name  Height  Weight
0    John  Wright     5.3      52
1   Seven  Taylor     6.4      75
2  Ramsay     Sen     7.2      77

I get a new file with updated / new information every time. example:

df1 = pd.read_excel("Weights.xlsx", sheet_name='New')
df1:
     Name   S_Name  Height  Weight
0    John   Wright     5.1      57
1   Seven   Taylor     6.3      76
2  Ramsay      Sen     7.0      71
3   Radio     Mast     5.3      62
4   Randy  Stenson     6.4      65

You can find the file used here

I want to preserve the values of df while I add the new rows from df1 for new Name and S_Name such that the resulting dataframe is as:

df_final
     Name   S_Name  Height  Weight
0    John   Wright     5.3      52
1   Seven   Taylor     6.4      75
2  Ramsay      Sen     7.2      77
3   Radio     Mast     5.3      62
4   Randy   Stenson    6.4      65

I would like to add, that the indexes of both files are not the same every time. There is a sort function in the workflow that changes the index position of these values every time.

CodePudding user response：

If you cannot guarantee that the indices are aligned, you need to use both merge and update (or combine_first of you do not want to modify df1 in place)

You can align the DataFrames with merge, then update:

ids = ['Name', 'S_Name']
df1.update(df1[ids].merge(df, how='left'))

print(df1)

Output:

     Name   S_Name  Height  Weight
0    John   Wright     5.3    52.0
1   Seven   Taylor     6.4    75.0
2  Ramsay      Sen     7.2    77.0
3   Radio     Mast     5.3    62.0
4   Randy  Stenson     6.4    65.0

If you rather want a new DataFrame:

ids = ['Name', 'S_Name']
df_final = df.combine_first(df1[ids].merge(df, how='left'))

print(df_final)

NB. In both cases, you need to ensure that you have non-duplicated merging keys