Home > OS >  How do I merge or update dataframes?
How do I merge or update dataframes?

Time:05-29

I have an original dataframe as:

import pandas as pd

df = pd.read_excel("Weights.xlsx", sheet_name='Old')
df: 
     Name  S_Name  Height  Weight
0    John  Wright     5.3      52
1   Seven  Taylor     6.4      75
2  Ramsay     Sen     7.2      77

I get a new file with updated / new information every time. example:

df1 = pd.read_excel("Weights.xlsx", sheet_name='New')
df1:
     Name   S_Name  Height  Weight
0    John   Wright     5.1      57
1   Seven   Taylor     6.3      76
2  Ramsay      Sen     7.0      71
3   Radio     Mast     5.3      62
4   Randy  Stenson     6.4      65

You can find the file used here

I want to preserve the values of df while I add the new rows from df1 for new Name and S_Name such that the resulting dataframe is as:

df_final
     Name   S_Name  Height  Weight
0    John   Wright     5.3      52
1   Seven   Taylor     6.4      75
2  Ramsay      Sen     7.2      77
3   Radio     Mast     5.3      62
4   Randy   Stenson    6.4      65

I would like to add, that the indexes of both files are not the same every time. There is a sort function in the workflow that changes the index position of these values every time.

CodePudding user response:

If you cannot guarantee that the indices are aligned, you need to use both merge and update (or combine_first of you do not want to modify df1 in place)

You can align the DataFrames with merge, then update:

ids = ['Name', 'S_Name']
df1.update(df1[ids].merge(df, how='left'))

print(df1)

Output:

     Name   S_Name  Height  Weight
0    John   Wright     5.3    52.0
1   Seven   Taylor     6.4    75.0
2  Ramsay      Sen     7.2    77.0
3   Radio     Mast     5.3    62.0
4   Randy  Stenson     6.4    65.0

If you rather want a new DataFrame:

ids = ['Name', 'S_Name']
df_final = df.combine_first(df1[ids].merge(df, how='left'))

print(df_final)

NB. In both cases, you need to ensure that you have non-duplicated merging keys

  • Related