Home > Enterprise >  How to merge two dataframes with overlapping columns, update existing values and add new values
How to merge two dataframes with overlapping columns, update existing values and add new values

Time:09-23

I would like to merge two dataframes and update values in the first dataframe from the second dataframe.

I need to add all values from data2 to data1 and if the username already exists in data1 I need the amount value to be updated from data2.

import pandas as pd
data1 = pd.DataFrame([['user1',  10], ['user2', 11], ['user3',  12],['user4',10]], columns=['username', 'amount'])
data2 = pd.DataFrame([['user4', 15], ['user5', 60]], columns=['username', 'amount'])
#Desired dataframe:  
data3= pd.DataFrame([['user1',  10], ['user2', 11], ['user3',  12],['user4',15], ['user5', 60]],columns=['username', 'amount'])       
    

I would like to update vaues existing in both dataframes and add new rows if they are not in the first dataframe:

Dataframes:
data1:
username    amount
0   user1   10
1   user2   11
2   user3   12
3   user4   10

data2:
username    amount
0   user4   15
1   user5   60

data3: (Desired dataframe)
username    amount
0   user1   10
1   user2   11
2   user3   12
3   user4   15
4   user5   60

I tried Concat, Merge, Update, Join but couldn't get them to work in the way I want.

Concat:

print('Concat:')
df_concat = pd.concat([data1, data2], axis=0, join='outer', ignore_index='true', verify_integrity='True')
display(df_concat)


username    amount
0   user1   10
1   user2   11
2   user3   12
3   user4   10
4   user4   15
5   user5   60

Merge:

print('Merge:')
df_merge = data1.merge(data2,how='outer', on='username',indicator = True) 
display(df_merge)

username    amount_x    amount_y    _merge
0   user1   10.0    NaN left_only
1   user2   11.0    NaN left_only
2   user3   12.0    NaN left_only
3   user4   10.0    15.0    both
4   user5   NaN 60.0    right_only

Update:

print('Update:')
data1.update(data2)
display(data1)

    username    amount
0   user4   15.0
1   user5   60.0
2   user3   12.0
3   user4   10.0

Join:

print('Join:')
data1.join(data2, how='outer', lsuffix='username', rsuffix='amount')
display(data1)

username    amount
0   user1   10
1   user2   11
2   user3   12
3   user4   10

How do I get the desired dataframe data3 from data1 and data2?

CodePudding user response:

You could use concat drop_duplicates:

data3 = pd.concat([data1, data2]).drop_duplicates(subset=["username"], keep="last")

print(data3)

Output

  username  amount
0    user1      10
1    user2      11
2    user3      12
0    user4      15
1    user5      60
  • Related