I would like to merge two dataframes and update values in the first dataframe from the second dataframe.
I need to add all values from data2
to data1
and if the username
already exists in data1
I need the amount
value to be updated from data2
.
import pandas as pd
data1 = pd.DataFrame([['user1', 10], ['user2', 11], ['user3', 12],['user4',10]], columns=['username', 'amount'])
data2 = pd.DataFrame([['user4', 15], ['user5', 60]], columns=['username', 'amount'])
#Desired dataframe:
data3= pd.DataFrame([['user1', 10], ['user2', 11], ['user3', 12],['user4',15], ['user5', 60]],columns=['username', 'amount'])
I would like to update vaues existing in both dataframes and add new rows if they are not in the first dataframe:
Dataframes:
data1:
username amount
0 user1 10
1 user2 11
2 user3 12
3 user4 10
data2:
username amount
0 user4 15
1 user5 60
data3: (Desired dataframe)
username amount
0 user1 10
1 user2 11
2 user3 12
3 user4 15
4 user5 60
I tried Concat, Merge, Update, Join but couldn't get them to work in the way I want.
Concat:
print('Concat:')
df_concat = pd.concat([data1, data2], axis=0, join='outer', ignore_index='true', verify_integrity='True')
display(df_concat)
username amount
0 user1 10
1 user2 11
2 user3 12
3 user4 10
4 user4 15
5 user5 60
Merge:
print('Merge:')
df_merge = data1.merge(data2,how='outer', on='username',indicator = True)
display(df_merge)
username amount_x amount_y _merge
0 user1 10.0 NaN left_only
1 user2 11.0 NaN left_only
2 user3 12.0 NaN left_only
3 user4 10.0 15.0 both
4 user5 NaN 60.0 right_only
Update:
print('Update:')
data1.update(data2)
display(data1)
username amount
0 user4 15.0
1 user5 60.0
2 user3 12.0
3 user4 10.0
Join:
print('Join:')
data1.join(data2, how='outer', lsuffix='username', rsuffix='amount')
display(data1)
username amount
0 user1 10
1 user2 11
2 user3 12
3 user4 10
How do I get the desired dataframe data3 from data1 and data2?
CodePudding user response:
You could use concat drop_duplicates:
data3 = pd.concat([data1, data2]).drop_duplicates(subset=["username"], keep="last")
print(data3)
Output
username amount
0 user1 10
1 user2 11
2 user3 12
0 user4 15
1 user5 60