I have two DataFrames with user details and scores. Some users have a second score and will be present in the second DataFrame. What I want to do is to join or merge them together (don't really mind which) to get their final score i.e. if they exist in the second DataFrame then take it from there, otherwise from the first.
DataFrame one:
first_name last_name email score feedback
0 Bill First [email protected] 57.0 Needs to try again
1 Anna Second [email protected] 10.0 Not enough to proceed
2 Fred Third [email protected] 100.0 Excellent!
DataFrame two:
first_name last_name email score feedback
0 Bill First [email protected] 100.0 Much better!
Desired result:
first_name last_name email score feedback
0 Bill First [email protected] 100.0 Much better!
1 Anna Second [email protected] 10.0 Not enough to proceed
2 Fred Third [email protected] 100.0 Excellent!
The closest I've got is df1.merge(df2, on=['email', 'first_name', 'last_name', ], how='left')
but that gives me columns: first_name last_name email score_x feedback_x score_y feedback_y
CodePudding user response:
Use pd.concat
and drop_duplicates
:
out = pd.concat([df2, df1]).drop_duplicates(['first_name', 'last_name', 'email'])
print(out)
# Output
first_name last_name email score feedback
0 Bill First [email protected] 100.0 Much better!
1 Anna Second [email protected] 10.0 Not enough to proceed
2 Fred Third [email protected] 100.0 Excellent!
Assuming first_name
, last_name
and email
fields are the primary key to identify records like your merge
.
Just before drop_duplicates
, your dataframe looks like:
>>> pd.concat([df2, df1])
first_name last_name email score feedback
0 Bill First [email protected] 100.0 Much better!
0 Bill First [email protected] 57.0 Needs to try again # dupe
1 Anna Second [email protected] 10.0 Not enough to proceed
2 Fred Third [email protected] 100.0 Excellent!
CodePudding user response:
Alternative using combine_first
:
cols = ['email', 'first_name', 'last_name']
out = df2.set_index(cols).combine_first(df1.set_index(cols)).reset_index()
output:
email first_name last_name score feedback
0 [email protected] Bill First 100.0 Much better!
1 [email protected] Anna Second 10.0 Not enough to proceed
2 [email protected] Fred Third 100.0 Excellent!