Look up a dataframe and replace certain values in a column

There are two dataframes: the first contains a list of emp_id values with the old email_id, while the second contains emp_id values with the new email_id.

The task is to look up the second dataframe and replace the matching values in the first dataframe.

Input Dataframe:

import pandas as pd
data = { 
'emp_id': [111, 222, 333, 444, 555],
'email_id': ['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]']
}
data = pd.DataFrame(data)

Current Output:

     emp_id   email_id
0     111    [email protected]
1     222    [email protected]
2     333    [email protected]
3     444    four@gmailcom
4     555    [email protected]

#replace certain values with new values by looking up another dataframe
data1 = { 
'emp_id': [111, 333, 555],
'email_id': ['[email protected]','[email protected]','[email protected]']
}
data1 = pd.DataFrame(data1)

Desired Output:

     emp_id   email_id
0     111    [email protected]
1     222    [email protected]
2     333    [email protected]
3     444    four@gmailcom
4     555    [email protected]

The original data contains 50k rows, so merging the two dataframes doesn't seem like the correct option. Any help would be appreciated.

Cheers!

CodePudding user response：

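Let us try DataFrame.update. With emp_id set as the index on both frames, it aligns the rows by emp_id and overwrites only the ones that have a match: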

out = data.set_index('emp_id')
out.update(data1.set_index('emp_id'))
out.reset_index(inplace=True)
out
   emp_id         email_id
0     111    [email protected]
1     222    [email protected]
2     333  [email protected]
3     444   [email protected]
4     555   [email protected]
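
For anyone who wants to run the update approach end to end, here is a minimal self-contained sketch. The addresses are hypothetical placeholders (the real ones are not shown in the post), and the helper name update_emails is my own, not part of pandas. It assumes emp_id is unique in both frames.

```python
import pandas as pd


def update_emails(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `old` whose email_id is overwritten wherever `new` has the same emp_id."""
    # set_index returns a new frame, so the caller's `old` is left untouched.
    out = old.set_index("emp_id")
    # update works in place, aligns on the index (emp_id) and ignores
    # index labels of `new` that do not exist in `out`.
    out.update(new.set_index("emp_id"))
    return out.reset_index()


# Placeholder data (hypothetical addresses, same shape as in the question).
old = pd.DataFrame({
    "emp_id": [111, 222, 333, 444, 555],
    "email_id": ["a@old.example", "b@old.example", "c@old.example",
                 "d@old.example", "e@old.example"],
})
new = pd.DataFrame({
    "emp_id": [111, 333, 555],
    "email_id": ["a@new.example", "c@new.example", "e@new.example"],
})

result = update_emails(old, new)
print(result)
```

Rows 222 and 444 keep their old address because they have no entry in the second frame. No merge is involved, so the row count and row order of the first frame stay as they were.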

CodePudding user response：

The solution below uses pandas' boolean indexing.

# Align the new addresses to the rows of data by emp_id. Ids that are missing from data1 become NaN.
new_email = data['emp_id'].map(data1.set_index('emp_id')['email_id'])

# Boolean Series: True where a new address exists and differs from the current one.
difference = new_email.notna() & (new_email != data['email_id'])

# Replace only those rows of the first DataFrame with the values looked up from the second.
data.loc[difference, 'email_id'] = new_email[difference]
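
A related alternative, not taken from either answer, is Series.map followed by fillna. The sketch below is only an illustration: the addresses are hypothetical placeholders, the helper name replace_by_lookup is made up, and it assumes emp_id is unique in the lookup frame (map raises an error on a lookup Series with duplicate index labels).

```python
import pandas as pd


def replace_by_lookup(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `old` with email_id replaced for every emp_id found in `new`."""
    # Series indexed by emp_id whose values are the new addresses.
    lookup = new.set_index("emp_id")["email_id"]
    # One value per row of `old`; NaN where the emp_id has no replacement.
    mapped = old["emp_id"].map(lookup)
    out = old.copy()
    # Keep the old address wherever no new one was found.
    out["email_id"] = mapped.fillna(old["email_id"])
    return out


# Placeholder data (hypothetical addresses, same shape as in the question).
emails_old = pd.DataFrame({
    "emp_id": [111, 222, 333, 444, 555],
    "email_id": ["a@old.example", "b@old.example", "c@old.example",
                 "d@old.example", "e@old.example"],
})
emails_new = pd.DataFrame({
    "emp_id": [111, 333, 555],
    "email_id": ["a@new.example", "c@new.example", "e@new.example"],
})

replaced = replace_by_lookup(emails_old, emails_new)
print(replaced)
```

Since map only looks each emp_id up in the lookup Series, the first frame keeps its original index and row order, which may matter if other columns depend on them.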