Pandas: How to match / filter same key / id values (duplicates) from 2 different dataframes and repl-CodePudding

I have 2 dataframes of different sizes. The first dataframe(df1) has 4 columns, but two of those columns have the same name as the columns in the second dataframe(df2), which is only comprised of 2 columns. The columns in common are ['ID'] and ['Department'].

I want to check if any ID from df2 are in df1. If so, I want to replace df1['Department'] value with df2['Department'] value.

For instance, df1 looks something like this:

ID      Department     Yrs Experience      Education
1234    Science        1                   Bachelors
2356    Art            3                   Bachelors
2456    Math           2                   Masters
4657    Science        4                   Masters

And df2 looks something like this:

ID      Department    
1098    P.E.
1234    Technology       
2356    History

I want to check if the ID from df2 is in df1 and if so, update Department. The output should looks something like this:

ID      Department     Yrs Experience      Education
1234    **Technology** 1                   Bachelors
2356    **History**    3                   Bachelors
2456    Math           2                   Masters
4657    Science        4                   Masters

The expected updates to df1 are in bold

Is there an efficient way to do this?

Thank you for taking the time to read this and help.

CodePudding user response：

You can use ID of df1 to map with the Pandas series formed by setting ID on df2 as index and taking the column of Department from df2 (this acts as a mapping table).

Then, in case of no match of ID from df2, we fill-in the original values of Department from df1 (to retain original values in case of no match):

df1['Department'] = (df1['ID'].map(df2.set_index('ID')['Department'])
                              .fillna(df1['Department'])
                    )

Result:

print(df1)

     ID  Department  Yrs Experience  Education
0  1234  Technology               1  Bachelors
1  2356     History               3  Bachelors
2  2456        Math               2    Masters
3  4657     Science               4    Masters

CodePudding user response：

Try:

df1["Department"].update(
    df1[["ID"]].merge(df2, on="ID", how="left")["Department"]
)
print(df1)

Prints:

     ID  Department  Yrs Experience  Education
0  1234  Technology               1  Bachelors
1  2356     History               3  Bachelors
2  2456        Math               2    Masters
3  4657     Science               4    Masters

CodePudding user response：

df_1 = pd.DataFrame(data={'ID':[1234, 2356, 2456, 4657], 'Department':['Science', 'Art', 'Math', 'Science']})
df_2 = pd.DataFrame(data={'ID':[1234, 2356], 'Department':['Technology', 'History']})

df_1.loc[df_1['ID'].isin(df_2['ID']), 'Department'] = df_2['Department']

OutPut

     ID  Department
0  1234  Technology
1  2356     History
2  2456        Math
3  4657     Science